-
Publication No.: US11854558B2
Publication Date: 2023-12-26
Application No.: US17502863
Filing Date: 2021-10-15
Applicant: Lemon Inc.
Inventor: Wei Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song
Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, using a spectral and a temporal transformer, are disclosed herein. The processor generates a time-frequency representation of obtained audio data to be applied as input for a transformer-based neural network model; determines spectral embeddings and first temporal embeddings of the audio data based on the time-frequency representation of the audio data; determines each vector of a second frequency class token (FCT) by passing each vector of the first FCT in the spectral embeddings through the spectral transformer; determines second temporal embeddings by adding a linear projection of the second FCT to the first temporal embeddings; determines third temporal embeddings by passing the second temporal embeddings through the temporal transformer; and generates music information based on the third temporal embeddings.
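The data flow in this abstract can be traced with a minimal sketch. It is an illustration under stated assumptions, not the patented implementation: each transformer is replaced by a single-head self-attention layer with a residual connection, all weights are random rather than trained, the way bins and frames are embedded is assumed, and the final step that turns the third temporal embeddings into music information is omitted. The names `analyze`, `self_attention`, `matmul`, and `rand_matrix` are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rng, rows, cols):
    """Random stand-in for a learned weight matrix (assumption: untrained)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention plus a residual connection.
    A simplified stand-in for a transformer layer (no layer norm, no feed-forward)."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = math.sqrt(len(q[0]))
    out = []
    for token, qi in zip(tokens, q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        attended = [sum(w / total * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))]
        out.append([t + a for t, a in zip(token, attended)])
    return out

def analyze(spectrogram, dim=4, seed=0):
    """spectrogram: list of T frames, each a list of F magnitudes.
    Returns the third temporal embeddings as a T x dim list."""
    rng = random.Random(seed)
    n_bins = len(spectrogram[0])
    bin_embed = rand_matrix(rng, n_bins, dim)    # assumed per-bin embedding
    frame_proj = rand_matrix(rng, n_bins, dim)   # assumed frame projection
    first_fct = rand_matrix(rng, 1, dim)[0]      # first frequency class token
    spectral_w = [rand_matrix(rng, dim, dim) for _ in range(3)]
    temporal_w = [rand_matrix(rng, dim, dim) for _ in range(3)]
    fct_proj = rand_matrix(rng, dim, dim)

    # First temporal embeddings: one vector per time frame.
    temporal_1 = matmul(spectrogram, frame_proj)

    # Spectral transformer: per frame, attend over [FCT, bin tokens...]
    # and keep the output at the FCT position as the second FCT.
    second_fct = []
    for frame in spectrogram:
        bins = [[mag * e for e in emb] for mag, emb in zip(frame, bin_embed)]
        second_fct.append(self_attention([first_fct] + bins, *spectral_w)[0])

    # Second temporal embeddings: first temporal embeddings plus a
    # linear projection of the second FCT.
    projected = matmul(second_fct, fct_proj)
    temporal_2 = [[a + b for a, b in zip(r1, r2)]
                  for r1, r2 in zip(temporal_1, projected)]

    # Temporal transformer: attend across frames.
    return self_attention(temporal_2, *temporal_w)
```

Calling `analyze([[0.1, 0.5, 0.2], [0.3, 0.0, 0.9]])` returns one embedding per frame; a task-specific output layer, not shown here, would map these to music information.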
-
Publication No.: US20230124006A1
Publication Date: 2023-04-20
Application No.: US17502863
Filing Date: 2021-10-15
Applicant: Lemon Inc.
Inventor: Wei Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song
Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, using a spectral and a temporal transformer, are disclosed herein. The processor generates a time-frequency representation of obtained audio data to be applied as input for a transformer-based neural network model; determines spectral embeddings and first temporal embeddings of the audio data based on the time-frequency representation of the audio data; determines each vector of a second frequency class token (FCT) by passing each vector of the first FCT in the spectral embeddings through the spectral transformer; determines second temporal embeddings by adding a linear projection of the second FCT to the first temporal embeddings; determines third temporal embeddings by passing the second temporal embeddings through the temporal transformer; and generates music information based on the third temporal embeddings.
-
Publication No.: US20250078814A1
Publication Date: 2025-03-06
Application No.: US18819280
Filing Date: 2024-08-29
Applicant: Lemon Inc., Beijing Zitiao Network Technology Co., Ltd.
Inventor: Dong Guo, Zihao He, Weituo Hao, Xuchen Song, Zongyu Yin, Jingsong Gao, Wei Tsung Lu, Junyu Dai
IPC: G10L15/06, G06F40/126, G10L25/30
Abstract: The present disclosure provides a multi-modal encoder processing method and apparatus, a computer device and a storage medium. The method includes: acquiring a pair of mask samples to be processed, the pair of mask samples including a text sample and an audio sample associated with each other, and at least one of the text sample and the audio sample is masked; based on a multi-modal encoder, generating a text encoding feature of the text sample, and generating an audio encoding feature of the audio sample, a linear spectrum feature of the audio sample being fused in the text encoding feature, and a linear word feature of the text sample being fused in the audio encoding feature; and predicting masked mask information according to the text encoding feature and the audio encoding feature, and correcting the multi-modal encoder based on an accuracy of the mask information.
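The first and last steps of this abstract (building a masked text/audio pair and scoring the mask predictions) can be sketched as follows. This is only an illustration under assumptions: the multi-modal encoder and its cross-modal fusion are not modeled, one modality is masked per pair, and `make_masked_pair`, `mask_accuracy`, and the `MASK` placeholder are hypothetical names, not terms from the disclosure.

```python
import random

MASK = None  # hypothetical placeholder marking a masked position

def make_masked_pair(text_tokens, audio_frames, modality="text",
                     mask_ratio=0.3, seed=0):
    """Mask part of one modality of an associated text/audio pair.
    Returns (text, audio, targets), where targets maps each masked
    position to the original item at that position."""
    rng = random.Random(seed)
    text, audio = list(text_tokens), list(audio_frames)
    seq = text if modality == "text" else audio
    n_mask = max(1, round(len(seq) * mask_ratio))  # always mask at least one item
    targets = {}
    for pos in rng.sample(range(len(seq)), n_mask):
        targets[pos] = seq[pos]
        seq[pos] = MASK
    return text, audio, targets

def mask_accuracy(predictions, targets):
    """Fraction of masked positions whose prediction matches the original item."""
    correct = sum(1 for pos, item in targets.items()
                  if predictions.get(pos) == item)
    return correct / len(targets)
```

In a training loop, the accuracy (or a loss derived from it) would drive the update of the encoder; that update step is outside the scope of this sketch.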
-
Publication No.: US20230131850A1
Publication Date: 2023-04-27
Application No.: US18069031
Filing Date: 2022-12-20
Applicant: Lemon Inc.
Inventor: Xiaojuan CAI, Xuchen Song, Gen Li, Haoyuan Zhong, Weishu Mo, Hui Li
IPC: G06F16/483, G06F16/683, G06N3/02
Abstract: A production method and device for multimedia work, and a computer-readable storage medium. The method includes: acquiring a target audio and at least one piece of multimedia information, calculating a matching degree between the target audio and the multimedia information, sorting the at least one piece of multimedia information according to the matching degree in a descending order, assigning top-ranking multimedia information as target multimedia information; calculating the image quality of each image in the target multimedia information, sorting every image of the target multimedia information according to image quality in a descending order, assigning the top-ranking images as target images; and synthesizing a multimedia work according to the target images and the target audio. The method allows the acquisition of high-definition multimedia work in which the video content and background music match with each other, and reduces the time cost and learning cost consumed by users in editing videos.
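The two-stage ranking described in this abstract can be sketched as below. It is a minimal illustration, assuming the matching degree and image quality are supplied as scoring callables; how those scores are computed, and the final synthesis of the multimedia work, are not shown. The function name `pick_target_images`, its parameters, and the dictionary layout of a media item are hypothetical.

```python
def pick_target_images(media_items, match_degree, image_quality,
                       n_media=1, n_images=2):
    """Select target images in two descending sorts.

    media_items:   list of dicts, each with an "images" list (assumed layout)
    match_degree:  callable scoring a media item against the target audio
    image_quality: callable scoring a single image
    """
    # Stage 1: rank media by matching degree and keep the top-ranking items.
    target_media = sorted(media_items, key=match_degree, reverse=True)[:n_media]

    # Stage 2: rank every image of the target media by quality and keep the best.
    images = [img for media in target_media for img in media["images"]]
    return sorted(images, key=image_quality, reverse=True)[:n_images]


# Usage with precomputed scores standing in for real models.
media = [
    {"id": "a", "match": 0.2, "images": [("a1", 0.9)]},
    {"id": "b", "match": 0.8, "images": [("b1", 0.4), ("b2", 0.7), ("b3", 0.1)]},
]
targets = pick_target_images(media, lambda m: m["match"], lambda img: img[1])
```

Note that image `a1` is not selected despite its high quality, because its parent item ranks below the cut in the first stage.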