Patent search ap:("Lemon Inc.") AND inv:"Xuchen Song" Page 1

1.

Granted invention patent
System and method for training a transformer-in-transformer-based neural network model for audio data [In force]

Publication No.: US11854558B2

Publication date: 2023-12-26

Application No.: US17502863

Filing date: 2021-10-15

Applicant: Lemon Inc.

Inventor: Wei Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song

IPC: G10L19/02, G10L25/30

CPC classification number: G10L19/02, G10L25/30

Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, using a spectral and a temporal transformer, are disclosed herein. The processor generates a time-frequency representation of obtained audio data to be applied as input for a transformer-based neural network model; determines spectral embeddings and first temporal embeddings of the audio data based on the time-frequency representation of the audio data; determines each vector of a second frequency class token (FCT) by passing each vector of the first FCT in the spectral embeddings through the spectral transformer; determines second temporal embeddings by adding a linear projection of the second FCT to the first temporal embeddings; determines third temporal embeddings by passing the second temporal embeddings through the temporal transformer; and generates music information based on the third temporal embeddings.
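The data flow in this abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: `self_attention` is a weight-free stand-in for a full transformer layer, all weights are random rather than trained, and the final tag classifier is a hypothetical example of "music information". The names `tnt_forward`, `d_spec`, `d_temp`, and `n_tags` are invented for the sketch.

```python
import math
import random

def self_attention(seq):
    """Weight-free single-head self-attention with a residual connection.
    A stand-in for a transformer layer (assumption of this sketch)."""
    dim = len(seq[0])
    out = []
    for query in seq:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in seq]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mixed = [sum(w / total * vec[i] for w, vec in zip(weights, seq)) for i in range(dim)]
        out.append([q + m for q, m in zip(query, mixed)])
    return out

def linear(vec, weight):
    """Apply a linear projection; weight is a list of rows (out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def tnt_forward(spec, d_spec=4, d_temp=6, n_tags=3, seed=0):
    """spec: time-frequency representation as a list of frames, each a list of bins."""
    rng = random.Random(seed)
    n_freq = len(spec[0])
    bin_embed = rand_matrix(rng, n_freq, d_spec)  # one embedding direction per bin
    w_frame = rand_matrix(rng, d_temp, n_freq)    # frame -> first temporal embedding
    w_fct = rand_matrix(rng, d_temp, d_spec)      # projects the FCT to temporal size
    w_out = rand_matrix(rng, n_tags, d_temp)      # hypothetical tag classifier
    first_fct = [rng.uniform(-0.5, 0.5) for _ in range(d_spec)]

    second_temporal = []
    for frame in spec:
        # Spectral embeddings: the first FCT followed by one vector per frequency bin.
        spectral = [first_fct] + [[v * e for e in bin_embed[f]] for f, v in enumerate(frame)]
        # Second FCT: the FCT position after the spectral "transformer".
        second_fct = self_attention(spectral)[0]
        first_temporal = linear(frame, w_frame)
        projected = linear(second_fct, w_fct)
        second_temporal.append([a + b for a, b in zip(first_temporal, projected)])

    # Third temporal embeddings: output of the temporal "transformer".
    third_temporal = self_attention(second_temporal)
    pooled = [sum(col) / len(third_temporal) for col in zip(*third_temporal)]
    logits = linear(pooled, w_out)
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    return [e / sum(exps) for e in exps]

spec = [[0.1, 0.5, 0.2], [0.3, 0.1, 0.9], [0.0, 0.4, 0.4], [0.7, 0.2, 0.1]]
print(tnt_forward(spec))  # one probability per hypothetical tag
```

The point of the sketch is the ordering of the steps: the spectral pass runs once per frame and only its class-token output feeds the temporal pass, which is what makes the structure "transformer-in-transformer".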

2.

Invention patent application
SYSTEM AND METHOD FOR TRAINING A TRANSFORMER-IN-TRANSFORMER-BASED NEURAL NETWORK MODEL FOR AUDIO DATA [In force]

Publication No.: US20230124006A1

Publication date: 2023-04-20

Application No.: US17502863

Filing date: 2021-10-15

Applicant: Lemon Inc.

Inventor: Wei Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song

IPC: G10L19/02, G10L25/30

Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, using a spectral and a temporal transformer, are disclosed herein. The processor generates a time-frequency representation of obtained audio data to be applied as input for a transformer-based neural network model; determines spectral embeddings and first temporal embeddings of the audio data based on the time-frequency representation of the audio data; determines each vector of a second frequency class token (FCT) by passing each vector of the first FCT in the spectral embeddings through the spectral transformer; determines second temporal embeddings by adding a linear projection of the second FCT to the first temporal embeddings; determines third temporal embeddings by passing the second temporal embeddings through the temporal transformer; and generates music information based on the third temporal embeddings.

3.

Invention patent application
MULTI-MODAL ENCODER PROCESSING METHOD AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIUM [In force]

Publication No.: US20250078814A1

Publication date: 2025-03-06

Application No.: US18819280

Filing date: 2024-08-29

Applicant: Lemon Inc., Beijing Zitiao Network Technology Co., Ltd.

Inventor: Dong Guo, Zihao He, Weituo Hao, Xuchen Song, Zongyu Yin, Jingsong Gao, Wei Tsung Lu, Junyu Dai

IPC: G10L15/06, G06F40/126, G10L25/30

Abstract: The present disclosure provides a multi-modal encoder processing method and apparatus, a computer device and a storage medium. The method includes: acquiring a pair of mask samples to be processed, the pair of mask samples including a text sample and an audio sample associated with each other, and at least one of the text sample and the audio sample is masked; based on a multi-modal encoder, generating a text encoding feature of the text sample, and generating an audio encoding feature of the audio sample, a linear spectrum feature of the audio sample being fused in the text encoding feature, and a linear word feature of the text sample being fused in the audio encoding feature; and predicting masked mask information according to the text encoding feature and the audio encoding feature, and correcting the multi-modal encoder based on an accuracy of the mask information.
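The sample-preparation and scoring steps of this abstract can be illustrated as follows. This is a rough sketch only: the multi-modal encoder itself is left out, the helper names (`make_masked_pair`, `mask_accuracy`, `MASK`) and the 30% mask ratio are hypothetical, and the sketch masks both modalities although the abstract requires only that at least one be masked.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token for masked text positions

def mask_positions(length, ratio, rng):
    """Pick at least one position to mask."""
    count = max(1, round(length * ratio))
    return sorted(rng.sample(range(length), count))

def make_masked_pair(text_tokens, audio_frames, ratio=0.3, seed=0):
    """Build a pair of mask samples from an associated text/audio pair."""
    rng = random.Random(seed)
    text_pos = mask_positions(len(text_tokens), ratio, rng)
    audio_pos = mask_positions(len(audio_frames), ratio, rng)
    masked_text = [MASK if i in text_pos else tok for i, tok in enumerate(text_tokens)]
    # Masked audio frames are zeroed out (an assumption; other schemes exist).
    masked_audio = [[0.0] * len(f) if i in audio_pos else list(f)
                    for i, f in enumerate(audio_frames)]
    targets = {i: text_tokens[i] for i in text_pos}  # what the model must predict
    return masked_text, masked_audio, targets

def mask_accuracy(predictions, targets):
    """Fraction of masked text positions predicted correctly."""
    correct = sum(predictions.get(i) == tok for i, tok in targets.items())
    return correct / len(targets)

tokens = ["sing", "a", "song", "of", "sixpence"]
frames = [[0.2, 0.4], [0.1, 0.9], [0.5, 0.5], [0.3, 0.0]]
masked_text, masked_audio, targets = make_masked_pair(tokens, frames)
print(masked_text, mask_accuracy(targets, targets))
```

In a training loop, the accuracy (or a loss derived from it) would be the signal used to correct the encoder; how the encoder fuses spectrum and word features is not modeled here.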

4.

Invention patent application
PRODUCTION METHOD OF MULTIMEDIA WORK, APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM [In force]

Publication No.: US20230131850A1

Publication date: 2023-04-27

Application No.: US18069031

Filing date: 2022-12-20

Applicant: Lemon Inc.

Inventor: Xiaojuan CAI, Xuchen Song, Gen Li, Haoyuan Zhong, Weishu Mo, Hui Li

IPC: G06F16/483, G06F16/683, G06N3/02

Abstract: A production method and device for multimedia work, and a computer-readable storage medium. The method includes: acquiring a target audio and at least one piece of multimedia information, calculating a matching degree between the target audio and the multimedia information, sorting the at least one piece of multimedia information according to the matching degree in a descending order, assigning top-ranking multimedia information as target multimedia information; calculating the image quality of each image in the target multimedia information, sorting every image of the target multimedia information according to image quality in a descending order, assigning the top-ranking images as target images; and synthesizing a multimedia work according to the target images and the target audio. The method allows the acquisition of high-definition multimedia work in which the video content and background music match with each other, and reduces the time cost and learning cost consumed by users in editing videos.
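The two-stage selection described in this abstract amounts to two descending sorts. The sketch below assumes the matching degree and image quality scores are already computed (the abstract does not say how), and it pools images across all selected media before ranking, which is one reading of the text. The `Media` class, `select_target_images`, and the cut-off parameters are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Media:
    name: str
    match: float   # matching degree with the target audio (assumed precomputed)
    images: list   # (image_id, quality) pairs, quality assumed precomputed

def select_target_images(media, top_media=2, top_images=2):
    # Stage 1: rank multimedia items by matching degree, keep the best ones.
    targets = sorted(media, key=lambda m: m.match, reverse=True)[:top_media]
    # Stage 2: rank every image of the selected items by image quality.
    pooled = [img for m in targets for img in m.images]
    best = sorted(pooled, key=lambda img: img[1], reverse=True)[:top_images]
    return [image_id for image_id, _ in best]

library = [
    Media("a", 0.2, [("a1", 0.9)]),
    Media("b", 0.8, [("b1", 0.3), ("b2", 0.7), ("b3", 0.5)]),
    Media("c", 0.6, [("c1", 0.95)]),
]
print(select_target_images(library))  # ['c1', 'b2']
```

Item "a" has the highest-quality single image after "c1" but is dropped in stage 1 because it matches the audio poorly, which shows why the order of the two stages matters.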
