System and method for training a transformer-in-transformer-based neural network model for audio data

    Publication No.: US11854558B2

    Publication date: 2023-12-26

    Application No.: US17502863

    Filing date: 2021-10-15

    Applicant: Lemon Inc.

    CPC classification number: G10L19/02 G10L25/30

    Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, using a spectral and a temporal transformer, are disclosed herein. The processor generates a time-frequency representation of obtained audio data to be applied as input for a transformer-based neural network model; determines spectral embeddings and first temporal embeddings of the audio data based on the time-frequency representation of the audio data; determines each vector of a second frequency class token (FCT) by passing each vector of the first FCT in the spectral embeddings through the spectral transformer; determines second temporal embeddings by adding a linear projection of the second FCT to the first temporal embeddings; determines third temporal embeddings by passing the second temporal embeddings through the temporal transformer; and generates music information based on the third temporal embeddings.
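The data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. Several things here are assumptions that do not come from the source: the function names (`tnt_forward`, `self_attention`), the embedding sizes, the random weights, the use of a single parameter-free self-attention layer as a stand-in for each transformer, and the reuse of one initial FCT vector for every time frame. The final step that turns the third temporal embeddings into music information is omitted, because the abstract does not say how it is done.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Stand-in for a transformer (assumption): one self-attention layer
    without learned weights, followed by a residual connection."""
    dim = len(tokens[0])
    transposed = [list(col) for col in zip(*tokens)]
    scores = matmul(tokens, transposed)  # pairwise dot products
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    mixed = matmul(weights, tokens)
    return [[t + m for t, m in zip(t_row, m_row)] for t_row, m_row in zip(tokens, mixed)]

def random_matrix(rows, cols, rng):
    """Hypothetical untrained weights, used only to make the sketch runnable."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def tnt_forward(spectrogram, d_spec=4, d_temp=6, seed=0):
    """spectrogram: time-frequency representation, one row of bins per frame."""
    rng = random.Random(seed)
    n_bins = len(spectrogram[0])
    w_bin = random_matrix(1, d_spec, rng)         # embeds one frequency-bin value
    w_frame = random_matrix(n_bins, d_temp, rng)  # embeds one whole frame
    w_proj = random_matrix(d_spec, d_temp, rng)   # linear projection of the FCT
    first_fct = [rng.uniform(-0.1, 0.1) for _ in range(d_spec)]

    # First temporal embeddings: one vector per time frame.
    first_temporal = matmul(spectrogram, w_frame)

    # Spectral transformer: per frame, the FCT attends over that frame's bins.
    second_fct = []
    for frame in spectrogram:
        spectral_tokens = [list(first_fct)] + [[v * w for w in w_bin[0]] for v in frame]
        second_fct.append(self_attention(spectral_tokens)[0])  # keep the FCT slot

    # Second temporal embeddings: projected second FCT added to the first ones.
    projected = matmul(second_fct, w_proj)
    second_temporal = [[a + b for a, b in zip(r1, r2)]
                       for r1, r2 in zip(first_temporal, projected)]

    # Temporal transformer over the frame sequence gives the third embeddings.
    return self_attention(second_temporal)
```

Calling `tnt_forward` on a spectrogram with two frames and three frequency bins returns two vectors of length `d_temp`, one per frame. A real model would replace the stand-in attention with trained multi-layer transformers and attach a task-specific output head.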

    SYSTEM AND METHOD FOR TRAINING A TRANSFORMER-IN-TRANSFORMER-BASED NEURAL NETWORK MODEL FOR AUDIO DATA

    Publication No.: US20230124006A1

    Publication date: 2023-04-20

    Application No.: US17502863

    Filing date: 2021-10-15

    Applicant: Lemon Inc.

    Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, using a spectral and a temporal transformer, are disclosed herein. The processor generates a time-frequency representation of obtained audio data to be applied as input for a transformer-based neural network model; determines spectral embeddings and first temporal embeddings of the audio data based on the time-frequency representation of the audio data; determines each vector of a second frequency class token (FCT) by passing each vector of the first FCT in the spectral embeddings through the spectral transformer; determines second temporal embeddings by adding a linear projection of the second FCT to the first temporal embeddings; determines third temporal embeddings by passing the second temporal embeddings through the temporal transformer; and generates music information based on the third temporal embeddings.

    PRODUCTION METHOD OF MULTIMEDIA WORK, APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM

    Publication No.: US20230131850A1

    Publication date: 2023-04-27

    Application No.: US18069031

    Filing date: 2022-12-20

    Applicant: Lemon Inc.

    Abstract: A production method and device for multimedia work, and a computer-readable storage medium. The method includes: acquiring a target audio and at least one piece of multimedia information, calculating a matching degree between the target audio and the multimedia information, sorting the at least one piece of multimedia information according to the matching degree in a descending order, assigning top-ranking multimedia information as target multimedia information; calculating the image quality of each image in the target multimedia information, sorting every image of the target multimedia information according to image quality in a descending order, assigning the top-ranking images as target images; and synthesizing a multimedia work according to the target images and the target audio. The method allows the acquisition of high-definition multimedia work in which the video content and background music match with each other, and reduces the time cost and learning cost consumed by users in editing videos.
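The two-stage selection in the abstract (rank the multimedia items by matching degree, then rank the images of the selected items by quality) can be sketched as follows. This is a minimal illustration under stated assumptions: the class names, the `select_target_images` function, and the `top_media` and `top_images` cutoffs are hypothetical, and the matching-degree and image-quality scores are taken as precomputed numbers because the abstract does not say how they are calculated. The final synthesis of images and audio into a multimedia work is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class Image:
    name: str
    quality: float  # assumed precomputed image-quality score

@dataclass
class Media:
    name: str
    match: float  # assumed precomputed matching degree with the target audio
    images: list = field(default_factory=list)

def select_target_images(media_items, top_media=1, top_images=2):
    """Pick target images in two descending-order ranking passes."""
    # Pass 1: sort multimedia items by matching degree, keep the top ones.
    ranked_media = sorted(media_items, key=lambda m: m.match, reverse=True)
    target_media = ranked_media[:top_media]

    # Pass 2: pool the images of the target items and sort them by quality.
    pooled = [img for media in target_media for img in media.images]
    ranked_images = sorted(pooled, key=lambda img: img.quality, reverse=True)
    return ranked_images[:top_images]

if __name__ == "__main__":
    clips = [
        Media("beach", 0.4, [Image("b1", 0.9), Image("b2", 0.8)]),
        Media("city", 0.7, [Image("c1", 0.3), Image("c2", 0.6), Image("c3", 0.5)]),
    ]
    print([img.name for img in select_target_images(clips)])
```

Note that the first pass filters before the second one runs, so a high-quality image from a poorly matching item (such as `b1` above) is never selected.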
