Identifying music attributes based on audio data

    Publication number: US12026198B2

    Publication date: 2024-07-02

    Application number: US17384576

    Filing date: 2021-07-23

    Applicant: LEMON INC.

    CPC classification number: G06F16/683 G06F16/65 G06N3/08 G10G1/00 G10H1/0025

    Abstract: The present disclosure describes techniques for identifying music attributes. The described techniques comprise receiving audio data of a piece of music; determining at least one attribute of the piece of music based on the audio data of the piece of music using a model; the model comprising a convolutional neural network and a transformer; the model being pre-trained using training data, wherein the training data comprise labelled data associated with a first plurality of music samples and unlabelled data associated with a second plurality of music samples, the labelled data comprise audio data of the first plurality of music samples and label information indicative of attributes of the first plurality of music samples, and the unlabelled data comprise audio data of the second plurality of music samples.
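The split between labelled and unlabelled training data described in this abstract can be pictured with a small data structure. The following is only a minimal sketch: the names `MusicSample` and `split_training_data` are hypothetical, and the abstract does not say how the unlabelled samples are used during pre-training, so no training step is shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class MusicSample:
    """One training item (hypothetical structure, not taken from the patent)."""
    audio: Sequence[float]              # audio data of the music sample
    labels: Optional[List[str]] = None  # attribute labels; None means unlabelled


def split_training_data(
    samples: Sequence[MusicSample],
) -> Tuple[List[MusicSample], List[MusicSample]]:
    """Separate samples that carry label information from those that do not."""
    labelled = [s for s in samples if s.labels is not None]
    unlabelled = [s for s in samples if s.labels is None]
    return labelled, unlabelled


samples = [
    MusicSample(audio=[0.0, 0.1, -0.1], labels=["rock", "guitar"]),
    MusicSample(audio=[0.2, 0.0, 0.3]),  # audio only, no label information
    MusicSample(audio=[0.5, -0.4, 0.1]),
]
labelled, unlabelled = split_training_data(samples)
print(len(labelled), len(unlabelled))
```

Both groups hold audio data; only the first also holds label information indicating attributes, which matches the distinction the abstract draws.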

    System and method for training a transformer-in-transformer-based neural network model for audio data

    Publication number: US11854558B2

    Publication date: 2023-12-26

    Application number: US17502863

    Filing date: 2021-10-15

    Applicant: Lemon Inc.

    CPC classification number: G10L19/02 G10L25/30

    Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, using a spectral and a temporal transformer, are disclosed herein. The processor generates a time-frequency representation of obtained audio data to be applied as input for a transformer-based neural network model; determines spectral embeddings and first temporal embeddings of the audio data based on the time-frequency representation of the audio data; determines each vector of a second frequency class token (FCT) by passing each vector of the first FCT in the spectral embeddings through the spectral transformer; determines second temporal embeddings by adding a linear projection of the second FCT to the first temporal embeddings; determines third temporal embeddings by passing the second temporal embeddings through the temporal transformer; and generates music information based on the third temporal embeddings.
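The data flow in this abstract (spectral transformer, then projection of the frequency class token into the temporal embeddings, then temporal transformer) can be traced at the level of array shapes. The sketch below is an illustration under stated assumptions, not the patented implementation: it uses NumPy with random weights in place of learned parameters, a single-head attention layer in place of each transformer, and hypothetical names (`forward`, `attention_weights`, `d_spec`, `d_temp`, `n_outputs`). The way embeddings are derived from the time-frequency input and the mean-pooled output head are also assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # random weights stand in for learned parameters


def attention_weights(dim):
    """Hypothetical helper: random query, key and value matrices."""
    return [rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over the second-to-last axis, with a residual."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return x + weights @ v


def forward(spec, d_spec=4, d_temp=12, n_outputs=5):
    """spec: time-frequency representation of shape (frames, frequency bins)."""
    n_frames, n_bins = spec.shape

    # Spectral embeddings: one token per frequency bin, preceded by a first
    # FCT vector for each frame (assumed zero-initialised here).
    bin_tokens = spec[:, :, None] * rng.normal(size=(1, d_spec))
    first_fct = np.zeros((n_frames, 1, d_spec))
    spectral = np.concatenate([first_fct, bin_tokens], axis=1)

    # First temporal embeddings: one vector per frame (assumed linear map).
    first_temporal = spec @ rng.normal(size=(n_bins, d_temp))

    # Spectral transformer: attention across frequency tokens within each
    # frame; the FCT position of the output is the second FCT.
    second_fct = self_attention(spectral, *attention_weights(d_spec))[:, 0, :]

    # Second temporal embeddings: add a linear projection of the second FCT.
    projection = rng.normal(size=(d_spec, d_temp))
    second_temporal = first_temporal + second_fct @ projection

    # Temporal transformer: attention across frames.
    third_temporal = self_attention(second_temporal, *attention_weights(d_temp))

    # Music information: assumed here to be scores from a pooled linear head.
    logits = third_temporal.mean(axis=0) @ rng.normal(size=(d_temp, n_outputs))
    return second_fct, third_temporal, logits


spec = rng.normal(size=(8, 16))  # stand-in for a spectrogram: 8 frames, 16 bins
second_fct, third_temporal, logits = forward(spec)
print(second_fct.shape, third_temporal.shape, logits.shape)
```

The spectral attention runs independently per frame over the frequency axis, while the temporal attention runs once over the frame axis, so the frequency class token is the only path by which spectral detail reaches the temporal embeddings in this sketch.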

    System and method for training a transformer-in-transformer-based neural network model for audio data

    Publication number: US20230124006A1

    Publication date: 2023-04-20

    Application number: US17502863

    Filing date: 2021-10-15

    Applicant: Lemon Inc.

    Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, using a spectral and a temporal transformer, are disclosed herein. The processor generates a time-frequency representation of obtained audio data to be applied as input for a transformer-based neural network model; determines spectral embeddings and first temporal embeddings of the audio data based on the time-frequency representation of the audio data; determines each vector of a second frequency class token (FCT) by passing each vector of the first FCT in the spectral embeddings through the spectral transformer; determines second temporal embeddings by adding a linear projection of the second FCT to the first temporal embeddings; determines third temporal embeddings by passing the second temporal embeddings through the temporal transformer; and generates music information based on the third temporal embeddings.
