IMPLEMENTING IMPROVED AUDIO SOURCE SEPARATION

    Publication number: US20250078857A1

    Publication date: 2025-03-06

    Application number: US18241138

    Filing date: 2023-08-31

    Applicant: Lemon Inc.

    Abstract: The present disclosure describes techniques for implementing improved audio source separation. A complex spectrum X, which is a time-frequency representation of audio signals, is split into K bands along the frequency axis by applying band-split operations. Each band is denoted X_k, k = 1, ..., K, and comprises one or more frequency bins. An individual multilayer perceptron is applied to each band X_k to extract latent representations and obtain outputs H_k^0. A time-domain transformer and a frequency-domain transformer are applied to a stacked representation H^0. The time-domain and frequency-domain transformers are applied repeatedly, in an interleaved manner, L times to obtain the output H^L of the transformer blocks. H^L is input into a multi-band mask estimation sub-model. A complex ideal ratio mask is generated based on the outputs of the multi-band mask estimation sub-model.
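
The band-split and per-band multilayer perceptron steps in this abstract can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions, not the implementation claimed in the application: the band edges, the layer sizes, the tanh activation, the random untrained weights, and the choice to feed each band's real and imaginary parts as real-valued features are all hypothetical, as are the function names.

```python
import numpy as np

def band_split(X, band_edges):
    """Split a complex spectrum X of shape (F, T) into bands along the frequency axis."""
    return [X[lo:hi, :] for lo, hi in zip(band_edges[:-1], band_edges[1:])]

def band_features(Xk):
    """Turn one complex band (F_k, T) into real features of shape (T, 2 * F_k).

    Assumption: real and imaginary parts are concatenated per time frame.
    """
    return np.concatenate([Xk.real, Xk.imag], axis=0).T

def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Build a two-layer perceptron with random (untrained) weights."""
    W1 = rng.standard_normal((in_dim, hidden_dim)) / np.sqrt(in_dim)
    b1 = np.zeros(hidden_dim)
    W2 = rng.standard_normal((hidden_dim, out_dim)) / np.sqrt(hidden_dim)
    b2 = np.zeros(out_dim)

    def mlp(x):
        return np.tanh(x @ W1 + b1) @ W2 + b2

    return mlp

def encode_bands(X, band_edges, latent_dim=8, hidden_dim=16, seed=0):
    """Apply a separate MLP to each band and stack the outputs into (K, T, D)."""
    rng = np.random.default_rng(seed)
    outputs = []
    for Xk in band_split(X, band_edges):
        feats = band_features(Xk)
        # Each band gets its own MLP because bands may differ in width.
        mlp = make_mlp(feats.shape[1], hidden_dim, latent_dim, rng)
        outputs.append(mlp(feats))
    return np.stack(outputs, axis=0)

# Toy spectrum with 12 frequency bins and 5 time frames (hypothetical sizes).
rng = np.random.default_rng(1)
X = rng.standard_normal((12, 5)) + 1j * rng.standard_normal((12, 5))
band_edges = [0, 2, 6, 12]  # three bands of 2, 4, and 6 bins
H0 = encode_bands(X, band_edges)
print(H0.shape)
```

The stacked array has one axis for bands and one for time frames, which is what would let a frequency-domain transformer attend across bands and a time-domain transformer attend across frames; the transformer blocks and the mask estimation sub-model are not sketched here.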

    IMPLEMENTING AUTOMATIC MUSIC AUDIO TRANSCRIPTION

    Publication number: US20240404494A1

    Publication date: 2024-12-05

    Application number: US18204855

    Filing date: 2023-06-01

    Applicant: Lemon Inc.

    Abstract: The present disclosure describes techniques for implementing automatic music audio transcription. A deep neural network model may be configured. The deep neural network model comprises a spectral cross-attention sub-model configured to project a spectral representation of each time step t, denoted as S_t, into a set of latent arrays at the time step t, denoted as θ_t^h, where h denotes the h-th iteration. The deep neural network model comprises a plurality of latent transformers configured to perform self-attention on the set of latent arrays θ_t^h. The deep neural network model further comprises a set of temporal transformers configured to enable communications between any pairs of latent arrays θ_t^h at different time steps. Training data may be augmented by randomly mixing a plurality of types of datasets comprising a vocal dataset and an instrument dataset. The deep neural network model may be trained using the augmented training data.
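
The data augmentation step, randomly mixing a vocal dataset with an instrument dataset, might look roughly like the following NumPy sketch. The abstract does not say how clips are paired or how labels are combined, so everything here is an assumption made for illustration: the dictionary layout of a clip, the note tuples of (onset, offset, pitch, source), the random gain range, the requirement that clips have equal length, and the helper names.

```python
import numpy as np

def mix_pair(vocal, instrument, rng, gain_range=(0.5, 1.0)):
    """Sum one vocal clip and one instrument clip and merge their note labels."""
    if vocal["audio"].shape != instrument["audio"].shape:
        raise ValueError("clips must have the same length")
    # Hypothetical choice: scale each source by an independent random gain.
    gain_v, gain_i = rng.uniform(gain_range[0], gain_range[1], size=2)
    audio = gain_v * vocal["audio"] + gain_i * instrument["audio"]
    notes = sorted(vocal["notes"] + instrument["notes"])
    return {"audio": audio, "notes": notes}

def augment(vocal_set, instrument_set, num_mixes, seed=0):
    """Create num_mixes training examples from randomly chosen clip pairs."""
    rng = np.random.default_rng(seed)
    mixes = []
    for _ in range(num_mixes):
        v = vocal_set[rng.integers(len(vocal_set))]
        i = instrument_set[rng.integers(len(instrument_set))]
        mixes.append(mix_pair(v, i, rng))
    return mixes

def toy_clip(freq, notes, sr=8000, seconds=1.0):
    """Build a sine-wave stand-in for a recorded clip (illustration only)."""
    t = np.arange(int(sr * seconds)) / sr
    return {"audio": np.sin(2 * np.pi * freq * t), "notes": notes}

vocal_set = [toy_clip(440.0, [(0.0, 1.0, 69, "vocal")])]
instrument_set = [
    toy_clip(220.0, [(0.0, 0.5, 57, "piano")]),
    toy_clip(110.0, [(0.2, 0.9, 45, "bass"), (0.5, 1.0, 47, "bass")]),
]
mixed = augment(vocal_set, instrument_set, num_mixes=4)
print(len(mixed), mixed[0]["audio"].shape)
```

Each mixed example keeps the labels of both sources, so a transcription model trained on it sees vocal and instrument notes in the same audio.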

    MELODY EXTRACTION FROM POLYPHONIC SYMBOLIC MUSIC

    Publication number: US20240153474A1

    Publication date: 2024-05-09

    Application number: US17986644

    Filing date: 2022-11-14

    Applicant: Lemon Inc.

    Abstract: The present disclosure describes techniques for melody extraction. The techniques comprise receiving a polyphonic symbolic music file. The polyphonic symbolic music file may comprise a plurality of notes. The polyphonic symbolic music file may be converted to a plurality of feature vectors. Each of the plurality of feature vectors may be a multidimensional vector. Each of the plurality of feature vectors may correspond to a particular note of the plurality of notes. The plurality of feature vectors corresponding to the plurality of notes may be classified using a model that is trained to determine whether each of the plurality of notes belongs to a melody based on the plurality of feature vectors.
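
As a rough illustration of the pipeline in this abstract, the sketch below converts each note into a multidimensional feature vector and classifies each vector as melody or not. The abstract does not list the features or name the model, so both are assumptions: the four features (normalized pitch, duration, distance below the highest overlapping pitch, number of overlapping notes) are hypothetical, and the logistic scorer uses hand-picked placeholder weights where the disclosure calls for a trained model.

```python
import numpy as np

def note_features(notes):
    """Convert notes given as (onset, duration, pitch) into one feature vector per note."""
    notes = np.asarray(notes, dtype=float)
    onset, duration, pitch = notes[:, 0], notes[:, 1], notes[:, 2]
    offset = onset + duration
    # overlap[i, j] is True when note j sounds at some point while note i sounds.
    overlap = (onset[None, :] < offset[:, None]) & (onset[:, None] < offset[None, :])
    top_pitch = np.where(overlap, pitch[None, :], -np.inf).max(axis=1)
    return np.stack(
        [
            pitch / 127.0,               # MIDI pitch scaled to [0, 1]
            duration,                    # note length in beats
            (top_pitch - pitch) / 12.0,  # octaves below the highest overlapping note
            overlap.sum(axis=1),         # number of overlapping notes, including itself
        ],
        axis=1,
    )

def classify_melody(features, weights, bias):
    """Logistic classifier: True where the melody probability exceeds 0.5."""
    probs = 1.0 / (1.0 + np.exp(-(features @ weights + bias)))
    return probs > 0.5

# Two sustained accompaniment notes under a two-note melody line.
notes = [
    (0.0, 1.0, 48),
    (0.0, 1.0, 52),
    (0.0, 0.5, 72),
    (0.5, 0.5, 74),
]
features = note_features(notes)

# Placeholder weights for illustration only; a real model would learn these.
weights = np.array([2.0, 0.0, -3.0, 0.0])
is_melody = classify_melody(features, weights, bias=0.0)
print(is_melody)
```

With these placeholder weights the decision rests almost entirely on whether a note is the highest one sounding, which resembles a simple skyline heuristic; a trained classifier would be expected to weigh all features.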
