IMPLEMENTING IMPROVED AUDIO SOURCE SEPARATION

    Publication No.: US20250078857A1

    Publication Date: 2025-03-06

    Application No.: US18241138

    Filing Date: 2023-08-31

    Applicant: Lemon Inc.

    Abstract: The present disclosure describes techniques for implementing improved audio source separation. A complex spectrum X is split into a plurality of K bands along a frequency axis by applying band-split operations on the complex spectrum X. The complex spectrum is a time-frequency representation of audio signals. Each of the plurality of K bands is denoted as Xk, k=1, . . . , K. Each band Xk comprises one or more frequency bins. An individual multilayer perceptron is applied to each band Xk to extract latent representations and obtain outputs Hk0. A time-domain transformer and a frequency-domain transformer are applied on a stacked representation H0. The time-domain and frequency-domain transformers are repeatedly applied in an interleaved manner L times to obtain the output HL from the transformer blocks. HL is input into a multi-band mask estimation sub-model. A complex ideal ratio mask is generated based on outputs from the multi-band mask estimation sub-model.
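The pipeline described in the abstract (band-split, per-band MLPs, interleaved time/frequency blocks, multi-band mask estimation) can be sketched as a toy numpy program. This is not the patented model: simple linear layers with random weights stand in for the MLPs and transformers, and all sizes (F, T, K, D, L) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy complex spectrum X: (F frequency bins, T time frames) -- hypothetical sizes.
F, T, K, D, L = 64, 20, 4, 16, 2
X = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))

# 1) Band-split: K equal bands Xk along the frequency axis.
bands = np.split(X, K, axis=0)                 # each band Xk: (F/K, T)

# 2) Per-band "MLP" (a single linear map on stacked real/imag parts here).
def band_mlp(Xk, W):
    feats = np.concatenate([Xk.real, Xk.imag], axis=0)   # (2*F/K, T)
    return W @ feats                                      # (D, T)

Ws = [rng.standard_normal((D, 2 * F // K)) * 0.1 for _ in range(K)]
H0 = np.stack([band_mlp(Xk, W) for Xk, W in zip(bands, Ws)])  # (K, D, T)

# 3) Interleaved time-domain / frequency(band)-domain blocks, repeated L times.
#    A tanh linear mixing layer stands in for each transformer.
def mix(H, axis, W):
    H = np.moveaxis(H, axis, -1)
    H = np.tanh(H @ W)                         # mix along the chosen axis
    return np.moveaxis(H, -1, axis)

Wt = rng.standard_normal((T, T)) * 0.1         # time-domain mixing weights
Wb = rng.standard_normal((K, K)) * 0.1         # band(frequency)-domain mixing weights
H = H0
for _ in range(L):
    H = mix(H, axis=2, W=Wt)                   # time-domain block
    H = mix(H, axis=0, W=Wb)                   # frequency-domain block

# 4) Multi-band mask estimation: map each band back to complex mask values.
Wm = [rng.standard_normal((2 * F // K, D)) * 0.1 for _ in range(K)]
masks = []
for k in range(K):
    out = Wm[k] @ H[k]                         # (2*F/K, T)
    re, im = np.split(out, 2, axis=0)
    masks.append(re + 1j * im)
cIRM = np.concatenate(masks, axis=0)           # complex ideal ratio mask, (F, T)

separated = cIRM * X                           # apply the mask to the input spectrum
print(separated.shape)
```

The key structural points survive the simplification: the band axis K replaces the raw frequency axis after the split, and the two mixing directions (along T, then along K) alternate L times before the mask head restores the original frequency resolution.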

    IMPLEMENTING AUTOMATIC MUSIC AUDIO TRANSCRIPTION

    Publication No.: US20240404494A1

    Publication Date: 2024-12-05

    Application No.: US18204855

    Filing Date: 2023-06-01

    Applicant: Lemon Inc.

    Abstract: The present disclosure describes techniques for implementing automatic music audio transcription. A deep neural network model may be configured. The deep neural network model comprises a spectral cross-attention sub-model configured to project a spectral representation of each time step t, denoted as St, into a set of latent arrays at the time step t, denoted as θth, h representing an h-th iteration. The deep neural network model comprises a plurality of latent transformers configured to perform self-attention on the set of latent arrays θth. The deep neural network model further comprises a set of temporal transformers configured to enable communications between any pairs of latent arrays θth at different time steps. Training data may be augmented by randomly mixing a plurality of types of datasets comprising a vocal dataset and an instrument dataset. The deep neural network model may be trained using the augmented training data.
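The three attention stages in the abstract (spectral cross-attention into latents, latent self-attention per step, temporal attention across steps) can be sketched with single-head, unparameterized attention in numpy. This is a minimal illustration, not the patented architecture: there are no learned projections or feed-forward layers, and the sizes (T_steps, F, N, D) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention for 2-D Q, K, V.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

T_steps, F, N, D = 8, 32, 4, 16            # hypothetical sizes
S = rng.standard_normal((T_steps, F, D))   # spectral representation St per step
theta = rng.standard_normal((N, D))        # initial latent arrays

# 1) Spectral cross-attention: the latents query the spectral frame St,
#    projecting it into N latent arrays at each time step t.
latents = np.stack([attention(theta, S[t], S[t]) for t in range(T_steps)])
# latents: (T_steps, N, D)

# 2) Latent transformer: self-attention among the N latents at each step.
latents = np.stack([attention(z, z, z) for z in latents])

# 3) Temporal transformer: for each latent index, attend across time steps,
#    letting latent arrays at different steps communicate.
latents = np.stack(
    [attention(latents[:, n], latents[:, n], latents[:, n]) for n in range(N)],
    axis=1,
)
print(latents.shape)
```

The design point is the Perceiver-style split of work: cross-attention keeps cost linear in the (large) number of spectral bins F, while self-attention is only quadratic in the much smaller latent count N and step count T_steps.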

    TRACKING BEATS AND DOWNBEATS OF VOICES IN REAL TIME

    Publication No.: US20240395231A1

    Publication Date: 2024-11-28

    Application No.: US18200924

    Filing Date: 2023-05-23

    Applicant: Lemon Inc.

    Abstract: The present disclosure describes techniques for tracking beats and downbeats of audio, such as human voices, in real time. Audio may be received in real time. The audio may be split into a sequence of segments. A sequence of audio features representing the sequence of segments of the audio may be extracted. A continuous sequence of activations indicative of probabilities of beats or downbeats occurring in the sequence of segments of the audio may be generated using a machine learning model with causal mechanisms. Timings of the beats or the downbeats occurring in the sequence of segments of the audio may be determined based on the continuous sequence of activations by fusing local rhythmic information for each current segment with information indicative of beats or downbeats in previous segments among the sequence of segments.
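The final step of the abstract, turning a causal activation sequence into beat timings by fusing each segment's local activation with information from previous segments, can be sketched as a simple online peak picker. This is an illustrative stand-in, not the patented method: the exponential history, fusion weights, threshold, and minimum inter-beat interval are all hypothetical choices.

```python
import numpy as np

def track_beats(activations, fps=100, min_interval=0.3, threshold=0.5, alpha=0.8):
    """Causal beat picking: fuse each frame's local activation with a running
    smoothed history of past activations, emitting a beat time whenever the
    fused score crosses the threshold (respecting a minimum beat spacing)."""
    beats, smoothed, last_beat = [], 0.0, -np.inf
    for i, a in enumerate(activations):
        smoothed = alpha * smoothed + (1 - alpha) * a  # information from previous segments
        fused = 0.5 * a + 0.5 * smoothed               # local + past rhythmic information
        t = i / fps                                    # frame index -> seconds
        if fused > threshold and (t - last_beat) >= min_interval:
            beats.append(t)
            last_beat = t
    return beats

# Synthetic activation sequence with a peak every 0.5 s (i.e. 120 BPM) at 100 fps.
fps = 100
act = np.zeros(300)
act[::50] = 1.0
print(track_beats(act, fps=fps))
```

Because the loop only ever looks at the current frame and state accumulated from earlier frames, it can run on a live stream, matching the real-time, causal constraint described in the abstract.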
