-
Publication No.: US20230121764A1
Publication Date: 2023-04-20
Application No.: US17502890
Filing Date: 2021-10-15
Applicant: Lemon Inc.
Inventor: Ju-Chiang Wang, Jordan Smith, Wei Tsung Lu
Abstract: Devices, systems, and methods related to implementing supervised metric learning during training of a deep neural network model are disclosed herein. In examples, audio input may be received, where the audio input includes a plurality of song fragments from a plurality of songs. For each song fragment, an aligning function may be performed to center the song fragment based on determined beat information, thereby creating a plurality of aligned song fragments. For each of the aligned song fragments, an embedding vector may be obtained from the deep neural network model. A batch of aligned song fragments may then be selected from the plurality of aligned song fragments, from which a training tuple may be chosen. A loss metric may be generated based on the selected training tuple, and one or more weights of the deep neural network model may be updated based on the loss metric.
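The training pipeline this abstract outlines can be sketched in a few lines. The beat-centred alignment and the hinge-style triplet loss below are illustrative stand-ins only: the patent does not fix a specific alignment window or loss function, and every function name here is an assumption.

```python
# Illustrative sketch, not the patented method: align each song fragment on a
# detected beat, embed it, then train with a metric-learning (triplet) loss.

def align_fragment(fragment, beat_index, length):
    """Center a song fragment on its detected beat, zero-padding the edges.

    `beat_index` is the beat's position in the original fragment; the returned
    window of `length` samples places that beat at the window's midpoint.
    """
    half = length // 2
    padded = [0.0] * half + list(fragment) + [0.0] * half
    # In padded coordinates the beat sits at beat_index + half, so a window
    # starting at beat_index has the beat exactly at its center.
    return padded[beat_index:beat_index + length]

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss over Euclidean distances between embedding vectors.

    Pulls the anchor toward the positive (same-song fragment) and pushes it
    away from the negative (different-song fragment) by at least `margin`.
    """
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)
```

In a full training loop, the loss from each sampled tuple would be backpropagated to update the network weights, as the abstract describes.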
-
Publication No.: US20230124006A1
Publication Date: 2023-04-20
Application No.: US17502863
Filing Date: 2021-10-15
Applicant: Lemon Inc.
Inventor: Wei Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song
Abstract: Devices, systems, and methods for causing an apparatus to generate music information from audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, comprising a spectral transformer and a temporal transformer, are disclosed herein. The processor generates a time-frequency representation of obtained audio data to be applied as input to the transformer-based neural network model; determines spectral embeddings and first temporal embeddings of the audio data based on the time-frequency representation; determines each vector of a second frequency class token (FCT) by passing each vector of the first FCT in the spectral embeddings through the spectral transformer; determines second temporal embeddings by adding a linear projection of the second FCT to the first temporal embeddings; determines third temporal embeddings by passing the second temporal embeddings through the temporal transformer; and generates music information based on the third temporal embeddings.
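The FCT bookkeeping described in this abstract can be traced in a shape-level sketch. The transformer blocks themselves are replaced by trivial stand-ins (a mean over frequency bins, an identity map) and the temporal embeddings are scalars so the additions are plain arithmetic; none of this is the patented architecture, only a data-flow illustration.

```python
# Shape-level sketch of the multilevel (spectral -> temporal) transformer flow.
# Each row of `spectral_embeddings` is one time frame; bin 0 is its FCT.

def spectral_transformer(spectral_embeddings):
    # Stand-in: a real spectral transformer attends across frequency bins.
    # Here each frame's FCT is simply "updated" to the mean of its bins.
    return [[sum(frame) / len(frame)] + frame[1:] for frame in spectral_embeddings]

def linear_projection(fct_vectors, weight=1.0):
    # Stand-in for the learned linear layer applied to the second FCT.
    return [weight * v for v in fct_vectors]

def temporal_transformer(temporal_embeddings):
    # Stand-in: a real temporal transformer attends across time steps.
    return temporal_embeddings

def multilevel_forward(spectral_embeddings, first_temporal_embeddings):
    # 1. Spectral transformer turns the first FCT into the second FCT.
    second = spectral_transformer(spectral_embeddings)
    fct = [frame[0] for frame in second]
    # 2. Add a linear projection of the second FCT to the first temporal
    #    embeddings, yielding the second temporal embeddings.
    second_temporal = [t + p for t, p in zip(first_temporal_embeddings,
                                             linear_projection(fct))]
    # 3. Temporal transformer produces the third temporal embeddings, from
    #    which the music information would be generated.
    return temporal_transformer(second_temporal)
```

The point of the sketch is the ordering of operations the claim recites: spectral attention per frame, FCT extraction and projection, then temporal attention across frames.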
-
Publication No.: US12106740B2
Publication Date: 2024-10-01
Application No.: US17502890
Filing Date: 2021-10-15
Applicant: Lemon Inc.
Inventor: Ju-Chiang Wang, Jordan Smith, Wei Tsung Lu
CPC classification number: G10H1/0008, G06N3/08, G10H2210/076, G10H2250/311
Abstract: Devices, systems, and methods related to implementing supervised metric learning during training of a deep neural network model are disclosed herein. In examples, audio input may be received, where the audio input includes a plurality of song fragments from a plurality of songs. For each song fragment, an aligning function may be performed to center the song fragment based on determined beat information, thereby creating a plurality of aligned song fragments. For each of the aligned song fragments, an embedding vector may be obtained from the deep neural network model. A batch of aligned song fragments may then be selected from the plurality of aligned song fragments, from which a training tuple may be chosen. A loss metric may be generated based on the selected training tuple, and one or more weights of the deep neural network model may be updated based on the loss metric.
-
Publication No.: US11854558B2
Publication Date: 2023-12-26
Application No.: US17502863
Filing Date: 2021-10-15
Applicant: Lemon Inc.
Inventor: Wei Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song
Abstract: Devices, systems, and methods for causing an apparatus to generate music information from audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, comprising a spectral transformer and a temporal transformer, are disclosed herein. The processor generates a time-frequency representation of obtained audio data to be applied as input to the transformer-based neural network model; determines spectral embeddings and first temporal embeddings of the audio data based on the time-frequency representation; determines each vector of a second frequency class token (FCT) by passing each vector of the first FCT in the spectral embeddings through the spectral transformer; determines second temporal embeddings by adding a linear projection of the second FCT to the first temporal embeddings; determines third temporal embeddings by passing the second temporal embeddings through the temporal transformer; and generates music information based on the third temporal embeddings.
-
Publication No.: US20230386437A1
Publication Date: 2023-11-30
Application No.: US17804198
Filing Date: 2022-05-26
Applicant: Lemon Inc.
Inventor: Ju-Chiang Wang, Yun-Ning Hung, Jordan Smith
CPC classification number: G10H1/0008, G06N3/08, G10H2210/056
Abstract: Systems and methods directed to identifying music theory labels for audio tracks are described. More specifically, a first training set of audio portions may be generated from a plurality of audio tracks, with segments within the plurality of audio tracks labeled according to a plurality of music theory labels. A deep neural network model may then be trained using the first training set as input, with a first loss function for music theory label identifications of audio portions of the first training set and a second loss function for segment boundary identifications within those audio portions. In examples, the music theory label identifications and the segment boundary identifications are generated by the deep neural network model. When a first audio track is received, segment boundary identifications and music theory labels for segments within the first audio track are generated using the deep neural network model.
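A minimal sketch of the two-loss objective this abstract describes, assuming cross-entropy for both the music theory label prediction and the per-frame boundary decisions. The abstract names two loss functions but not their form, so the cross-entropy choice and the `boundary_weight` hyperparameter are assumptions.

```python
# Illustrative joint objective: one loss for music-theory label identification,
# one for segment-boundary identification, summed for training.
import math

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])

def joint_loss(label_probs, label_target, boundary_probs, boundary_targets,
               boundary_weight=1.0):
    """First loss (label) plus weighted second loss (mean per-frame boundary).

    `boundary_probs` holds one probability vector per frame (boundary vs. not);
    `boundary_targets` holds the matching 0/1 ground-truth indices.
    """
    label_loss = cross_entropy(label_probs, label_target)
    boundary_loss = sum(
        cross_entropy(p, t) for p, t in zip(boundary_probs, boundary_targets)
    ) / len(boundary_targets)
    return label_loss + boundary_weight * boundary_loss
```

Training on the combined scalar lets one set of network weights serve both the labeling and the boundary-detection tasks, which is the multi-task structure the abstract recites.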