System and method for training a transformer-in-transformer-based neural network model for audio data

Invention Grant

US11854558B2 System and method for training a transformer-in-transformer-based neural network model for audio data (Active)


Patent Title: System and method for training a transformer-in-transformer-based neural network model for audio data
Application No.: US17502863

Application Date: 2021-10-15
Publication No.: US11854558B2

Publication Date: 2023-12-26
Inventor: Wei Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song
Applicant: Lemon Inc.
Applicant Address: KY Grand Cayman
Assignee: Lemon Inc.
Current Assignee: Lemon Inc.
Current Assignee Address: KY Grand Cayman
Agency: Faegre Drinker Biddle & Reath LLP
Main IPC: G10L19/02
IPC: G10L19/02; G10L25/30


Abstract:

Devices, systems, and methods are disclosed for causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, comprising a spectral transformer and a temporal transformer. A processor generates a time-frequency representation of obtained audio data to be applied as input to the transformer-based neural network model; determines spectral embeddings and first temporal embeddings of the audio data based on the time-frequency representation; determines each vector of a second frequency class token (FCT) by passing each vector of a first FCT in the spectral embeddings through the spectral transformer; determines second temporal embeddings by adding a linear projection of the second FCT to the first temporal embeddings; determines third temporal embeddings by passing the second temporal embeddings through the temporal transformer; and generates the music information based on the third temporal embeddings.
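
The data flow described in the abstract can be illustrated with a rough NumPy sketch. This is not the patented implementation: the function names (`spectrogram`, `self_attention`, `tnt_forward`), all dimensions, the random untrained weights, the use of one single-head attention layer per "transformer", and the per-frame output head are assumptions made only to show how the spectral embeddings, the frequency class token (FCT), and the three stages of temporal embeddings might relate to one another.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights (assumption)

def spectrogram(audio, n_fft=64, hop=32):
    """Log-magnitude STFT, shape (T, F); one possible time-frequency representation."""
    n_frames = 1 + (len(audio) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(n_fft)
    return np.log1p(np.abs(np.fft.rfft(frames, axis=-1)))

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual connection.
    Stands in for a full transformer encoder (assumption)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return x + weights @ v

def init_attention(dim):
    """Three random projection matrices (query, key, value)."""
    return [rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3)]

def tnt_forward(spec, d_spec=8, d_temp=16, n_out=4):
    """spec: (T, F) time-frequency representation. Returns (T, n_out)."""
    T, F = spec.shape

    # Spectral embeddings: one vector per frequency bin in every frame,
    # with a first FCT vector prepended to each frame.
    bins = spec[:, :, None] * rng.normal(size=(1, d_spec))      # (T, F, d_spec)
    fct1 = np.tile(rng.normal(size=(1, 1, d_spec)), (T, 1, 1))  # (T, 1, d_spec)
    spectral = np.concatenate([fct1, bins], axis=1)             # (T, F+1, d_spec)

    # First temporal embeddings: one vector per time frame.
    temporal1 = spec @ rng.normal(scale=F ** -0.5, size=(F, d_temp))

    # Spectral transformer: attention along the frequency axis of each frame.
    # The second FCT is the transformed token at position 0.
    fct2 = self_attention(spectral, *init_attention(d_spec))[:, 0, :]

    # Second temporal embeddings: first embeddings plus a linear projection of the FCT.
    w_proj = rng.normal(scale=d_spec ** -0.5, size=(d_spec, d_temp))
    temporal2 = temporal1 + fct2 @ w_proj

    # Temporal transformer: attention along the time axis.
    temporal3 = self_attention(temporal2, *init_attention(d_temp))

    # Hypothetical output head producing per-frame "music information" scores.
    return temporal3 @ rng.normal(scale=d_temp ** -0.5, size=(d_temp, n_out))

audio = np.sin(2 * np.pi * 440 * np.arange(1024) / 16000)  # synthetic test tone
scores = tnt_forward(spectrogram(audio))
print(scores.shape)
```

In this sketch the FCT is the only path by which frequency-level detail reaches the temporal stage, which mirrors the "transformer-in-transformer" arrangement the abstract outlines; a trained model would replace the random matrices with learned parameters and would typically stack several such blocks.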

Public/Granted literature

US20230124006A1 SYSTEM AND METHOD FOR TRAINING A TRANSFORMER-IN-TRANSFORMER-BASED NEURAL NETWORK MODEL FOR AUDIO DATA Public/Granted day: 2023-04-20


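IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L19/00	Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis (in musical instruments, see G10H)
G10L19/02	. using spectral analysis, e.g. transform vocoders or subband vocoders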