Publication No.: US20240395233A1
Publication Date: 2024-11-28
Application No.: US18671577
Application Date: 2024-05-22
Applicant: Google LLC
Inventor: Adam Joseph Roberts, Jesse Hart Engel, Ian Stuart Simon, Andrea Agostinelli, Neil Zeghidour, Christopher James Donahue, Antoine Caillon
IPC: G10H1/00, G10H1/36, G10L15/06, G10L15/18, G10L15/183
Abstract: Training data comprising a plurality of training pairs is obtained. Each training pair comprises instrumental audio data and vocal audio data separated from audio data of a musical work of a respective plurality of musical works. For one or more training pairs of the plurality of training pairs, the vocal audio data is processed with machine-learned model(s) of a machine-learned generative audio model grouping to obtain a vocal intermediate representation for the vocal audio data. The instrumental audio data is processed with a pre-trained encoding model to obtain an instrumental intermediate representation for the instrumental audio data. A loss function is evaluated that evaluates a difference between the vocal intermediate representation and the instrumental intermediate representation. Values of parameters of a machine-learned model of the machine-learned generative audio model grouping are modified based on the loss function.
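The training procedure summarized in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: random feature vectors stand in for the separated instrumental and vocal audio, linear maps stand in for both the trainable model and the pre-trained encoding model, mean squared error stands in for the unspecified "difference", and the names `encode`, `mse`, `train_step`, `vocal_model`, and `frozen_encoder` are hypothetical.

```python
import random

def encode(weights, features):
    """Linear stand-in for a model: weight matrix times feature vector."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def mse(a, b):
    """Mean squared difference between two equal-length representations."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def train_step(vocal_model, frozen_encoder, instrumental, vocal, lr=0.1):
    """One update on one training pair; only vocal_model is modified."""
    # Vocal audio -> vocal intermediate representation (trainable model).
    vocal_repr = encode(vocal_model, vocal)
    # Instrumental audio -> instrumental intermediate representation
    # (pre-trained encoder, assumed here to stay frozen).
    instr_repr = encode(frozen_encoder, instrumental)
    # Loss: difference between the two representations (MSE is an assumption).
    loss = mse(vocal_repr, instr_repr)
    # Analytic gradient of the MSE with respect to each trainable weight.
    n = len(vocal_repr)
    for i, row in enumerate(vocal_model):
        grad_out = 2.0 * (vocal_repr[i] - instr_repr[i]) / n
        for j in range(len(row)):
            row[j] -= lr * grad_out * vocal[j]
    return loss

rng = random.Random(0)
FEATURES, REPR = 8, 3  # toy sizes, chosen arbitrarily

def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

frozen_encoder = [rand_vec(FEATURES) for _ in range(REPR)]
vocal_model = [rand_vec(FEATURES) for _ in range(REPR)]
# Each training pair: (instrumental features, vocal features) of one work.
training_pairs = [(rand_vec(FEATURES), rand_vec(FEATURES)) for _ in range(3)]

losses = []
for _ in range(200):
    epoch = [train_step(vocal_model, frozen_encoder, inst, voc)
             for inst, voc in training_pairs]
    losses.append(sum(epoch) / len(epoch))

print(f"mean loss: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.6f}")
```

In this toy setup only `vocal_model` receives gradient updates, mirroring the abstract's statement that parameters of a model in the generative grouping are modified based on the loss. A real system would presumably replace the linear maps with neural audio models and the feature vectors with actual separated stems.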