-
Publication No.: US20240079001A1
Publication date: 2024-03-07
Application No.: US18463196
Filing date: 2023-09-07
Applicant: Google LLC
Inventor: Andrea Agostinelli , Timo Immanuel Denk , Antoine Caillon , Neil Zeghidour , Jesse Engel , Mauro Verzetti , Christian Frank , Zalán Borsos , Matthew Sharifi , Adam Joseph Roberts
CPC classification number: G10L15/16 , G10H1/0008 , G10L15/063 , G10L15/1815 , G10H2210/056 , G10H2250/311
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.
-
Publication No.: US20240233713A1
Publication date: 2024-07-11
Application No.: US18412394
Filing date: 2024-01-12
Applicant: Google LLC
Inventor: Andrea Agostinelli , Timo Immanuel Denk , Antoine Caillon , Neil Zeghidour , Jesse Engel , Mauro Verzetti , Christian Frank , Zalán Borsos , Matthew Sharifi , Adam Joseph Roberts , Marco Tagliasacchi
IPC: G10L15/16 , G06N3/0455 , G06N3/0475 , G10H1/00 , G10L15/06 , G10L15/18
CPC classification number: G10L15/16 , G06N3/0455 , G06N3/0475 , G10H1/0008 , G10L15/063 , G10L15/1815 , G10H2210/056 , G10H2250/311
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.
-
Publication No.: US12014276B2
Publication date: 2024-06-18
Application No.: US18219555
Filing date: 2023-07-07
Applicant: Google LLC
Inventor: Gaurav Mishra , Adam Joseph Roberts , Maarten Paul Bosma , Noam M. Shazeer, Jr.
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model using a deterministic data pipeline. One of the methods may include receiving a first request to generate a deterministic training dataset: transforming raw training examples obtained from the raw data source into pre-processed training examples; assigning a unique index to each pre-processed training example; and caching the pre-processed training examples into the cache directory specified in the received first request; receiving a second request to use the deterministic training dataset to train a machine learning model, the second request specifying a start index; and in response to receiving the second request: reading, from the cache directory, the pre-processed training examples that have indices beginning from the start index; and providing the read training examples in an order of the assigned indices for use in training the machine learning model.
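The two requests described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names `build_deterministic_dataset` and `read_from_index`, the one-JSON-file-per-example cache layout, and the use of `str.lower` as the preprocessing step are all assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Iterable, Iterator, Tuple


def build_deterministic_dataset(
    raw_examples: Iterable[Any],
    preprocess: Callable[[Any], Any],
    cache_dir: str,
) -> int:
    """First request (hypothetical helper): preprocess, index, and cache examples."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    count = 0
    for index, raw in enumerate(raw_examples):
        # Each pre-processed example gets a unique index, stored with it.
        record = {"index": index, "example": preprocess(raw)}
        (cache / f"{index:08d}.json").write_text(json.dumps(record))
        count += 1
    return count


def read_from_index(cache_dir: str, start_index: int) -> Iterator[Tuple[int, Any]]:
    """Second request (hypothetical helper): replay cached examples from start_index."""
    records = [json.loads(p.read_text()) for p in Path(cache_dir).glob("*.json")]
    # Sort on the stored index so the order never depends on directory listing order.
    for record in sorted(records, key=lambda r: r["index"]):
        if record["index"] >= start_index:
            yield record["index"], record["example"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_deterministic_dataset(["A b", "C d", "E f"], str.lower, tmp)
        # Resume from index 1, as a restarted training job might.
        print(list(read_from_index(tmp, start_index=1)))
        # prints [(1, 'c d'), (2, 'e f')]
```

Because the order is fixed by the assigned indices rather than by when the files are read, a training run that restarts with the same start index sees the same examples in the same order.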
-
Publication No.: US20240395233A1
Publication date: 2024-11-28
Application No.: US18671577
Filing date: 2024-05-22
Applicant: Google LLC
Inventor: Adam Joseph Roberts , Jesse Hart Engel , Ian Stuart Simon , Andrea Agostinelli , Neil Zeghidour , Christopher James Donahue , Antoine Caillon
IPC: G10H1/00 , G10H1/36 , G10L15/06 , G10L15/18 , G10L15/183
Abstract: Training data comprising a plurality of training pairs is obtained. Each training pair comprises instrumental audio data and vocal audio data separated from audio data of a musical work of a respective plurality of musical works. For one or more training pairs of the plurality of training pairs, the vocal audio data is processed with machine-learned model(s) of a machine-learned generative audio model grouping to obtain a vocal intermediate representation for the vocal audio data. The instrumental audio data is processed with a pre-trained encoding model to obtain an instrumental intermediate representation for the instrumental audio data. A loss function is evaluated that evaluates a difference between the vocal intermediate representation and the instrumental intermediate representation. Values of parameters of a machine-learned model of the machine-learned generative audio model grouping are modified based on the loss function.
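One training step of the kind described in this abstract might look like the toy sketch below. The abstract does not specify the model architecture or the loss, so everything concrete here is an assumption: the trainable model is a plain linear map, the pre-trained encoder is a fixed matrix, the loss is mean squared error, and the name `train_step` is hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared difference between two representations."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def train_step(
    weights: Matrix,         # trainable model parameters (assumed linear)
    frozen_encoder: Matrix,  # stand-in for the pre-trained encoder, never updated
    vocal: Vector,
    instrumental: Vector,
    lr: float = 0.1,
) -> float:
    """Run one gradient step on `weights` in place; return the loss before the step."""
    vocal_rep = matvec(weights, vocal)
    instrumental_rep = matvec(frozen_encoder, instrumental)
    loss = mse(vocal_rep, instrumental_rep)
    dim = len(vocal_rep)
    for i, row in enumerate(weights):
        err = vocal_rep[i] - instrumental_rep[i]
        for j in range(len(row)):
            # d(loss)/d(weights[i][j]) = (2 / dim) * err * vocal[j]
            row[j] -= lr * (2.0 / dim) * err * vocal[j]
    return loss


if __name__ == "__main__":
    weights = [[0.0, 0.0], [0.0, 0.0]]
    encoder = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(5):
        print(train_step(weights, encoder, [1.0, 0.5], [0.2, -0.4]))
```

Only `weights` changes between steps. The encoder matrix is read but never written, mirroring the abstract's split between the model being trained and the pre-trained encoding model.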
-
Publication No.: US20230351190A1
Publication date: 2023-11-02
Application No.: US18219555
Filing date: 2023-07-07
Applicant: Google LLC
Inventor: Gaurav Mishra , Adam Joseph Roberts , Noam M. Shazeer, Jr. , Maarten Paul Bosma
IPC: G06N3/084
CPC classification number: G06N3/084
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model using a deterministic data pipeline. One of the methods may include receiving a first request to generate a deterministic training dataset: transforming raw training examples obtained from the raw data source into pre-processed training examples; assigning a unique index to each pre-processed training example; and caching the pre-processed training examples into the cache directory specified in the received first request; receiving a second request to use the deterministic training dataset to train a machine learning model, the second request specifying a start index; and in response to receiving the second request: reading, from the cache directory, the pre-processed training examples that have indices beginning from the start index; and providing the read training examples in an order of the assigned indices for use in training the machine learning model.
-
Publication No.: US11915689B1
Publication date: 2024-02-27
Application No.: US18463196
Filing date: 2023-09-07
Applicant: Google LLC
Inventor: Andrea Agostinelli , Timo Immanuel Denk , Antoine Caillon , Neil Zeghidour , Jesse Engel , Mauro Verzetti , Christian Frank , Zalán Borsos , Matthew Sharifi , Adam Joseph Roberts , Marco Tagliasacchi
CPC classification number: G10L15/16 , G10H1/0008 , G10L15/063 , G10L15/1815 , G10H2210/056 , G10H2250/311
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.
-
Publication No.: US20230316082A1
Publication date: 2023-10-05
Application No.: US18130339
Filing date: 2023-04-03
Applicant: Google LLC
Inventor: Gaurav Mishra , Adam Joseph Roberts , Noam M. Shazeer, Jr. , Maarten Paul Bosma
IPC: G06N3/084
CPC classification number: G06N3/084
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model using a deterministic data pipeline. One of the methods may include receiving a first request to generate a deterministic training dataset: transforming raw training examples obtained from the raw data source into pre-processed training examples; assigning a unique index to each pre-processed training example; and caching the pre-processed training examples into the cache directory specified in the received first request; receiving a second request to use the deterministic training dataset to train a machine learning model, the second request specifying a start index; and in response to receiving the second request: reading, from the cache directory, the pre-processed training examples that have indices beginning from the start index; and providing the read training examples in an order of the assigned indices for use in training the machine learning model.