-
Publication No.: US11501787B2
Publication Date: 2022-11-15
Application No.: US16548146
Filing Date: 2019-08-22
Applicant: Google LLC
IPC Classification: G10L19/035, G06N20/00, G10L19/038, G10L25/18
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
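The abstract above notes that the ground-truth target can be the distance between two sampled slices, which is known from the sampling step itself. A minimal sketch of that idea follows. The function names, the uniform sampling, and the normalised absolute-error loss are all illustrative assumptions; the abstract does not specify them.

```python
import random


def sample_slice_pair(signal, slice_len, max_gap, rng):
    """Pick two slices from an unlabeled signal and return them with their gap.

    The gap (in samples, between the slice start positions) is known from
    the sampling step, so it can act as a ground-truth target without any
    manual labels. Hypothetical helper, not taken from the patent.
    """
    if len(signal) < slice_len + max_gap:
        raise ValueError("signal too short for the requested slice_len and max_gap")
    gap = rng.randint(0, max_gap)  # inclusive on both ends
    first_start = rng.randint(0, len(signal) - slice_len - gap)
    second_start = first_start + gap
    first = signal[first_start:first_start + slice_len]
    second = signal[second_start:second_start + slice_len]
    return first, second, gap


def gap_loss(predicted_gap, true_gap, max_gap):
    """Absolute error between predicted and true gap, scaled by max_gap.

    Assumed loss for illustration only.
    """
    return abs(predicted_gap - true_gap) / max_gap
```

In a training loop, a model would receive the two slices and output `predicted_gap`; the loss above would then drive the end-to-end update.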
-
Publication No.: US11756530B2
Publication Date: 2023-09-12
Application No.: US17640579
Filing Date: 2020-09-25
Applicant: GOOGLE LLC
Inventors: Marco Tagliasacchi, Mihajlo Velimirovic, Matthew Sharifi, Dominik Roblek, Christian Frank, Beat Gfeller
IPC Classification: G10L15/06, G10L21/013, G10L25/30, G10L25/90
CPC Classification: G10L15/063, G10L21/013, G10L25/30, G10L25/90
Abstract: Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
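The comparison between known shifts and predicted pitches described above can be sketched as follows. This is an illustration under stated assumptions: the frame is taken to be a magnitude spectrum on a logarithmic frequency axis (so a pitch shift becomes a translation in bins), `toy_encoder` is a hypothetical stand-in for the trained encoder, and the absolute-error loss is assumed rather than taken from the patent.

```python
def shifted_crop(frame, offset, width):
    """Crop `width` bins starting at `offset`.

    On a log-frequency axis, moving the crop window by one bin is
    equivalent to transposing the content by one bin's pitch interval.
    """
    if offset < 0 or offset + width > len(frame):
        raise ValueError("crop window falls outside the frame")
    return frame[offset:offset + width]


def toy_encoder(crop):
    """Hypothetical stand-in for an encoder: the energy-weighted mean bin."""
    total = sum(crop)
    if total == 0:
        raise ValueError("silent frame")
    return sum(i * v for i, v in enumerate(crop)) / total


def relative_pitch_loss(frame, offset_a, offset_b, width, encoder):
    """Compare the predicted pitch difference with the known shift difference."""
    pitch_a = encoder(shifted_crop(frame, offset_a, width))
    pitch_b = encoder(shifted_crop(frame, offset_b, width))
    # A peak at bin b of the full frame lands at bin (b - offset) in the crop,
    # so the expected pitch difference is offset_b - offset_a.
    expected = offset_b - offset_a
    return abs((pitch_a - pitch_b) - expected)
```

Because only the difference between the two predictions is constrained, an encoder trained this way yields relative pitch; this is why the abstract adds a calibration step with a few samples labeled with absolute pitch.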
-
Publication No.: US20210056980A1
Publication Date: 2021-02-25
Application No.: US16548146
Filing Date: 2019-08-22
Applicant: Google LLC
IPC Classification: G10L19/035, G10L25/18, G10L19/038, G06N20/00
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
-
Publication No.: US20190102458A1
Publication Date: 2019-04-04
Application No.: US16148338
Filing Date: 2018-10-01
Applicant: Google LLC
Inventors: Dominik Roblek, Blaise Aguera-Arcas, Tom Hume, Marvin Ritter, Brandon Barbello, Kevin Kilgour, Mihajlo Velimirovic, Christopher Walter George Thornton, Gabriel Taubman, James David Lyon, Jan Althaus, Katsiaryna Naliuka, Julian Odell, Matthew Sharifi, Beat Gfeller
Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products. A computing device stores reference song characterization data and receives digital audio data. The computing device determines whether the digital audio data represents music and then performs a different process to recognize that the digital audio data represents a particular reference song. The computing device then outputs an indication of the particular reference song.
-
Publication No.: US20190102144A1
Publication Date: 2019-04-04
Application No.: US16148401
Filing Date: 2018-10-01
Applicant: Google LLC
Inventors: Dominik Roblek, Blaise Aguera-Arcas, Tom Hume, Marvin Ritter, Brandon Barbello, Kevin Kilgour, Mihajlo Velimirovic, Christopher Walter George Thornton, Gabriel Taubman, James David Lyon, Jan Althaus, Katsiaryna Naliuka, Julian Odell, Matthew Sharifi, Beat Gfeller
Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for indicating a reference song. A computing device stores reference song characterization data that identifies a plurality of audio characteristics for each reference song in a plurality of reference songs. The computing device receives digital audio data that represents audio recorded by a microphone, converts the digital audio data from time-domain format into frequency-domain format, and uses the digital audio data in the frequency-domain format in a music-characterization process. In response to determining that characterization values for the digital audio data are most relevant to characterization values for a particular reference song, the computing device outputs an indication of the particular reference song.
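The pipeline in the abstract above (time-domain to frequency-domain conversion, characterization, then selecting the most relevant reference) can be sketched roughly as below. Everything specific here is an assumption made for illustration: the naive DFT, the four normalised band energies used as "characterization values", and Euclidean distance as the relevance measure. The abstract does not say how the actual characterization or matching is done.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive O(n^2) discrete Fourier transform; magnitudes of bins 0..n/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def characterize(samples, bands=4):
    """Collapse the spectrum into `bands` normalised band energies.

    Illustrative stand-in for the music-characterization process.
    """
    mags = dft_magnitudes(samples)[1:]  # drop the DC bin
    size = len(mags) // bands
    energies = [sum(m * m for m in mags[i * size:(i + 1) * size])
                for i in range(bands)]
    total = sum(energies) or 1.0  # avoid dividing by zero on silence
    return [e / total for e in energies]


def best_match(query, references):
    """Return the reference name whose values are closest to the query."""
    return min(references, key=lambda name: math.dist(query, references[name]))
```

Here `references` is a dictionary mapping a song name to its stored characterization values, mirroring the "reference song characterization data" that the device keeps locally.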
-
Publication No.: US20230419989A1
Publication Date: 2023-12-28
Application No.: US17808653
Filing Date: 2022-06-24
Applicant: Google LLC
Inventors: Beat Gfeller, Kevin Ian Kilgour, Marco Tagliasacchi, Aren Jansen, Scott Thomas Wisdom, Qingqing Huang
CPC Classification: G10L25/84, G10L15/16, G10L15/063, G06N3/0454
Abstract: Example methods include receiving training data comprising a plurality of audio clips and a plurality of textual descriptions of audio. The methods include generating a shared representation comprising a joint embedding. An audio embedding of a given audio clip is within a threshold distance of a text embedding of a textual description of the given audio clip. The methods include generating, based on the joint embedding, a conditioning vector and training, based on the conditioning vector, a neural network to: receive (i) an input audio waveform, and (ii) an input comprising one or more of an input textual description of a target audio source in the input audio waveform, or an audio sample of the target audio source, separate audio corresponding to the target audio source from the input audio waveform, and output the separated audio corresponding to the target audio source in response to the receiving of the input.
-
Publication No.: US20230085596A1
Publication Date: 2023-03-16
Application No.: US17986477
Filing Date: 2022-11-14
Applicant: Google LLC
IPC Classification: G10L19/035, G06N20/00, G10L19/038, G10L25/18
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
-
Publication No.: US20230013370A1
Publication Date: 2023-01-19
Application No.: US17856292
Filing Date: 2022-07-01
Applicant: Google LLC
Inventors: Yunpeng Li, Marco Tagliasacchi, Dominik Roblek, Félix de Chaumont Quitry, Beat Gfeller, Hannah Raphaelle Muckenhirn, Victor Ungureanu, Oleg Rybakov, Karolis Misiunas, Zalán Borsos
IPC Classification: G10L19/022, G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.
-
Publication No.: US20230395087A1
Publication Date: 2023-12-07
Application No.: US18249126
Filing Date: 2021-10-15
Applicant: Google LLC
Inventors: Marco Tagliasacchi, Beat Gfeller, Yunpeng Li, Zalán Borsos
IPC Classification: G10L21/007, G10L15/06, G10L15/08, G10L25/18, G10L21/0208, G10L25/21
CPC Classification: G10L21/007, G10L15/063, G10L15/08, G10L25/18, G10L21/0208, G10L25/21, G10L2015/088
Abstract: Example implementations of the present disclosure relate to machine learning for microphone style transfer, for example, to facilitate augmentation of audio data such as speech data to improve robustness of machine learning models trained on the audio data. Systems and methods for microphone style transfer can include one or more machine-learned microphone models trained to obtain and augment signal data to mimic characteristics of signal data obtained from a target microphone. The systems and methods can include a speech enhancement network for enhancing a sample before the style transfer. The augmentation output can then be utilized for a variety of downstream tasks.
-
Publication No.: US11256472B2
Publication Date: 2022-02-22
Application No.: US17010694
Filing Date: 2020-09-02
Applicant: Google LLC
Inventors: Dominik Roblek, Blaise Hilary Aguera-Arcas, Thomas W. Hume, Marvin Karl Ritter, Brandon Charles Barbello, Kevin I. Kilgour, Mihajlo Velimirović, Christopher Thornton, Gabriel Oak Taubman, James David Lyon, Jan Heinrich Althaus, Katsiaryna Naliuka, Julian James Odell, Matthew Sharifi, Beat Gfeller
IPC Classification: G06F3/16, G06F16/635, G06F16/683, G06N3/08, G06N20/00
Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products. A computing device stores reference song characterization data and receives digital audio data. The computing device determines whether the digital audio data represents music and then performs a different process to recognize that the digital audio data represents a particular reference song. The computing device then outputs an indication of the particular reference song.