-
Publication No.: US20250078809A1
Publication Date: 2025-03-06
Application No.: US18951397
Filing Date: 2024-11-18
Applicant: Google LLC
Inventor: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
-
Publication No.: US12165032B2
Publication Date: 2024-12-10
Application No.: US16586813
Filing Date: 2019-09-27
Applicant: Google LLC
Inventor: Yang Li, Lukasz Mieczyslaw Kaiser, Samuel Bengio, Si Si
IPC: G06N3/045, G06F16/903, G06N3/10, G06F16/00, G06F40/00
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing an area attention layer in a neural network system. The area attention layer implements a way for a neural network model to attend to areas in the memory, where each area contains a group of items that are structurally adjacent.
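Illustrative note (not part of the patent record): the idea of attending to areas of adjacent memory items can be sketched for a one-dimensional memory as below. The abstract does not say how an area's key and value are derived, so this sketch assumes the area key is the mean of its item keys and the area value is the sum of its item values. The function names `enumerate_areas` and `area_attention` and the `max_width` parameter are hypothetical.

```python
import math

def enumerate_areas(n_items, max_width):
    """List (start, end) index pairs for every contiguous span of up to max_width items."""
    return [(s, s + w)
            for w in range(1, max_width + 1)
            for s in range(n_items - w + 1)]

def area_attention(query, keys, values, max_width=2):
    """Dot-product attention over areas (contiguous spans) of a 1-D memory.

    Assumption: area key = mean of item keys, area value = sum of item values.
    Returns (output vector, attention weights, list of areas).
    """
    areas = enumerate_areas(len(keys), max_width)
    dim_k, dim_v = len(query), len(values[0])
    scores, area_values = [], []
    for start, end in areas:
        width = end - start
        area_key = [sum(keys[i][d] for i in range(start, end)) / width
                    for d in range(dim_k)]
        area_val = [sum(values[i][d] for i in range(start, end))
                    for d in range(dim_v)]
        scores.append(sum(q * k for q, k in zip(query, area_key)))
        area_values.append(area_val)
    # Numerically stable softmax over all area scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    output = [sum(w * v[d] for w, v in zip(weights, area_values))
              for d in range(dim_v)]
    return output, weights, areas

if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0], [2.0], [3.0]]
    out, weights, areas = area_attention([0.5, 0.5], keys, values, max_width=2)
    print(areas, weights, out)
```

With `max_width=1` every area is a single item, so the sketch reduces to ordinary item-level attention.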
-
Publication No.: US12148433B2
Publication Date: 2024-11-19
Application No.: US18485069
Filing Date: 2023-10-11
Applicant: Google LLC
Inventor: Georg Heigold, Samuel Bengio, Ignacio Lopez Moreno
Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
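Illustrative note (not part of the patent record): the enrollment and verification phases described above can be sketched as follows. The abstract does not specify how speaker representations are compared, so this sketch assumes enrollment averages several representations into a speaker model and verification thresholds a cosine similarity. The neural network itself is not modeled; representations are passed in as plain lists. The names `enroll` and `verify` and the threshold value are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def enroll(representations):
    """Assumed enrollment: average the speaker representations of several utterances."""
    count = len(representations)
    dim = len(representations[0])
    return [sum(r[d] for r in representations) / count for d in range(dim)]

def verify(representation, speaker_model, threshold=0.8):
    """Accept the utterance if it is similar enough to the enrolled speaker model."""
    return cosine_similarity(representation, speaker_model) >= threshold

if __name__ == "__main__":
    model = enroll([[1.0, 0.0], [0.8, 0.2]])
    print(verify([1.0, 0.1], model))
    print(verify([0.0, 1.0], model))
```

In practice the threshold would be tuned on held-out matching and non-matching speaker pairs; the value used here is arbitrary.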
-
Publication No.: US20210366491A1
Publication Date: 2021-11-25
Application No.: US17444384
Filing Date: 2021-08-03
Applicant: Google LLC
Inventor: Georg Heigold, Samuel Bengio, Ignacio Lopez Moreno
Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
-
Publication No.: US10714078B2
Publication Date: 2020-07-14
Application No.: US16171629
Filing Date: 2018-10-26
Applicant: Google LLC
Inventor: Samuel Bengio, Mirkó Visontai, Christopher Walter George Thornton, Michiel A. U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
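Illustrative note (not part of the patent record): the first three steps above (audio to frequency-domain data, complex linear projection, features for an acoustic model) can be sketched as below. The abstract does not describe what happens between the projection and the neural network, so taking the log magnitude of each projected value is an assumption of this sketch, as are the function names and the hand-picked weight matrix. A naive DFT stands in for whatever transform an actual system would use.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of one real-valued audio frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def complex_linear_projection(spectrum, weights):
    """Multiply a complex weight matrix (one row per output) by the complex spectrum."""
    return [sum(w * x for w, x in zip(row, spectrum)) for row in weights]

def log_magnitude(values, eps=1e-8):
    """Assumed compression step: log of the magnitude of each projected value."""
    return [math.log(abs(v) + eps) for v in values]

if __name__ == "__main__":
    frame = [1.0, 0.0, 0.0, 0.0]          # a unit impulse
    weights = [                           # hypothetical 2 x 4 complex weights
        [1 + 0j, 0j, 0j, 0j],
        [0.5j, 0.5j, 0j, 0j],
    ]
    spectrum = dft(frame)
    features = log_magnitude(complex_linear_projection(spectrum, weights))
    print(features)
```

In a trained system the complex weights would be learned jointly with the acoustic model rather than fixed by hand.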
-
Publication No.: US10692003B2
Publication Date: 2020-06-23
Application No.: US16445330
Filing Date: 2019-06-19
Applicant: Google LLC
Inventor: Samuel Bengio, Mohammad Norouzi, Benoit Steiner, Jeffrey Adgate Dean, Hieu Hy Pham, Azalia Mirhoseini, Quoc V. Le, Naveen Kumar, Yuefeng Zhou, Rasmus Munk Larsen
Abstract: A method for determining a placement for machine learning model operations across multiple hardware devices is described. The method includes receiving data specifying a machine learning model to be placed for distributed processing on multiple hardware devices; generating, from the data, a sequence of operation embeddings, each operation embedding in the sequence characterizing respective operations necessary to perform the processing of the machine learning model; processing the sequence of operation embeddings using a placement recurrent neural network in accordance with first values of a plurality of network parameters of the placement recurrent neural network to generate a network output that defines a placement of the operations characterized by the operation embeddings in the sequence across the plurality of devices; and scheduling the machine learning model for processing by the multiple hardware devices by placing the operations on the multiple devices according to the placement defined by the network output.
-
Publication No.: US20200160869A1
Publication Date: 2020-05-21
Application No.: US16752007
Filing Date: 2020-01-24
Applicant: Google LLC
Inventor: Georg Heigold, Samuel Bengio, Ignacio Lopez Moreno
Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
-
Publication No.: US20180342238A1
Publication Date: 2018-11-29
Application No.: US16055414
Filing Date: 2018-08-06
Applicant: Google LLC
Inventor: Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Samuel Bengio, Ilya Sutskever
CPC classification number: G10L15/16, G05B13/027, G06F17/276, G06F17/289, G06N3/0445, G10L15/02, G10L15/26, G10L2015/025
Abstract: A system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the RNN system may be used to convert speech to a target sequence of phonemes in real-time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.
-
Publication No.: US10043512B2
Publication Date: 2018-08-07
Application No.: US15349245
Filing Date: 2016-11-11
Applicant: Google LLC
Inventor: Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Samuel Bengio, Ilya Sutskever
Abstract: A system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the RNN system may be used to convert speech to a target sequence of phonemes in real-time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.
-
Publication No.: US12190860B2
Publication Date: 2025-01-07
Application No.: US18516069
Filing Date: 2023-11-21
Applicant: Google LLC
Inventor: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
IPC: G10L13/06, G06N3/045, G06N3/08, G06N3/084, G10L13/04, G10L13/08, G10L15/16, G10L25/18, G10L25/30
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.