-
公开(公告)号:US20200027444A1
公开(公告)日:2020-01-23
申请号:US16516390
申请日:2019-07-19
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-Cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A.U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
-
公开(公告)号:US20190311708A1
公开(公告)日:2019-10-10
申请号:US16447862
申请日:2019-06-20
Applicant: Google LLC
Inventor: Samy Bengio , Yuxuan Wang , Zongheng Yang , Zhifeng Chen , Yonghui Wu , Ioannis Agiomyrgiannakis , Ron J. Weiss , Navdeep Jaitly , Ryan M. Rifkin , Robert Andrew James Clark , Quoc V. Le , Russell J. Ryan , Ying Xiao
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
-
公开(公告)号:US10281885B1
公开(公告)日:2019-05-07
申请号:US15600699
申请日:2017-05-19
Applicant: Google LLC
Inventor: Chung-Cheng Chiu , Navdeep Jaitly , Ilya Sutskever , Yuping Luo
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a target sequence from a source sequence. In one aspect, the system includes a recurrent neural network configured to, at each time step, receive am input for the time step and process the input to generate a progress score and a set of output scores; and a subsystem configured to, at each time step, generate the recurrent neural network input and provide the input to the recurrent neural network; determine, from the progress score, whether or not to emit a new output at the time step; and, in response to determining to emit a new output, select an output using the output scores and emit the selected output as the output at a next position in the output order.
-
公开(公告)号:US20240420686A1
公开(公告)日:2024-12-19
申请号:US18815200
申请日:2024-08-26
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-Cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A. U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
-
公开(公告)号:US11954594B1
公开(公告)日:2024-04-09
申请号:US17315695
申请日:2021-05-10
Applicant: Google LLC
Inventor: Samy Bengio , Oriol Vinyals , Navdeep Jaitly , Noam M. Shazeer
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: This document generally describes a neural network training system, including one or more computers, that trains a recurrent neural network (RNN) to receive an input, e.g., an input sequence, and to generate a sequence of outputs from the input sequence. In some implementations, training can include, for each position after an initial position in a training target sequence, selecting a preceding output of the RNN to provide as input to the RNN at the position, including determining whether to select as the preceding output (i) a true output in a preceding position in the output order or (ii) a value derived from an output of the RNN for the preceding position in an output order generated in accordance with current values of the parameters of the recurrent neural network.
-
公开(公告)号:US11862142B2
公开(公告)日:2024-01-02
申请号:US17391799
申请日:2021-08-02
Applicant: Google LLC
Inventor: Samuel Bengio , Yuxuan Wang , Zongheng Yang , Zhifeng Chen , Yonghui Wu , Ioannis Agiomyrgiannakis , Ron J. Weiss , Navdeep Jaitly , Ryan M. Rifkin , Robert Andrew James Clark , Quoc V. Le , Russell J. Ryan , Ying Xiao
IPC: G10L13/06 , G10L13/08 , G06N3/08 , G10L25/18 , G10L25/30 , G10L13/04 , G06N3/084 , G10L15/16 , G06N3/045
CPC classification number: G10L13/08 , G06N3/045 , G06N3/08 , G06N3/084 , G10L13/04 , G10L15/16 , G10L25/18 , G10L25/30
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
-
公开(公告)号:US11763936B2
公开(公告)日:2023-09-19
申请号:US17112279
申请日:2020-12-04
Applicant: Google LLC
Inventor: Christopher S. Co , Navdeep Jaitly , Lily Hao Yi Peng , Katherine Irene Chou , Ananth Sankar
IPC: G10L15/26 , G16H40/20 , G10L15/18 , G06F40/47 , G06F40/58 , G10L15/06 , G10L15/14 , G10L15/16 , G10L15/183
CPC classification number: G16H40/20 , G06F40/47 , G06F40/58 , G10L15/063 , G10L15/142 , G10L15/16 , G10L15/183 , G10L15/1822 , G10L15/26
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
-
公开(公告)号:US11195521B2
公开(公告)日:2021-12-07
申请号:US16781273
申请日:2020-02-04
Applicant: Google LLC
Inventor: Navdeep Jaitly , Quoc V. Le , Oriol Vinyals , Samuel Bengio , Ilya Sutskever
IPC: G10L15/00 , G10L15/16 , G06N3/04 , G10L15/26 , G06F40/58 , G06F40/274 , G06F40/55 , G10L15/02 , G05B13/02
Abstract: A system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the RNN system may be used to convert speech to a target sequence of phonemes in real-time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.
-
公开(公告)号:US20200251099A1
公开(公告)日:2020-08-06
申请号:US16781273
申请日:2020-02-04
Applicant: Google LLC
Inventor: Navdeep Jaitly , Quoc V. Le , Oriol Vinyals , Samuel Bengio , Ilya Sutskever
Abstract: A system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the RNN system may be used to convert speech to a target sequence of phonemes in real-time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.
-
公开(公告)号:US20200151544A1
公开(公告)日:2020-05-14
申请号:US16610466
申请日:2018-05-03
Applicant: GOOGLE LLC
Inventor: Chung-Cheng Chiu , Navdeep Jaitly , John Dieterich Lawson , George Jay Tucker
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a target sequence from a source sequence. In one aspect, the system includes a recurrent neural network configured to, at each time step, receive an input for the time step and process the input to generate a progress score and a set of output scores; and a subsystem configured to, at each time step, generate the recurrent neural network input and provide the input to the recurrent neural network; determine, from the progress score, whether or not to emit a new output at the time step; and, in response to determining to emit a new output, select an output using the output scores and emit the selected output as the output at a next position in the output order.
-
-
-
-
-
-
-
-
-