Patent search ap:("Google LLC") AND inv:"Navdeep Jaitly" Page 4

31.

Invention application
VERY DEEP CONVOLUTIONAL NEURAL NETWORKS FOR END-TO-END SPEECH RECOGNITION [Under examination, published]

Publication number: US20190236451A1

Publication date: 2019-08-01

Application number: US16380101

Application date: 2019-04-10

Applicant: Google LLC

Inventor: Navdeep Jaitly, Yu Zhang, William Chan

IPC: G06N3/08, G06N3/04, G10L15/16, G10L15/02, G10L15/22

Abstract: A speech recognition neural network system includes an encoder neural network and a decoder neural network. The encoder neural network generates an encoded sequence from an input acoustic sequence that represents an utterance. The input acoustic sequence includes a respective acoustic feature representation at each of a plurality of input time steps, the encoded sequence includes a respective encoded representation at each of a plurality of time reduced time steps, and the number of time reduced time steps is less than the number of input time steps. The encoder neural network includes a time reduction subnetwork, a convolutional LSTM subnetwork, and a network in network subnetwork. The decoder neural network receives the encoded sequence and processes the encoded sequence to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings.
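
The abstract says only that the encoded sequence has fewer time-reduced time steps than the input has time steps; it does not say how the time reduction subnetwork does this. As a minimal sketch, assuming the reduction concatenates consecutive frames (an assumption, not confirmed by this record), the hypothetical helper `time_reduce` below shortens a sequence by a chosen factor:

```python
from typing import List

Frame = List[float]


def time_reduce(frames: List[Frame], factor: int = 2) -> List[Frame]:
    """Concatenate each run of `factor` consecutive frames into one wider frame.

    Hypothetical helper for illustration only. A trailing run shorter than
    `factor` is zero-padded, which is an arbitrary choice made here.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not frames:
        return []
    dim = len(frames[0])
    reduced: List[Frame] = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]
        # Pad the last group with zero frames so every output has equal width.
        group = group + [[0.0] * dim] * (factor - len(group))
        reduced.append([value for frame in group for value in frame])
    return reduced


if __name__ == "__main__":
    acoustic = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(time_reduce(acoustic, factor=2))
```

With `factor=2`, three input frames become two output frames, each twice as wide, so the number of time-reduced steps is less than the number of input steps.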

32.

Invention application
GENERATING TARGET SEQUENCES FROM INPUT SEQUENCES USING PARTIAL CONDITIONING [Under examination, published]

Publication number: US20180342238A1

Publication date: 2018-11-29

Application number: US16055414

Application date: 2018-08-06

Applicant: Google LLC

Inventor: Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Samuel Bengio, Ilya Sutskever

IPC: G10L15/16, G06N3/04, G10L15/26, G10L15/02, G05B13/02, G06F17/28, G06F17/27

CPC classification number: G10L15/16, G05B13/027, G06F17/276, G06F17/289, G06N3/0445, G10L15/02, G10L15/26, G10L2015/025

Abstract: A system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the RNN system may be used to convert speech to a target sequence of phonemes in real-time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.

33.

Invention grant
Generating target sequences from input sequences using partial conditioning [In force]

Publication number: US10043512B2

Publication date: 2018-08-07

Application number: US15349245

Application date: 2016-11-11

Applicant: GOOGLE LLC

Inventor: Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Samuel Bengio, Ilya Sutskever

IPC: G06F15/00, G10L15/16, G06F17/28, G10L15/02, G05B13/02, G06N3/04

Abstract: A system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the RNN system may be used to convert speech to a target sequence of phonemes in real-time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.

34.

Invention grant
Sequence modeling using imputation [In force]

Publication number: US12242818B2

Publication date: 2025-03-04

Application number: US17797872

Application date: 2021-02-08

Applicant: Google LLC

Inventor: William Chan, Chitwan Saharia, Geoffrey E. Hinton, Mohammad Norouzi, Navdeep Jaitly

IPC: G06F40/47, G06F40/284

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sequence modeling. One of the methods includes receiving an input sequence having a plurality of input positions; determining a plurality of blocks of consecutive input positions; processing the input sequence using a neural network to generate a latent alignment, comprising, at each of a plurality of input time steps: receiving a partial latent alignment from a previous input time step; selecting an input position in each block, wherein the token at the selected input position of the partial latent alignment in each block is a mask token; and processing the partial latent alignment and the input sequence using the neural network to generate a new latent alignment, wherein the new latent alignment comprises, at the selected input position in each block, an output token or a blank token; and generating, using the latent alignment, an output sequence.
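
The loop described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: `predict` is a hypothetical stand-in for the neural network, the leftmost masked position is picked in each block (the abstract does not say how positions are selected), and the final `collapse` step assumes a CTC-style rule (merge repeats, drop blanks) that the abstract does not spell out.

```python
from typing import Callable, List, Sequence

MASK = "<mask>"   # placeholder for a position not yet filled in
BLANK = "_"       # blank token in the latent alignment

# Hypothetical network interface: proposes one token per input position.
Predictor = Callable[[Sequence[str], Sequence[str]], List[str]]


def impute(inputs: Sequence[str], block_size: int, predict: Predictor) -> List[str]:
    """Fill a fully masked alignment, one position per block per step."""
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    n = len(inputs)
    alignment = [MASK] * n
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    for _ in range(block_size):
        # Select one still-masked position in each block (leftmost: assumption).
        chosen = []
        for block in blocks:
            masked = [i for i in block if alignment[i] == MASK]
            if masked:
                chosen.append(masked[0])
        # The network sees the input and the partial alignment from the last step.
        proposal = predict(inputs, alignment)
        for i in chosen:
            alignment[i] = proposal[i]  # an output token or BLANK
    return alignment


def collapse(alignment: Sequence[str]) -> List[str]:
    """Assumed CTC-style rule: merge adjacent repeats, then drop blanks."""
    output: List[str] = []
    previous = None
    for token in alignment:
        if token != previous and token != BLANK:
            output.append(token)
        previous = token
    return output


if __name__ == "__main__":
    # Toy predictor that simply echoes the input as its proposal.
    echo = lambda inputs, alignment: list(inputs)
    latent = impute(["h", "h", "_", "i"], block_size=2, predict=echo)
    print(latent, collapse(latent))
```

Because each step fills one position per block, the alignment is complete after `block_size` steps regardless of sequence length.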

35.

Invention grant
End-to-end text-to-speech conversion [In force]

Publication number: US12190860B2

Publication date: 2025-01-07

Application number: US18516069

Application date: 2023-11-21

Applicant: Google LLC

Inventor: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao

IPC: G10L13/06, G06N3/045, G06N3/08, G06N3/084, G10L13/04, G10L13/08, G10L15/16, G10L25/18, G10L25/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.

36.

Invention grant
Synthesizing speech from text using neural networks [In force]

Publication number: US12148444B2

Publication date: 2024-11-19

Application number: US17222736

Application date: 2021-04-05

Applicant: Google LLC

Inventor: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis

IPC: G10L13/047, G06N3/045, G06N3/08, G06N5/046, G06N7/01, G10L13/08, G10L25/18, G10L25/30

Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
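
The last step of this abstract, selecting an audio output sample in accordance with a probability distribution, can be illustrated with a small sketch. It assumes the possible samples are a finite list of amplitude levels and that selection means drawing at random with the given probabilities; `select_sample` is a hypothetical name, and the vocoder that would produce the distribution is not modelled here.

```python
import random
from typing import Optional, Sequence


def select_sample(
    levels: Sequence[float],
    probabilities: Sequence[float],
    rng: Optional[random.Random] = None,
) -> float:
    """Draw one audio sample value from `levels` according to `probabilities`.

    Hypothetical helper: in the abstract, the distribution would come from a
    vocoder neural network conditioned on the mel-frequency spectrogram.
    """
    if len(levels) != len(probabilities):
        raise ValueError("levels and probabilities must have the same length")
    if any(p < 0 for p in probabilities) or sum(probabilities) <= 0:
        raise ValueError("probabilities must be non-negative with a positive sum")
    rng = rng or random.Random()
    # random.Random.choices performs a weighted draw; k=1 returns a 1-item list.
    return rng.choices(levels, weights=probabilities, k=1)[0]


if __name__ == "__main__":
    levels = [-1.0, -0.5, 0.0, 0.5, 1.0]
    distribution = [0.05, 0.2, 0.5, 0.2, 0.05]
    rng = random.Random(0)  # seeded only to make the demo repeatable
    waveform = [select_sample(levels, distribution, rng) for _ in range(8)]
    print(waveform)
```

Repeating the draw once per time step, as in the demo loop, yields one audio output sample for each time step.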

37.

Invention publication
END-TO-END TEXT-TO-SPEECH CONVERSION [Under examination, published]

Publication number: US20240127791A1

Publication date: 2024-04-18

Application number: US18516069

Application date: 2023-11-21

Applicant: Google LLC

Inventor: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao

IPC: G10L13/08, G06N3/045, G06N3/08, G06N3/084, G10L13/04, G10L15/16, G10L25/18, G10L25/30

CPC classification number: G10L13/08, G06N3/045, G06N3/08, G06N3/084, G10L13/04, G10L15/16, G10L25/18, G10L25/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.

38.

Invention application
GENERATING OUTPUT SEQUENCES FROM INPUT SEQUENCES USING NEURAL NETWORKS [In force]

Publication number: US20220138531A1

Publication date: 2022-05-05

Application number: US17575394

Application date: 2022-01-13

Applicant: Google LLC

Inventor: Oriol Vinyals, Navdeep Jaitly

IPC: G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences from input sequences. One of the methods includes obtaining an input sequence having a first number of inputs arranged according to an input order; processing each input in the input sequence using an encoder recurrent neural network to generate a respective encoder hidden state for each input in the input sequence; and generating an output sequence having a second number of outputs arranged according to an output order, each output in the output sequence being selected from the inputs in the input sequence, comprising, for each position in the output order: generating a softmax output for the position using the encoder hidden states that is a pointer into the input sequence; and selecting an input from the input sequence as the output at the position using the softmax output.
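
The pointer step in this abstract, a softmax over encoder hidden states used to pick one of the inputs, can be sketched as below. The scoring function is an assumption: the abstract does not say how scores are computed, so this sketch uses a dot product against a hypothetical per-position `query` vector, and picks the highest-probability input.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def point(
    inputs: Sequence[T],
    encoder_states: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[T, List[float]]:
    """Return the input selected by the pointer, plus the softmax output.

    `query` stands in for whatever drives the pointer at this output
    position (hypothetical); dot-product scoring is an assumed choice.
    """
    if len(inputs) != len(encoder_states) or not inputs:
        raise ValueError("need one encoder state per input, and at least one input")
    scores = [sum(h * q for h, q in zip(state, query)) for state in encoder_states]
    probabilities = softmax(scores)
    # The softmax output acts as a pointer: take the most probable position.
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return inputs[index], probabilities


if __name__ == "__main__":
    tokens = ["a", "b", "c"]
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    chosen, probs = point(tokens, states, query=[0.5, 2.0])
    print(chosen, [round(p, 3) for p in probs])
```

Because every output is an element of `inputs`, the output vocabulary grows and shrinks with the input sequence, which is the property the abstract describes.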

39.

Invention grant
Generating output sequences from input sequences using neural networks [In force]

Publication number: US11227206B1

Publication date: 2022-01-18

Application number: US16552495

Application date: 2019-08-27

Applicant: Google LLC

Inventor: Oriol Vinyals, Navdeep Jaitly

IPC: G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences from input sequences. One of the methods includes obtaining an input sequence having a first number of inputs arranged according to an input order; processing each input in the input sequence using an encoder recurrent neural network to generate a respective encoder hidden state for each input in the input sequence; and generating an output sequence having a second number of outputs arranged according to an output order, each output in the output sequence being selected from the inputs in the input sequence, comprising, for each position in the output order: generating a softmax output for the position using the encoder hidden states that is a pointer into the input sequence; and selecting an input from the input sequence as the output at the position using the softmax output.

40.

Invention grant
Processing text sequences using neural networks [In force]

Publication number: US11182566B2

Publication date: 2021-11-23

Application number: US16338174

Application date: 2017-10-03

Applicant: Google LLC

Inventor: Navdeep Jaitly, Yu Zhang, Quoc V. Le, William Chan

IPC: G06F40/47, G06N3/08, G10L15/16, G10L15/197

Abstract: A computer-implemented method for training a neural network that is configured to generate a score distribution over a set of multiple output positions. The neural network is configured to process a network input to generate a respective score distribution for each of a plurality of output positions including a respective score for each token in a predetermined set of tokens that includes n-grams of multiple different sizes. Example methods described herein provide trained neural networks which produce results with improved accuracy compared to the state of the art, e.g. translations that are more accurate compared to the state of the art, or more accurate speech recognition compared to the state of the art.
