Patent search ap:("Google LLC") AND inv:"Ron J. Weiss" Page 3

21.

发明申请
END-TO-END TEXT-TO-SPEECH CONVERSION 有权

公开(公告)号：US20250078809A1

公开(公告)日：2025-03-06

申请号：US18951397

申请日：2024-11-18

Applicant: Google LLC

Inventor： Samuel Bengio , Yuxuan Wang , Zongheng Yang , Zhifeng Chen , Yonghui Wu , Ioannis Agiomyrgiannakis , Ron J. Weiss , Navdeep Jaitly , Ryan M. Rifkin , Robert Andrew James Clark , Quoc V. Le , Russell J. Ryan , Ying Xiao

IPC: G10L13/08 , G06N3/045 , G06N3/08 , G06N3/084 , G10L13/04 , G10L15/16 , G10L25/18 , G10L25/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.

22.

发明授权
Speech recognition with sequence-to-sequence models 有权

公开(公告)号：US12106749B2

公开(公告)日：2024-10-01

申请号：US17448119

申请日：2021-09-20

Applicant: Google LLC

Inventor： Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A. u. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen

IPC: G10L15/00 , G06N3/08 , G10L15/02 , G10L15/06 , G10L15/16 , G10L15/22 , G10L25/30 , G10L15/26

CPC classification number: G10L15/16 , G06N3/08 , G10L15/02 , G10L15/063 , G10L15/22 , G10L25/30 , G10L2015/025 , G10L15/26

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

23.

发明授权
Synthesis of speech from text in a voice of a target speaker using neural networks 有权

公开(公告)号：US11848002B2

公开(公告)日：2023-12-19

申请号：US17813361

申请日：2022-07-19

Applicant: Google LLC

Inventor： Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick An Phu Nguyen

IPC: G10L13/04 , G10L17/04 , G10L19/00 , G06N3/08 , G10L13/02

CPC classification number: G10L13/04 , G10L17/04 , G10L19/00 , G06N3/08 , G10L2013/021

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.

24.

发明公开
END-TO-END SPEECH WAVEFORM GENERATION THROUGH DATA DENSITY GRADIENT ESTIMATION 审中-公开

公开(公告)号：US20230252974A1

公开(公告)日：2023-08-10

申请号：US18010438

申请日：2021-09-02

Applicant: Google LLC

Inventor： Byungha Chun , Mohammad Norouzi , Nanxin Chen , Ron J. Weiss , William Chan , Yu Zhang , Yonghui Wu

IPC: G10L13/08 , G10L21/0208

CPC classification number: G10L13/08 , G10L21/0208

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating waveforms conditioned on phoneme sequences. In one aspect, a method comprises: obtaining a phoneme sequence; processing the phoneme sequence using an encoder neural network to generate a hidden representation of the phoneme sequence; generating, from the hidden representation, a conditioning input; initializing a current waveform output; and generating a final waveform output that defines an utterance of the phoneme sequence by a speaker by updating the current waveform output at each of a plurality of iterations, wherein each iteration corresponds to a respective noise level, and wherein the updating comprises, at each iteration: processing (i) the current waveform output and (ii) the conditioning input using a noise estimation neural network to generate a noise output; and updating the current waveform output using the noise output and the noise level for the iteration.

25.

发明申请
MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION 有权

公开(公告)号：US20220130374A1

公开(公告)日：2022-04-28

申请号：US17572238

申请日：2022-01-10

Applicant: Google LLC

Inventor： Zhifeng Chen , Bo Li , Eugene Weinstein , Yonghui Wu , Pedro J. Moreno Mengibar , Ron J. Weiss , Khe Chai Sim , Tara N. Sainath , Patrick An Phu Nguyen

IPC: G10L15/00 , G10L15/16 , G10L15/07

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output score indicating the likelihood of linguistic units for each of multiple different language or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.

26.

发明申请
END-TO-END SPEECH CONVERSION 有权

公开(公告)号：US20220122579A1

公开(公告)日：2022-04-21

申请号：US17310732

申请日：2019-11-26

Applicant: Google LLC

Inventor： Fadi Biadsy , Ron J. Weiss , Aleksandar Kracun , Pedro J. Moreno Mengibar

IPC: G10L13/02 , G10L21/10 , G10L25/30 , G06N3/08 , H04L51/02

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for end to end speech conversion are disclosed. In one aspect, a method includes the actions of receiving first audio data of a first utterance of one or more first terms spoken by a user. The actions further include providing the first audio data as an input to a model that is configured to receive first given audio data in a first voice and output second given audio data in a synthesized voice without performing speech recognition on the first given audio data. The actions further include receiving second audio data of a second utterance of the one or more first terms spoken in the synthesized voice. The actions further include providing, for output, the second audio data of the second utterance of the one or more first terms spoken in the synthesized voice.

27.

发明授权
Adaptive audio enhancement for multichannel speech recognition 有权

公开(公告)号：US11257485B2

公开(公告)日：2022-02-22

申请号：US16708930

申请日：2019-12-10

Applicant: Google LLC

Inventor： Bo Li , Ron J. Weiss , Michiel A. U. Bacchiani , Tara N. Sainath , Kevin William Wilson

IPC: G10L15/00 , G10L15/16 , G10L15/20 , G10L21/0224 , G10L15/26 , G10L21/0216

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

28.

发明授权
Speech recognition with sequence-to-sequence models 有权

公开(公告)号：US11145293B2

公开(公告)日：2021-10-12

申请号：US16516390

申请日：2019-07-19

Applicant: Google LLC

Inventor： Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-Cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A. U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen

IPC: G10L15/00 , G10L15/16 , G10L15/22 , G10L15/02 , G06N3/08 , G10L15/06 , G10L25/30 , G10L15/26

Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

29.

发明申请
ENHANCED MULTI-CHANNEL ACOUSTIC MODELS 有权

公开(公告)号：US20210295859A1

公开(公告)日：2021-09-23

申请号：US17303822

申请日：2021-06-08

Applicant: Google LLC

Inventor： Ehsan Variani , Kevin William Wilson , Ron J. Weiss , Tara N. Sainath , Arun Narayanan

IPC: G10L25/30 , G10L21/028 , G10L21/0388 , G10L15/16 , G10L19/008 , G10L15/20

Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

30.

发明授权
Synthesizing speech from text using neural networks 有权

公开(公告)号：US10971170B2

公开(公告)日：2021-04-06

申请号：US16058640

申请日：2018-08-08

Applicant: Google LLC

Inventor： Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Michael Schuster , Navdeep Jaitly , Zongheng Yang , Zhifeng Chen , Yu Zhang , Yuxuan Wang , Russell John Wyatt Skerry-Ryan , Ryan M. Rifkin , Ioannis Agiomyrgiannakis

IPC: G10L25/30 , G10L13/047 , G10L13/08 , G06N7/00 , G06N3/08 , G06N3/04 , G06N5/04 , G10L25/18

Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification