Patent search ap:("GOOGLE LLC") AND inv:"Kevin William Wilson" Page 1

1.

Granted invention patent
Enhanced multi-channel acoustic models (status: in force)

Publication (announcement) number: US11783849B2

Publication (announcement) date: 2023-10-10

Application number: US17303822

Application date: 2021-06-08

Applicant: Google LLC

Inventor: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan

IPC: G10L15/16, G10L25/30, G10L21/028, G10L21/0388, G10L19/008, G10L15/20, G10L21/0208, G10L21/0216

CPC classification number: G10L25/30, G10L15/16, G10L15/20, G10L19/008, G10L21/028, G10L21/0388, G10L2021/02087, G10L2021/02166

Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

2.

Invention patent application
ENHANCED MULTI-CHANNEL ACOUSTIC MODELS (status: under examination, published)

Publication (announcement) number: US20190259409A1

Publication (announcement) date: 2019-08-22

Application number: US16278830

Application date: 2019-02-19

Applicant: Google LLC

Inventor: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan

IPC: G10L25/30, G10L21/028, G10L19/008, G10L15/20, G10L15/16, G10L21/0388

Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

3.

Granted invention patent
Adaptive audio enhancement for multichannel speech recognition (status: in force)

Publication (announcement) number: US11756534B2

Publication (announcement) date: 2023-09-12

Application number: US17649058

Application date: 2022-01-26

Applicant: Google LLC

Inventor: Bo Li, Ron Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson

IPC: G10L15/00, G10L15/16, G10L15/20, G10L21/0224, G10L15/26, G10L21/0216

CPC classification number: G10L15/16, G10L15/20, G10L21/0224, G10L15/26, G10L2021/02166

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
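Illustrative note: the step in the abstract above that turns two channels into "a single combined channel of audio data" can be pictured as a filter-and-sum operation. The Python sketch below is only a minimal illustration under stated assumptions. The abstract does not specify the filter form or how the parameters are generated, so the filters are assumed to be plain FIR taps, and `estimate_filters` is a hypothetical placeholder that returns fixed taps where a real system would use a trained model.

```python
from typing import List, Sequence, Tuple


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[t] = sum over k of taps[k] * signal[t - k]."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if t - k >= 0:
                acc += h * signal[t - k]
        out.append(acc)
    return out


def estimate_filters(
    ch1: Sequence[float], ch2: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical placeholder (assumption, not from the patent).

    A real system would predict one set of taps per channel from both
    channels. This stand-in ignores its inputs and returns fixed
    averaging taps so the example stays deterministic.
    """
    return [0.5, 0.0], [0.5, 0.0]


def filter_and_sum(ch1: Sequence[float], ch2: Sequence[float]) -> List[float]:
    """Filter each channel with its own taps, then sum into one channel."""
    if len(ch1) != len(ch2):
        raise ValueError("both channels must have the same number of samples")
    taps1, taps2 = estimate_filters(ch1, ch2)
    y1 = fir_filter(ch1, taps1)
    y2 = fir_filter(ch2, taps2)
    return [a + b for a, b in zip(y1, y2)]
```

With the fixed placeholder taps, `filter_and_sum` reduces to a sample-wise average of the two channels. The combined channel would then be passed to the recognition network, which is outside the scope of this sketch.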

4.

Granted invention patent
Processing audio waveforms (status: in force)

Publication (announcement) number: US10930270B2

Publication (announcement) date: 2021-02-23

Application number: US16541982

Application date: 2019-08-15

Applicant: Google LLC

Inventor: Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin William Wilson

IPC: G10L15/00, G10L15/16, G06N3/04, G06N3/08, G10L15/26, G10L15/14

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.

5.

Granted invention patent
Multichannel raw-waveform neural networks (status: in force)

Publication (announcement) number: US10339921B2

Publication (announcement) date: 2019-07-02

Application number: US14987146

Application date: 2016-01-04

Applicant: Google LLC

Inventor: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson

IPC: G10L15/00, G10L15/16, G10L15/34, G06N3/04, G06N3/08, G10L15/20, G10L21/0208

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

6.

Invention patent application
ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION (status: under examination, published)

Publication (announcement) number: US20180197534A1

Publication (announcement) date: 2018-07-12

Application number: US15848829

Application date: 2017-12-20

Applicant: Google LLC

Inventor: Bo Li, Ron J. Weiss, Michiel A.U. Bacchiani, Tara N. Sainath, Kevin William Wilson

IPC: G10L15/16, G10L21/0224, G10L15/26, G10L21/0216

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

7.

Granted invention patent
Audio-visual speech separation (status: in force)

Publication (announcement) number: US11894014B2

Publication (announcement) date: 2024-02-06

Application number: US17951002

Application date: 2022-09-22

Applicant: Google LLC

Inventor: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim

IPC: G10L25/57, G10L15/16, G10L21/10, G10L21/18, G06V20/40, G06V40/16, G10L15/25, G06F18/214, G10L17/18

CPC classification number: G10L25/57, G06F18/214, G06V20/41, G06V40/161, G10L15/16, G10L15/25, G10L17/18, G10L21/10, G10L21/18

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
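Illustrative note: the last two steps of the abstract above (a spectrogram mask per speaker, then an isolated speech spectrogram per speaker) can be pictured with the Python sketch below. It rests on assumptions the abstract does not state: the spectrogram is treated as a real-valued magnitude grid, the masks are taken as given rather than predicted from an audio-visual embedding, and each isolated spectrogram is obtained by elementwise multiplication, which is one common way to apply a mask. The names `apply_mask` and `separate` are hypothetical.

```python
from typing import Dict, List

# Assumed layout: spectrogram[time_frame][frequency_bin], real-valued magnitudes.
Spectrogram = List[List[float]]


def apply_mask(mixture: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Elementwise product of a mixture spectrogram and one speaker's mask."""
    same_shape = len(mixture) == len(mask) and all(
        len(s_row) == len(m_row) for s_row, m_row in zip(mixture, mask)
    )
    if not same_shape:
        raise ValueError("mask and spectrogram must have the same shape")
    return [
        [s * m for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(mixture, mask)
    ]


def separate(
    mixture: Spectrogram, masks: Dict[str, Spectrogram]
) -> Dict[str, Spectrogram]:
    """Return one isolated spectrogram per speaker, keyed by speaker label."""
    return {speaker: apply_mask(mixture, mask) for speaker, mask in masks.items()}
```

For example, calling `separate` with a 2x2 mixture and two complementary masks yields two spectrograms that add back up to the mixture, bin by bin. Converting each isolated spectrogram back to a waveform is a separate step not shown here.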

8.

Invention patent application
AUDIO-VISUAL SPEECH SEPARATION (status: in force)

Publication (announcement) number: US20230122905A1

Publication (announcement) date: 2023-04-20

Application number: US17951002

Application date: 2022-09-22

Applicant: Google LLC

Inventor: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim

IPC: G10L21/10, G10L15/16, G10L21/18, G06V20/40, G06V40/16, G10L15/25, G06F18/214

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.

9.

Granted invention patent
Adaptive audio enhancement for multichannel speech recognition (status: in force)

Publication (announcement) number: US11257485B2

Publication (announcement) date: 2022-02-22

Application number: US16708930

Application date: 2019-12-10

Applicant: Google LLC

Inventor: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson

IPC: G10L15/00, G10L15/16, G10L15/20, G10L21/0224, G10L15/26, G10L21/0216

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

10.

Invention patent application
ENHANCED MULTI-CHANNEL ACOUSTIC MODELS (status: in force)

Publication (announcement) number: US20210295859A1

Publication (announcement) date: 2021-09-23

Application number: US17303822

Application date: 2021-06-08

Applicant: Google LLC

Inventor: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan

IPC: G10L25/30, G10L21/028, G10L21/0388, G10L15/16, G10L19/008, G10L15/20

Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
