-
Publication No.: US20180025721A1
Publication date: 2018-01-25
Application No.: US15217457
Filing date: 2016-07-22
Applicant: Google Inc.
Inventor: Bo Li, Tara N. Sainath
CPC classification number: G10L15/16, G06N3/08, G10L15/02, G10L15/26, G10L2015/025
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for automatic speech recognition using multi-dimensional models. In some implementations, audio data that describes an utterance is received. A transcription for the utterance is determined using an acoustic model that includes a neural network having first memory blocks for time information and second memory blocks for frequency information. The transcription for the utterance is provided as output of an automated speech recognizer.
-
Publication No.: US09886949B2
Publication date: 2018-02-06
Application No.: US15392122
Filing date: 2016-12-28
Applicant: Google Inc.
Inventor: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson
IPC: G10L15/00, G10L15/16, G10L21/0224, G10L21/0216, G10L15/26
CPC classification number: G10L15/16, G10L15/20, G10L15/26, G10L21/0224, G10L2021/02166
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
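Note: The step of combining two filtered channels into a single channel can be illustrated with a minimal sketch. It assumes the combination is a filter-and-sum operation with one FIR filter per channel, which the abstract does not spell out. The function names `fir_filter` and `filter_and_sum` are hypothetical, and the filter taps are supplied by hand here rather than generated from the audio as the abstract describes.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Apply a causal FIR filter: y[t] = sum_k taps[k] * signal[t - k]."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if t - k >= 0:  # samples before the start are treated as zero
                acc += h * signal[t - k]
        out.append(acc)
    return out


def filter_and_sum(
    channels: Sequence[Sequence[float]],
    filters: Sequence[Sequence[float]],
) -> List[float]:
    """Filter each channel with its own taps, then sum into one channel.

    Assumption: `filters` stands in for the per-channel filter parameters
    that the abstract says are generated from both channels of audio data.
    """
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    filtered = [fir_filter(x, h) for x, h in zip(channels, filters)]
    # Sample-wise sum across channels yields the single combined channel.
    return [sum(samples) for samples in zip(*filtered)]


# Two equal-length channels; the second filter delays its channel by one sample.
first_channel = [1.0, 2.0, 3.0]
second_channel = [0.0, 1.0, 0.0]
combined = filter_and_sum(
    [first_channel, second_channel],
    [[1.0], [0.0, 1.0]],
)
```

In this sketch the combined channel is what would then be passed to the neural network; the network itself is omitted.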
-
Publication No.: US20170278513A1
Publication date: 2017-09-28
Application No.: US15392122
Filing date: 2016-12-28
Applicant: Google Inc.
Inventor: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson
IPC: G10L15/16, G10L21/0224
CPC classification number: G10L15/16, G10L15/20, G10L15/26, G10L21/0224, G10L2021/02166
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
-