-
Publication No.: US11557277B2
Publication Date: 2023-01-17
Application No.: US17644362
Filing Date: 2021-12-15
Applicant: Google LLC
Inventor: Georg Heigold , Erik McDermott , Vincent O. VanHoucke , Andrew W. Senior , Michiel A. U. Bacchiani
IPC: G10L15/06 , G10L15/16 , G10L15/183 , G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
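The abstract above describes two sequence-training model replicas that each fetch the current neural network parameters, optimize them against their own batch of training frames, and publish the result. A minimal sketch of that read-optimize-publish loop, assuming a shared parameter store; the names (`ParameterStore`, `train_replica`) and the toy gradient step are illustrative, not the patent's implementation.

```python
def sgd_step(params, batch, lr=0.1):
    """Toy optimization step: nudge each parameter toward the batch mean."""
    target = sum(batch) / len(batch)
    return [p - lr * (p - target) for p in params]

class ParameterStore:
    """Shared parameters that each sequence-training replica reads and updates."""
    def __init__(self, params):
        self.params = list(params)

    def fetch(self):
        return list(self.params)

    def apply(self, new_params):
        self.params = list(new_params)

def train_replica(store, batch):
    params = store.fetch()               # obtain current network parameters
    optimized = sgd_step(params, batch)  # optimize on this replica's frames
    store.apply(optimized)               # publish the optimized parameters
    return optimized

store = ParameterStore([0.0, 0.0])
first_batch = [1.0, 3.0]   # stand-in for frames of the first training utterances
second_batch = [2.0, 4.0]  # stand-in for frames of the second training utterances
train_replica(store, first_batch)   # first sequence-training speech model
train_replica(store, second_batch)  # second sequence-training speech model
```

Because the second replica fetches parameters only after the first has published, updates compose; in a real system the fetches can interleave, which is the asynchronous case the claim language allows.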
-
Publication No.: US10930270B2
Publication Date: 2021-02-23
Application No.: US16541982
Filing Date: 2019-08-15
Applicant: Google LLC
Inventor: Tara N. Sainath , Ron J. Weiss , Andrew W. Senior , Kevin William Wilson
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output based on the output of the trained artificial neural network is received. A transcription, determined based on the output of the acoustic model, is provided.
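A sketch of the layer ordering this abstract describes: a time-frequency representation passes through a frequency convolution layer, a recurrent "memory" layer, and a hidden layer. The shapes, the exponential-memory cell, and the random features are invented for illustration; the patent's actual layers are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
T, F = 5, 8                          # time steps x frequency bins
features = rng.normal(size=(T, F))   # stand-in time-frequency representation

def freq_conv(x, kernel):
    """Convolve each time frame along the frequency axis (valid padding)."""
    k = len(kernel)
    out = np.empty((x.shape[0], x.shape[1] - k + 1))
    for t in range(x.shape[0]):
        for f in range(out.shape[1]):
            out[t, f] = x[t, f:f + k] @ kernel
    return out

def memory_layer(x, decay=0.5):
    """Minimal recurrent memory: exponential moving average over time."""
    state = np.zeros(x.shape[1])
    outputs = []
    for frame in x:
        state = decay * state + (1 - decay) * frame
        outputs.append(state.copy())
    return np.stack(outputs)

conv_out = freq_conv(features, kernel=np.array([0.25, 0.5, 0.25]))
mem_out = memory_layer(conv_out)
W = rng.normal(size=(mem_out.shape[1], 4))
hidden = np.tanh(mem_out @ W)        # one fully connected hidden layer
```

The frequency convolution reduces 8 bins to 6 with a width-3 kernel, and the memory layer preserves the time dimension, so the hidden layer sees one vector per time step.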
-
Publication No.: US10923112B2
Publication Date: 2021-02-16
Application No.: US16704799
Filing Date: 2019-12-05
Applicant: Google LLC
Inventor: Hasim Sak , Andrew W. Senior
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
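The per-time-step loop in this abstract can be sketched as follows: the network's output for the preceding step is combined with the current acoustic representation to form a modified input. The stand-in `acoustic_net` and the additive combination are assumptions for the example, not the claimed network.

```python
def acoustic_net(x):
    """Stand-in for the acoustic modeling neural network."""
    return [0.5 * v for v in x]

def run_sequence(frames):
    # Initial time step: the raw acoustic feature representation.
    outputs = [acoustic_net(frames[0])]
    for frame in frames[1:]:
        prev = outputs[-1]
        # Modified input: preceding output combined with the current frame.
        modified = [p + f for p, f in zip(prev, frame)]
        outputs.append(acoustic_net(modified))
    return outputs

frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outs = run_sequence(frames)   # one output per time step
```

A phoneme representation for the utterance would then be derived from `outs`, one entry per time step.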
-
Publication No.: US20210005184A1
Publication Date: 2021-01-07
Application No.: US17022224
Filing Date: 2020-09-16
Applicant: Google LLC
Inventor: Kanury Kanishka Rao , Andrew W. Senior , Hasim Sak
IPC: G10L15/16 , G10L15/30 , G10L15/187
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models and using the trained acoustic models. A connectionist temporal classification (CTC) acoustic model is accessed, the CTC acoustic model having been trained using a context-dependent state inventory generated from approximate phonetic alignments determined by another CTC acoustic model trained without fixed alignment targets. Audio data for a portion of an utterance is received. Input data corresponding to the received audio data is provided to the accessed CTC acoustic model. Data indicating a transcription for the utterance is generated based on output that the accessed CTC acoustic model produced in response to the input data. The data indicating the transcription is provided as output of an automated speech recognition service.
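The CTC models in this abstract emit a label or a blank per audio frame rather than requiring fixed alignment targets. A sketch of the standard CTC collapse rule that turns per-frame outputs into a label sequence (the blank symbol `"_"` is an assumption of the example):

```python
BLANK = "_"

def ctc_collapse(frame_labels):
    """Collapse repeated labels, then drop blanks: the standard CTC rule."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out

# Sixteen frames of per-frame outputs collapse to a five-label sequence.
labels = ctc_collapse(list("__hh_e_ll_ll__o_"))  # → ['h', 'e', 'l', 'l', 'o']
```

Note that a blank between two identical labels (as in `ll_ll`) keeps both, which is how CTC represents repeated phonemes without a fixed alignment.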
-
Publication No.: US10803855B1
Publication Date: 2020-10-13
Application No.: US16258309
Filing Date: 2019-01-25
Applicant: Google LLC
Inventor: Kanury Kanishka Rao , Andrew W. Senior , Hasim Sak
IPC: G10L15/16 , G10L15/30 , G10L15/187 , G10L15/02
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models and using the trained acoustic models. A connectionist temporal classification (CTC) acoustic model is accessed, the CTC acoustic model having been trained using a context-dependent state inventory generated from approximate phonetic alignments determined by another CTC acoustic model trained without fixed alignment targets. Audio data for a portion of an utterance is received. Input data corresponding to the received audio data is provided to the accessed CTC acoustic model. Data indicating a transcription for the utterance is generated based on output that the accessed CTC acoustic model produced in response to the input data. The data indicating the transcription is provided as output of an automated speech recognition service.
-
Publication No.: US10783900B2
Publication Date: 2020-09-22
Application No.: US14847133
Filing Date: 2015-09-08
Applicant: Google LLC
Inventor: Tara N. Sainath , Andrew W. Senior , Oriol Vinyals , Hasim Sak
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving input features of an utterance; and processing the input features using an acoustic model that comprises one or more convolutional neural network (CNN) layers, one or more long short-term memory network (LSTM) layers, and one or more fully connected neural network layers to generate a transcription for the utterance.
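The layer ordering this abstract claims (CNN layers, then LSTM layers, then fully connected layers) can be sketched with toy scalar stand-ins; the cells below illustrate only the composition order, not the patent's architecture or dimensions.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def conv_layer(seq, kernel=(0.5, 0.5)):
    """Width-2 convolution over the input sequence."""
    return [kernel[0] * a + kernel[1] * b for a, b in zip(seq, seq[1:])]

def lstm_layer(seq):
    """Minimal scalar LSTM: gated updates to a cell state c and output h."""
    c, h, out = 0.0, 0.0, []
    for x in seq:
        i = sigmoid(x + h)           # input gate
        f = sigmoid(x - h)           # forget gate
        o = sigmoid(h)               # output gate
        c = f * c + i * math.tanh(x)
        h = o * math.tanh(c)
        out.append(h)
    return out

def dense_layer(seq, w=2.0, b=-0.1):
    """Fully connected step applied per time step."""
    return [math.tanh(w * x + b) for x in seq]

def cldnn(seq):
    # Convolutional -> LSTM -> fully connected, as in the abstract.
    return dense_layer(lstm_layer(conv_layer(seq)))

scores = cldnn([0.1, 0.4, -0.2, 0.3])
```

The width-2 convolution shortens the sequence by one, and the later layers preserve length, so four input features yield three output scores.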
-
Publication No.: US10535338B2
Publication Date: 2020-01-14
Application No.: US16179801
Filing Date: 2018-11-02
Applicant: Google LLC
Inventor: Hasim Sak , Andrew W. Senior
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
-
Publication No.: US10229672B1
Publication Date: 2019-03-12
Application No.: US15397327
Filing Date: 2017-01-03
Applicant: Google LLC
Inventor: Kanury Kanishka Rao , Andrew W. Senior , Hasim Sak
IPC: G10L15/00 , G10L15/16 , G10L15/30 , G10L15/187 , G10L15/02
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models and using the trained acoustic models. A connectionist temporal classification (CTC) acoustic model is accessed, the CTC acoustic model having been trained using a context-dependent state inventory generated from approximate phonetic alignments determined by another CTC acoustic model trained without fixed alignment targets. Audio data for a portion of an utterance is received. Input data corresponding to the received audio data is provided to the accessed CTC acoustic model. Data indicating a transcription for the utterance is generated based on output that the accessed CTC acoustic model produced in response to the input data. The data indicating the transcription is provided as output of an automated speech recognition service.
-
Publication No.: US20180130474A1
Publication Date: 2018-05-10
Application No.: US15810516
Filing Date: 2017-11-13
Applicant: Google LLC
Inventor: Hasim Sak , Andrew W. Senior
CPC classification number: G10L17/14 , G06N3/0445 , G10L15/02 , G10L15/16 , G10L2015/025
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final connectionist temporal classification (CTC) output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
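The stacking and subsampling steps described above can be sketched directly: each modified frame concatenates a window of consecutive acoustic frames, and the network then processes only every k-th modified frame. The window size, rate, and frame contents are invented for the example.

```python
def stack_frames(frames, window=2):
    """Concatenate `window` consecutive frames into one modified frame."""
    return [sum(frames[i:i + window], [])
            for i in range(len(frames) - window + 1)]

def subsample(frames, rate=2):
    """Keep every `rate`-th modified frame for the acoustic model."""
    return frames[::rate]

frames = [[1], [2], [3], [4], [5]]       # five single-feature acoustic frames
stacked = stack_frames(frames)           # [[1, 2], [2, 3], [3, 4], [4, 5]]
reduced = subsample(stacked)             # [[1, 2], [3, 4]]
```

Subsampling halves the number of frames the RNN layers must process per utterance, which is the practical motivation for combining the two steps.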
-