-
Publication number: US12073823B2
Publication date: 2024-08-27
Application number: US18506540
Application date: 2023-11-10
Applicant: Google LLC
Inventor: Georg Heigold , Erik McDermott , Vincent O. Vanhoucke , Andrew W. Senior , Michiel A. U. Bacchiani
IPC: G10L15/06 , G06N3/045 , G10L15/16 , G10L15/183
CPC classification number: G10L15/063 , G06N3/045 , G10L15/16 , G10L15/183
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
-
Publication number: US20230206909A1
Publication date: 2023-06-29
Application number: US18177717
Application date: 2023-03-02
Applicant: Google LLC
Inventor: Andrew W. Senior , Ignacio L. Moreno
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using neural networks. A feature vector that models audio characteristics of a portion of an utterance is received. Data indicative of latent variables of multivariate factor analysis is received. The feature vector and the data indicative of the latent variables is provided as input to a neural network. A candidate transcription for the utterance is determined based on at least an output of the neural network.
-
Publication number: US11620991B2
Publication date: 2023-04-04
Application number: US17154376
Application date: 2021-01-21
Applicant: Google LLC
Inventor: Andrew W. Senior , Ignacio L. Moreno
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using neural networks. A feature vector that models audio characteristics of a portion of an utterance is received. Data indicative of latent variables of multivariate factor analysis is received. The feature vector and the data indicative of the latent variables is provided as input to a neural network. A candidate transcription for the utterance is determined based on at least an output of the neural network.
-
Publication number: US20220108686A1
Publication date: 2022-04-07
Application number: US17644362
Application date: 2021-12-15
Applicant: Google LLC
Inventor: Georg Heigold , Erik McDermott , Vincent O. Vanhoucke , Andrew W. Senior , Michiel A. U. Bacchiani
IPC: G10L15/06 , G10L15/16 , G10L15/183 , G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
-
Publication number: US11227582B2
Publication date: 2022-01-18
Application number: US17143140
Application date: 2021-01-06
Applicant: Google LLC
Inventor: Georg Heigold , Erik McDermott , Vincent O. Vanhoucke , Andrew W. Senior , Michiel A. U. Bacchiani
IPC: G10L15/06 , G10L15/16 , G10L15/183 , G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
-
Publication number: US10714120B2
Publication date: 2020-07-14
Application number: US16017580
Application date: 2018-06-25
Applicant: Google LLC
Inventor: Dave Burke , Michael J. LeBeau , Konrad Gianno , Trausti T. Kristjansson , John Nicholas Jitkoff , Andrew W. Senior
IPC: G10L15/26 , G10L25/78 , G10L15/10 , G06F3/16 , G06F3/0346 , H04M1/725 , H04R1/08 , H04W4/02 , G10L17/00 , G10L15/22 , G10L25/21
Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
-
Publication number: US10134393B2
Publication date: 2018-11-20
Application number: US15664153
Application date: 2017-07-31
Applicant: Google LLC
Inventor: Hasim Sak , Andrew W. Senior
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
-
Publication number: US10026419B2
Publication date: 2018-07-17
Application number: US14645802
Application date: 2015-03-12
Applicant: Google LLC
Inventor: Dave Burke , Michael J. LeBeau , Konrad Gianno , Trausti T. Kristjansson , John Nicholas Jitkoff , Andrew W. Senior
Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
-
Publication number: US11341958B2
Publication date: 2022-05-24
Application number: US17022224
Application date: 2020-09-16
Applicant: Google LLC
Inventor: Kanury Kanishka Rao , Andrew W. Senior , Hasim Sak
IPC: G10L15/16 , G10L15/187 , G10L15/30 , G10L15/02
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models and using the trained acoustic models. A connectionist temporal classification (CTC) acoustic model is accessed, the CTC acoustic model having been trained using a context-dependent state inventory generated from approximate phonetic alignments determined by another CTC acoustic model trained without fixed alignment targets. Audio data for a portion of an utterance is received. Input data corresponding to the received audio data is provided to the accessed CTC acoustic model. Data indicating a transcription for the utterance is generated based on output that the accessed CTC acoustic model produced in response to the input data. The data indicating the transcription is provided as output of an automated speech recognition service.
-
Publication number: US20210183376A1
Publication date: 2021-06-17
Application number: US17154376
Application date: 2021-01-21
Applicant: Google LLC
Inventor: Andrew W. Senior , Ignacio L. Moreno
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using neural networks. A feature vector that models audio characteristics of a portion of an utterance is received. Data indicative of latent variables of multivariate factor analysis is received. The feature vector and the data indicative of the latent variables is provided as input to a neural network. A candidate transcription for the utterance is determined based on at least an output of the neural network.