ASYNCHRONOUS OPTIMIZATION FOR SEQUENCE TRAINING OF NEURAL NETWORKS

    公开(公告)号:US20240087559A1

    公开(公告)日:2024-03-14

    申请号:US18506540

    申请日:2023-11-10

    Applicant: Google LLC

    CPC classification number: G10L15/063 G06N3/045 G10L15/16 G10L15/183

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

    Adaptive audio enhancement for multichannel speech recognition

    公开(公告)号:US11257485B2

    公开(公告)日:2022-02-22

    申请号:US16708930

    申请日:2019-12-10

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

    Asynchronous optimization for sequence training of neural networks

    公开(公告)号:US10916238B2

    公开(公告)日:2021-02-09

    申请号:US16863432

    申请日:2020-04-30

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

    Hotword suppression
    17.
    发明授权

    公开(公告)号:US10692496B2

    公开(公告)日:2020-06-23

    申请号:US16418415

    申请日:2019-05-21

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for suppressing hotwords are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to playback of an utterance. The actions further include providing the audio data as an input to a model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample. The actions further include receiving, from the model, data indicating whether the audio data includes the audio watermark. The actions further include, based on the data indicating whether the audio data includes the audio watermark, determining to continue or cease processing of the audio data.

    Asynchronous optimization for sequence training of neural networks

    公开(公告)号:US11854534B1

    公开(公告)日:2023-12-26

    申请号:US18069035

    申请日:2022-12-20

    Applicant: Google LLC

    CPC classification number: G10L15/063 G06N3/045 G10L15/06 G10L15/183

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

Patent Agency Ranking