Compressed recurrent neural network models

    公开(公告)号:US11741366B2

    公开(公告)日:2023-08-29

    申请号:US16726119

    申请日:2019-12-23

    Applicant: Google LLC

    CPC classification number: G06N3/084 G06N3/044

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long-short term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.

    Contextual Biasing for Speech Recognition

    公开(公告)号:US20220101836A1

    公开(公告)日:2022-03-31

    申请号:US17643423

    申请日:2021-12-09

    Applicant: Google LLC

    Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.

    Proper Noun Recognition in End-to-End Speech Recognition

    公开(公告)号:US20210233512A1

    公开(公告)日:2021-07-29

    申请号:US17150491

    申请日:2021-01-15

    Applicant: Google LLC

    Abstract: A method for training a speech recognition model with a minimum word error rate loss function includes receiving a training example comprising a proper noun and generating a plurality of hypotheses corresponding to the training example. Each hypothesis of the plurality of hypotheses represents the proper noun and includes a corresponding probability that indicates a likelihood that the hypothesis represents the proper noun. The method also includes determining that the corresponding probability associated with one of the plurality of hypotheses satisfies a penalty criteria. The penalty criteria indicating that the corresponding probability satisfies a probability threshold, and the associated hypothesis incorrectly represents the proper noun. The method also includes applying a penalty to the minimum word error rate loss function.

    Processing audio waveforms
    56.
    发明授权

    公开(公告)号:US10930270B2

    公开(公告)日:2021-02-23

    申请号:US16541982

    申请日:2019-08-15

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprising a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.

    Joint Endpointing And Automatic Speech Recognition

    公开(公告)号:US20200335091A1

    公开(公告)日:2020-10-22

    申请号:US16809403

    申请日:2020-03-04

    Applicant: Google LLC

    Abstract: A method includes receiving audio data of an utterance and processing the audio data to obtain, as output from a speech recognition model configured to jointly perform speech decoding and endpointing of utterances: partial speech recognition results for the utterance; and an endpoint indication indicating when the utterance has ended. While processing the audio data, the method also includes detecting, based on the endpoint indication, the end of the utterance. In response to detecting the end of the utterance, the method also includes terminating the processing of any subsequent audio data received after the end of the utterance was detected.

    MINIMUM WORD ERROR RATE TRAINING FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS

    公开(公告)号:US20200043483A1

    公开(公告)日:2020-02-06

    申请号:US16529252

    申请日:2019-08-01

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.

Patent Agency Ranking