Unified Endpointer Using Multitask and Multidomain Learning

    Publication No.: US20210142174A1

    Publication Date: 2021-05-13

    Application No.: US17152918

    Filing Date: 2021-01-20

    Applicant: Google LLC

    Abstract: A method for training an endpointer model includes obtaining short-form speech utterances and long-form speech utterances. The method also includes providing a short-form speech utterance as input to a shared neural network, the shared neural network configured to learn shared hidden representations suitable for both voice activity detection (VAD) and end-of-query (EOQ) detection. The method also includes generating, using a VAD classifier, a sequence of predicted VAD labels and determining a VAD loss by comparing the sequence of predicted VAD labels to a corresponding sequence of reference VAD labels. The method also includes generating, using an EOQ classifier, a sequence of predicted EOQ labels and determining an EOQ loss by comparing the sequence of predicted EOQ labels to a corresponding sequence of reference EOQ labels. The method also includes training, using a cross-entropy criterion, the endpointer model based on the VAD loss and the EOQ loss.
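
    Illustrative sketch (not taken from the patent): the multitask objective described in this abstract can be pictured as two per-frame cross-entropy losses computed on top of one shared hidden representation. Everything below is an assumption made for illustration: the single tanh layer standing in for the shared neural network, the class counts (2 VAD classes, 4 EOQ classes), the loss weights, and all function and parameter names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def cross_entropy(frame_logits, labels):
    """Mean per-frame cross-entropy against reference class indices."""
    losses = [-math.log(softmax(l)[y]) for l, y in zip(frame_logits, labels)]
    return sum(losses) / len(losses)


def shared_network(frames, w_shared):
    """Hypothetical stand-in for the shared network: one tanh layer per frame."""
    return [[math.tanh(v) for v in linear(w_shared, f)] for f in frames]


def multitask_loss(frames, vad_labels, eoq_labels, params,
                   vad_weight=1.0, eoq_weight=1.0):
    """Return (combined loss, VAD loss, EOQ loss) for one utterance."""
    hidden = shared_network(frames, params["shared"])
    # Both classifiers read the same shared hidden representation.
    vad_logits = [linear(params["vad"], h) for h in hidden]
    eoq_logits = [linear(params["eoq"], h) for h in hidden]
    vad_loss = cross_entropy(vad_logits, vad_labels)
    eoq_loss = cross_entropy(eoq_logits, eoq_labels)
    return vad_weight * vad_loss + eoq_weight * eoq_loss, vad_loss, eoq_loss


# Toy utterance: 3 frames of 2 features; all-zero weights give uniform predictions.
params = {
    "shared": [[0.0, 0.0], [0.0, 0.0]],  # 2 hidden units (assumed size)
    "vad": [[0.0, 0.0]] * 2,             # 2 VAD classes (assumed)
    "eoq": [[0.0, 0.0]] * 4,             # 4 EOQ classes (assumed)
}
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
total, vad_loss, eoq_loss = multitask_loss(frames, [1, 1, 0], [0, 0, 3], params)
```

    With all-zero weights each classifier predicts a uniform distribution, so the two losses reduce to ln 2 and ln 4. A real trainer would minimize the combined loss with gradient descent, which this sketch leaves out.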

    Adaptive audio enhancement for multichannel speech recognition

    Publication No.: US10515626B2

    Publication Date: 2019-12-24

    Application No.: US15848829

    Filing Date: 2017-12-20

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
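
    Illustrative sketch (not taken from the patent): one common way to turn two filtered channels into a single combined channel is filter-and-sum, where each channel is convolved with its own filter and the results are added. The abstract does not spell out the combination step, so filter-and-sum is an assumption here. The filter taps below are fixed, hypothetical values; in the described method they would be generated from both channels of audio data.

```python
def fir_filter(signal, taps):
    """Causal FIR filter: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(ch1, ch2, taps1, taps2):
    """Filter each channel with its own taps, then sum sample by sample."""
    y1 = fir_filter(ch1, taps1)
    y2 = fir_filter(ch2, taps2)
    return [a + b for a, b in zip(y1, y2)]


# Hypothetical inputs: channel 1 is scaled by 0.5, channel 2 is delayed one sample.
combined = filter_and_sum([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.5], [0.0, 1.0])
```

    The combined channel would then be passed to the recognition network in place of the raw multichannel input.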

    MEMORY CELL FOR A PIXEL OF A DISPLAY

    Publication No.: US20250037644A1

    Publication Date: 2025-01-30

    Application No.: US18655071

    Filing Date: 2024-05-03

    Applicant: GOOGLE LLC

    Abstract: A memory cell for a display is disclosed. The memory cell has a current limiter on the power supply to reduce the power consumed by the memory cell during a write operation when the binary state of the memory cell is flipped. In a dense memory environment, in a display with a million or more memory cells, the incremental power reduction of each memory cell corresponds to a substantial reduction in the overall power consumed by the display.

    Fusion of acoustic and text representations in RNN-T

    Publication No.: US12211509B2

    Publication Date: 2025-01-28

    Application No.: US17821160

    Filing Date: 2022-08-19

    Applicant: Google LLC

    Abstract: A speech recognition model includes an encoder network, a prediction network, and a joint network. The encoder network is configured to receive a sequence of acoustic frames characterizing an input utterance; and generate, at each of a plurality of output steps, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The prediction network is configured to: receive a sequence of non-blank symbols output by a final Softmax layer; and generate, at each of the plurality of output steps, a dense representation. The joint network is configured to generate, at each of the plurality of output steps based on the higher order feature representation and the dense representation, a probability distribution over possible speech recognition hypotheses. The joint network includes a stack of gating and bilinear pooling to fuse the dense representation and the higher order feature representation.
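
    Illustrative sketch (not taken from the patent): gating and bilinear pooling are two generic ways to fuse a pair of vectors, here the higher order feature representation (acoustic) and the dense representation (text). The abstract does not give the formulas or say how the two operations are stacked, so the sigmoid gate, the low-rank bilinear form, the additive combination, and every name below are assumptions.

```python
import math


def matvec(matrix, vec):
    """Matrix-vector product; `matrix` is a list of rows."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(acoustic, text, w_gate_a, w_gate_t):
    """Per-dimension gate g in (0, 1) mixes the two inputs: g*a + (1-g)*t."""
    pre = [a + t for a, t in zip(matvec(w_gate_a, acoustic), matvec(w_gate_t, text))]
    gate = [sigmoid(p) for p in pre]
    return [g * a + (1.0 - g) * t for g, a, t in zip(gate, acoustic, text)]


def bilinear_pooling(acoustic, text, u, v):
    """Low-rank bilinear pooling: elementwise product of two linear projections."""
    return [p * q for p, q in zip(matvec(u, acoustic), matvec(v, text))]


def fuse(acoustic, text, params):
    """Assumed combination: add the gated mix and the bilinear interaction term."""
    gated = gated_fusion(acoustic, text, params["w_gate_a"], params["w_gate_t"])
    pooled = bilinear_pooling(acoustic, text, params["u"], params["v"])
    return [g + p for g, p in zip(gated, pooled)]


# Toy parameters: zero gate weights give g = 0.5; identity projections for pooling.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {"w_gate_a": zeros, "w_gate_t": zeros, "u": identity, "v": identity}
fused = fuse([1.0, 3.0], [3.0, 1.0], params)
```

    In a full joint network the fused vector would feed an output layer that produces the probability distribution over hypotheses; that layer is omitted here.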

    BACKPLANE FOR AN ARRAY OF EMISSIVE ELEMENTS
    Type: Invention publication

    Publication No.: US20240221627A1

    Publication Date: 2024-07-04

    Application No.: US18544051

    Filing Date: 2023-12-18

    Applicant: GOOGLE LLC

    CPC classification number: G09G3/32 G11C11/412 G09G2300/0842 G09G2310/0297

    Abstract: A backplane operative to drive an array of emissive pixel elements is disclosed. A plurality of pixel drive circuits form part of an array of emissive elements. The plurality of pixel drive circuits are disposed to form a plurality of rows and a plurality of columns. The plurality of pixel drive circuits are organized into sets of pixel drive circuits, and each set comprises at least one pixel drive circuit.

    TWO-PASS END TO END SPEECH RECOGNITION

    Publication No.: US20220238101A1

    Publication Date: 2022-07-28

    Application No.: US17616135

    Filing Date: 2020-12-03

    Applicant: GOOGLE LLC

    Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include an encoder shared between the RNN-T decoder and the LAS decoder.
