-
Publication No.: US20250095634A1
Publication Date: 2025-03-20
Application No.: US18965193
Filing Date: 2024-12-02
Applicant: Google LLC
Inventor: Bo Li , Tara N. Sainath , Ruoming Pang , Shuo-Yiin Chang , Qiumin Xu , Trevor Strohman , Vince Chen , Qiao Liang , Heguang Liu , Yanzhang He , Parisa Haghani , Sameer Bidichandani
Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating, by an encoder, a higher order feature representation for each acoustic frame in the sequence. The method also includes generating, by a prediction network, a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of a plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at the end of each utterance. The method also includes classifying each acoustic frame as speech, initial silence, intermediate silence, or final silence.
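The final classification step of this abstract lends itself to a short sketch. The following is a minimal, hypothetical frame labeler that derives the four classes from per-frame speech flags; the patent's model predicts these classes jointly from acoustics, so this is only an illustration of the label semantics, not the claimed method.

```python
from typing import List

# Illustrative class names matching the abstract's four frame categories.
SPEECH = "speech"
INITIAL_SIL = "initial_silence"
INTERMEDIATE_SIL = "intermediate_silence"
FINAL_SIL = "final_silence"

def label_frames(is_speech: List[bool]) -> List[str]:
    """Label each frame relative to the first and last speech frame."""
    if True not in is_speech:
        # No speech observed: everything is still initial silence.
        return [INITIAL_SIL] * len(is_speech)
    first = is_speech.index(True)
    last = len(is_speech) - 1 - is_speech[::-1].index(True)
    labels = []
    for i, speech in enumerate(is_speech):
        if speech:
            labels.append(SPEECH)
        elif i < first:
            labels.append(INITIAL_SIL)   # silence before any speech
        elif i > last:
            labels.append(FINAL_SIL)     # silence after the last speech frame
        else:
            labels.append(INTERMEDIATE_SIL)  # a pause inside the utterance
    return labels
```

The distinction between intermediate and final silence is what lets an endpointer avoid cutting the microphone during a mid-utterance pause.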
-
Publication No.: US12254865B2
Publication Date: 2025-03-18
Application No.: US18418246
Filing Date: 2024-01-20
Applicant: Google LLC
Inventor: Zhifeng Chen , Bo Li , Eugene Weinstein , Yonghui Wu , Pedro J. Moreno Mengibar , Ron J. Weiss , Khe Chai Sim , Tara N. Sainath , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihoods of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features is received. A transcription of the utterance, generated based on the output of the speech recognition model, is provided.
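Cluster adaptive training, mentioned in the abstract, adapts a shared model by interpolating per-cluster parameter bases (here, one basis per language or dialect cluster). A minimal numpy sketch of that combination step follows; the shapes and interpolation weights are invented for illustration and are not the patent's architecture.

```python
import numpy as np

def adapted_weights(cluster_bases: np.ndarray, interpolation: np.ndarray) -> np.ndarray:
    """Combine per-cluster weight matrices into one adapted matrix.

    cluster_bases: shape (K, M, N), one (M, N) weight matrix per cluster.
    interpolation: shape (K,), per-cluster interpolation weights, e.g.
    predicted from the input language/dialect.
    """
    # Contract the cluster axis: sum_k interpolation[k] * cluster_bases[k].
    return np.tensordot(interpolation, cluster_bases, axes=1)
```

With equal weights over two clusters the adapted matrix is simply their average, which is the degenerate "no adaptation" case.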
-
Publication No.: US12073824B2
Publication Date: 2024-08-27
Application No.: US17616135
Filing Date: 2020-12-03
Applicant: GOOGLE LLC
Inventor: Tara N. Sainath , Yanzhang He , Bo Li , Arun Narayanan , Ruoming Pang , Antoine Jean Bruguier , Shuo-Yiin Chang , Wei Li
CPC classification number: G10L15/16 , G06N3/08 , G10L15/05 , G10L15/063 , G10L15/22 , G10L2015/0635
Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
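The two-pass flow described above can be sketched at a high level: a streaming first pass proposes candidate transcripts, and a second pass rescores the full candidates over the same shared encoder output. Every function below is an illustrative stand-in (the toy scorer counts word overlap with a made-up evidence set), not the RNN-T or LAS decoder the abstract describes.

```python
# Toy stand-ins for the two decoder passes over a shared encoder output.

def first_pass_candidates(encoder_features):
    # Stand-in for an RNN-T decoder emitting a streaming n-best list.
    return ["turn of the lights", "turn off the lights"]

def second_pass_score(hypothesis, encoder_features):
    # Stand-in for a LAS decoder scoring a complete hypothesis against
    # the shared encoder output (here: a crude word-overlap heuristic).
    return sum(1 for word in hypothesis.split() if word in encoder_features)

def two_pass_recognize(encoder_features):
    # First pass streams candidates; second pass picks the best rescored one.
    candidates = first_pass_candidates(encoder_features)
    return max(candidates, key=lambda h: second_pass_score(h, encoder_features))
```

The design point of the shared encoder is that the second pass adds rescoring quality without recomputing acoustic features.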
-
Publication No.: US20240161732A1
Publication Date: 2024-05-16
Application No.: US18418246
Filing Date: 2024-01-20
Applicant: Google LLC
Inventor: Zhifeng Chen , Bo Li , Eugene Weinstein , Yonghui Wu , Pedro J. Moreno Mengibar , Ron J. Weiss , Khe Chai Sim , Tara N. Sainath , Patrick An Phu Nguyen
CPC classification number: G10L15/005 , G10L15/07 , G10L15/16 , G10L2015/0631
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihoods of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features is received. A transcription of the utterance, generated based on the output of the speech recognition model, is provided.
-
Publication No.: US20240029719A1
Publication Date: 2024-01-25
Application No.: US18340093
Filing Date: 2023-06-23
Applicant: Google LLC
Inventor: Shaan Jagdeep Patrick Bijwadia , Shuo-Yiin Chang , Bo Li , Yanzhang He , Tara N. Sainath , Chao Zhang
CPC classification number: G10L15/16 , G10L15/063 , G10L25/93
Abstract: A single E2E multitask model includes a speech recognition model and an endpointer model. The speech recognition model includes an audio encoder configured to encode a sequence of audio frames into corresponding higher-order feature representations, and a decoder configured to generate probability distributions over possible speech recognition hypotheses for the sequence of audio frames based on the higher-order feature representations. The endpointer model is configured to operate in either a voice activity detection (VAD) mode or an end-of-query (EOQ) detection mode. During the VAD mode, the endpointer model receives input audio frames and determines, for each input audio frame, whether the input audio frame includes speech. During the EOQ detection mode, the endpointer model receives latent representations for the sequence of audio frames output from the audio encoder and determines, for each latent representation, whether the latent representation includes final silence.
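The two operating modes can be sketched as one object with two per-frame predicates. The thresholds and features below (frame energy for VAD, a "silence logit" read from the first latent dimension for EOQ) are invented for this sketch; the patent's endpointer learns both decisions inside the multitask model.

```python
import numpy as np

class Endpointer:
    """Illustrative two-mode endpointer: VAD on raw frames, EOQ on latents."""

    def __init__(self, vad_threshold: float = 0.01, eoq_threshold: float = 0.5):
        self.vad_threshold = vad_threshold
        self.eoq_threshold = eoq_threshold

    def vad(self, frame: np.ndarray) -> bool:
        """VAD mode: does this raw audio frame contain speech?
        Toy criterion: mean frame energy above a threshold."""
        return float(np.mean(frame ** 2)) > self.vad_threshold

    def eoq(self, latent: np.ndarray) -> bool:
        """EOQ mode: does this encoder latent indicate final silence?
        Toy criterion: sigmoid of an assumed silence logit in latent[0]."""
        return 1.0 / (1.0 + np.exp(-latent[0])) > self.eoq_threshold

ep = Endpointer()
```

Routing EOQ through the encoder latents (rather than raw audio) is what makes the endpointer share computation with the recognizer.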
-
Publication No.: US11676625B2
Publication Date: 2023-06-13
Application No.: US17152918
Filing Date: 2021-01-20
Applicant: Google LLC
Inventor: Shuo-Yiin Chang , Bo Li , Gabor Simko , Maria Carolina Parada San Martin , Sean Matthew Shannon
CPC classification number: G10L25/78 , G06F18/214 , G06N3/045 , G06N3/08 , G06N5/046 , G06N20/20 , G10L15/16
Abstract: A method for training an endpointer model includes obtaining training data comprising short-form speech utterances and long-form speech utterances. The method also includes providing a short-form speech utterance as input to a shared neural network configured to learn shared hidden representations suitable for both voice activity detection (VAD) and end-of-query (EOQ) detection. The method also includes generating, using a VAD classifier, a sequence of predicted VAD labels and determining a VAD loss by comparing the sequence of predicted VAD labels to a corresponding sequence of reference VAD labels. The method also includes generating, using an EOQ classifier, a sequence of predicted EOQ labels and determining an EOQ loss by comparing the sequence of predicted EOQ labels to a corresponding sequence of reference EOQ labels. The method also includes training, using a cross-entropy criterion, the endpointer model based on the VAD loss and the EOQ loss.
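The joint objective described here is just the sum of two per-frame cross-entropies, one over VAD labels and one over EOQ labels. A minimal sketch with binary labels follows; the probabilities and labels are toy values standing in for the two classifier heads' outputs, and equal weighting of the two losses is an assumption.

```python
import math

def cross_entropy(predicted_probs, reference_labels):
    """Mean negative log-likelihood of binary reference labels.

    predicted_probs: P(label == 1) per frame; reference_labels: 0/1 per frame.
    """
    eps = 1e-12  # guard against log(0)
    losses = []
    for p, y in zip(predicted_probs, reference_labels):
        prob_of_label = p if y == 1 else 1.0 - p
        losses.append(-math.log(max(prob_of_label, eps)))
    return sum(losses) / len(losses)

def endpointer_loss(vad_probs, vad_labels, eoq_probs, eoq_labels):
    # Joint multitask objective: VAD loss plus EOQ loss (equal weights assumed).
    return cross_entropy(vad_probs, vad_labels) + cross_entropy(eoq_probs, eoq_labels)
```

Because both classifiers sit on the same shared hidden representations, gradients from either loss update the shared network.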
-
Publication No.: US11568802B2
Publication Date: 2023-01-31
Application No.: US17584668
Filing Date: 2022-01-26
Applicant: Google LLC
Inventor: Bo Li , Kaushik Sheth
IPC: G09G3/32
Abstract: A backplane suitable for pulse-width modulating an array of emissive pixels with a current that is substantially constant over a wide range of temperatures. A current control circuit provides a constant current to an array of current-mirror pixel drive elements. The current control circuit comprises a thermally stable bias resistor and a thermally stable band-gap voltage source to provide thermally stable controls, and a large-L p-channel reference current FET with an associated large-L n-channel bias FET configured to provide a reference current at a required voltage to the gate of a large-L p-channel current source FET. The current control circuit and the current-mirror pixel drive elements are similar circuits, with one current control circuit able to control a substantial number of pixel drive elements.
-
Publication No.: US20220270597A1
Publication Date: 2022-08-25
Application No.: US17182592
Filing Date: 2021-02-23
Applicant: Google LLC
Inventor: David Qiu , Qiujia Li , Yanzhang He , Yu Zhang , Bo Li , Liangliang Cao , Rohit Prabhavalkar , Deepti Bhatia , Wei Li , Ke Hu , Tara Sainath , Ian Mcgraw
Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic context vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.
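The aggregation at the end of the abstract can be sketched directly: sub-word scores collapse to word-level scores, which aggregate to one utterance-level score. The word-boundary convention (a leading "_" marks a word start), the per-word minimum, and the utterance-level mean are all assumptions for this sketch; the patent does not mandate those particular aggregators.

```python
def word_confidences(subword_scores, subwords):
    """Group sub-word scores into words and take the minimum per word.

    Assumes a leading '_' on a sub-word piece marks a new word, a
    convention borrowed from sub-word tokenizers for this illustration.
    """
    words, current = [], []
    for score, piece in zip(subword_scores, subwords):
        if piece.startswith("_") and current:
            words.append(min(current))  # close out the previous word
            current = []
        current.append(score)
    if current:
        words.append(min(current))
    return words

def utterance_confidence(word_scores):
    """Aggregate word-level scores; arithmetic mean as an assumed choice."""
    return sum(word_scores) / len(word_scores)
```

Using the minimum at the word level makes a word only as trustworthy as its least confident piece, a conservative and common choice.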
-
Publication No.: US20220148582A1
Publication Date: 2022-05-12
Application No.: US17649058
Filing Date: 2022-01-26
Applicant: Google LLC
Inventor: Bo Li , Ron Weiss , Michiel A.U. Bacchiani , Tara N. Sainath , Kevin William Wilson
IPC: G10L15/16 , G10L15/20 , G10L21/0224
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
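The signal path in this abstract is filter-and-sum beamforming: each channel is passed through its own filter and the outputs are summed into one combined channel. In the patent, a neural network predicts the filter parameters from both channels; in the sketch below the taps are fixed placeholders standing in for those predicted parameters.

```python
import numpy as np

def combine_channels(ch1: np.ndarray, ch2: np.ndarray,
                     taps1, taps2) -> np.ndarray:
    """Filter each channel with its FIR taps and sum the results.

    taps1/taps2 stand in for the per-channel filter parameters that the
    patent's neural network would generate from both input channels.
    """
    y1 = np.convolve(ch1, taps1, mode="same")
    y2 = np.convolve(ch2, taps2, mode="same")
    return y1 + y2  # single combined channel fed to the recognizer
```

With identity taps (`[1.0]`) the combined channel is just the sample-wise sum of the two inputs, the degenerate "no steering" case.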
-
Publication No.: US20220005465A1
Publication Date: 2022-01-06
Application No.: US17448119
Filing Date: 2021-09-20
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-Cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A.U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
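The attender step in this abstract produces a context vector by weighting encoder outputs against the decoder's state. A toy numpy sketch of dot-product attention follows; the shapes and the scoring function are illustrative, not the patent's model.

```python
import numpy as np

def attend(encoder_outputs: np.ndarray, decoder_state: np.ndarray) -> np.ndarray:
    """Dot-product attention: weight encoder outputs by relevance to the
    decoder state and return the weighted sum as a context vector.

    encoder_outputs: shape (T, D); decoder_state: shape (D,).
    """
    scores = encoder_outputs @ decoder_state   # one score per time step, (T,)
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ encoder_outputs           # context vector, shape (D,)
```

The context vector is what lets the decoder condition each emitted word element on the relevant slice of the acoustics.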