Patent search ap:("GOOGLE LLC") AND inv:"Bo Li" Page 3

21.

Invention patent (granted)
Unified endpointer using multitask and multidomain learning (In force)

Publication (announcement) No.: US10929754B2

Publication (announcement) date: 2021-02-23

Application No.: US16711172

Application date: 2019-12-11

Applicant: Google LLC

Inventor: Shuo-yiin Chang, Bo Li, Gabor Simko, Maria Carolina Parada San Martin, Sean Matthew Shannon

IPC: G10L15/16, G06N3/08, G06N3/04, G06N20/20, G06K9/62, G06N5/04

Abstract: A method for training an endpointer model uses short-form speech utterances and long-form speech utterances. The method also includes providing a short-form speech utterance as input to a shared neural network, the shared neural network configured to learn shared hidden representations suitable for both voice activity detection (VAD) and end-of-query (EOQ) detection. The method also includes generating, using a VAD classifier, a sequence of predicted VAD labels and determining a VAD loss by comparing the sequence of predicted VAD labels to a corresponding sequence of reference VAD labels. The method also includes generating, using an EOQ classifier, a sequence of predicted EOQ labels and determining an EOQ loss by comparing the sequence of predicted EOQ labels to a corresponding sequence of reference EOQ labels. The method also includes training, using a cross-entropy criterion, the endpointer model based on the VAD loss and the EOQ loss.
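As a rough illustration of the multitask objective in this abstract, the sketch below computes a frame-level cross-entropy loss for the VAD head and another for the EOQ head, then sums them. It is a minimal sketch under stated assumptions, not the patented implementation: the per-frame probability lists, the function names, and the `eoq_weight` factor are hypothetical, and the abstract does not say how (or whether) the two losses are weighted.

```python
import math
from typing import Sequence


def frame_cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean cross-entropy between per-frame predicted distributions and reference labels."""
    if len(probs) != len(labels):
        raise ValueError("need one prediction per reference label")
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def endpointer_loss(vad_probs, vad_labels, eoq_probs, eoq_labels, eoq_weight: float = 1.0) -> float:
    """Hypothetical multitask loss: VAD loss plus an (assumed) weighted EOQ loss."""
    vad_loss = frame_cross_entropy(vad_probs, vad_labels)
    eoq_loss = frame_cross_entropy(eoq_probs, eoq_labels)
    return vad_loss + eoq_weight * eoq_loss


# Two frames, two classes per task (e.g. speech / non-speech, continue / end-of-query).
vad = [[0.9, 0.1], [0.2, 0.8]]
eoq = [[0.5, 0.5], [0.5, 0.5]]
loss = endpointer_loss(vad, [0, 1], eoq, [0, 1])
```

Both heads read the same shared hidden representation in the described model; here they are represented only by their output distributions, so the shared network itself is not modeled.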

22.

Invention patent (granted)
Joint unsupervised and supervised training for multilingual ASR (In force)

Publication (announcement) No.: US12249317B2

Publication (announcement) date: 2025-03-11

Application No.: US17929934

Application date: 2022-09-06

Applicant: Google LLC

Inventor: Bo Li, Junwen Bai, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath

IPC: G10L15/16, G10L15/02, G10L15/06, G10L15/187, G10L15/19

Abstract: A method includes receiving audio features and generating a latent speech representation based on the audio features. The method also includes generating a target quantized vector token and a target token index for a corresponding latent speech representation. The method also includes generating a contrastive context vector for a corresponding unmasked or masked latent speech representation and deriving a contrastive self-supervised loss based on the corresponding contrastive context vector and the corresponding target quantized vector token. The method also includes generating a high-level context vector based on the contrastive context vector and, for each high-level context vector, learning to predict the target token index at the corresponding time step using a cross-entropy loss based on the target token index. The method also includes predicting speech recognition hypotheses for the utterance and training a multilingual automatic speech recognition (ASR) model using an unsupervised loss and a supervised loss.
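The loss terms named in this abstract can be sketched in plain Python. This is a hedged illustration, assuming an InfoNCE-style contrastive loss over cosine similarities (as popularized by wav2vec 2.0) and a softmax cross-entropy for the token-index prediction; the temperature, distractor sampling, and the `unsup_weight` combination are hypothetical and are not stated in the abstract.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def contrastive_loss(context, positive, negatives, temperature: float = 0.1) -> float:
    """InfoNCE-style loss: identify the true quantized target among distractors."""
    logits: List[float] = [cosine(context, positive) / temperature]
    logits += [cosine(context, n) / temperature for n in negatives]
    return log_sum_exp(logits) - logits[0]


def token_index_loss(logits: Sequence[float], target_index: int) -> float:
    """Cross-entropy for predicting the target token index from a high-level context vector."""
    return log_sum_exp(logits) - logits[target_index]


def joint_loss(contrastive: float, masked_prediction: float, supervised: float,
               unsup_weight: float = 1.0) -> float:
    """Hypothetical weighting of the unsupervised terms against the supervised ASR loss."""
    return unsup_weight * (contrastive + masked_prediction) + supervised
```

The supervised term is left as a scalar input because the abstract does not name the ASR loss used.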

23.

Invention application
QUANTIZATION AND SPARSITY AWARE FINE-TUNING FOR SPEECH RECOGNITION WITH UNIVERSAL SPEECH MODELS (In force)

Publication (announcement) No.: US20250078815A1

Publication (announcement) date: 2025-03-06

Application No.: US18826135

Application date: 2024-09-05

Applicant: Google LLC

Inventor: Shaojin Ding, David Qiu, David Rim, Amir Yazdanbakhsh, Yanzhang He, Zhonglin Han, Rohit Prakash Prabhavalkar, Weiran Wang, Bo Li, Jian Li, Tara N. Sainath, Shivani Agrawal, Oleg Rybakov

IPC: G10L15/06

Abstract: A method includes obtaining a plurality of training samples that each include a respective speech utterance and a respective textual utterance representing a transcription of the respective speech utterance. The method also includes fine-tuning, using quantization and sparsity aware training with native integer operations, a pre-trained automatic speech recognition (ASR) model on the plurality of training samples. Here, the pre-trained ASR model includes a plurality of weights and the fine-tuning includes pruning one or more weights of the plurality of weights using a sparsity mask and quantizing each weight of the plurality of weights based on an integer with a fixed-bit width. The method also includes providing the fine-tuned ASR model to a user device.
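The two weight transformations in this abstract, pruning with a sparsity mask and quantizing to fixed-bit-width integers, can be sketched as follows. This assumes a magnitude-based mask and symmetric per-tensor quantization, both common choices; the abstract does not specify how the mask is chosen or how the quantization scale is computed, and the training loop that fine-tunes through these operations is not shown.

```python
from typing import List, Tuple


def magnitude_sparsity_mask(weights: List[float], sparsity: float) -> List[int]:
    """0/1 mask that zeroes the `sparsity` fraction of smallest-magnitude weights (assumed criterion)."""
    num_pruned = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:num_pruned]:
        mask[i] = 0
    return mask


def quantize_symmetric(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed integers of a fixed bit width."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def prune_and_quantize(weights: List[float], sparsity: float = 0.5, bits: int = 8):
    """Apply the mask first, then quantize the surviving weights; returns (mask, ints, scale)."""
    mask = magnitude_sparsity_mask(weights, sparsity)
    pruned = [w * m for w, m in zip(weights, mask)]
    q, scale = quantize_symmetric(pruned, bits)
    return mask, q, scale


mask, q, scale = prune_and_quantize([0.1, -0.5, 0.9, -0.05], sparsity=0.5, bits=8)
```

Each stored integer approximates its original weight as `q[i] * scale`, so integer kernels can operate on `q` directly and rescale once per tensor.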

24.

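Invention patent (granted)
Larger backplane suitable for high speed applications (In force)

Publication (announcement) No.: US12236917B2

Publication (announcement) date: 2025-02-25

Application No.: US18067267

Application date: 2022-12-16

Applicant: GOOGLE LLC

Inventor: Bo Li, Kaushik Sheth

IPC: G09G3/36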

Abstract: A display system comprising a plurality of display controller circuits controlling a like number of independent segments of pixel drive circuits of a backplane. Each pixel drive circuit comprises a memory element and associated pixel drive circuitry. The segments of the backplane may be organized vertically. The word line for the memory cells of a first segment of pixel drive circuits passes underneath a second segment of pixel drive circuits without directly interacting with the pixel drive circuits of the second segment in order to reach the pixel drive circuits of the first segment. The plurality of display controller circuits operate asynchronously but are kept at the same frame rate by an external signal such as Vsync.

25.

Invention patent (granted)
Fast emit low-latency streaming ASR with sequence-level emission regularization utilizing forward and backward probabilities between nodes of an alignment lattice (In force)

Publication (announcement) No.: US12094453B2

Publication (announcement) date: 2024-09-17

Application No.: US17447285

Application date: 2021-09-09

Applicant: Google LLC

Inventor: Jiahui Yu, Chung-cheng Chiu, Bo Li, Shuo-yiin Chang, Tara Sainath, Wei Han, Anmol Gulati, Yanzhang He, Arun Narayanan, Yonghui Wu, Ruoming Pang

IPC: G10L15/06, G10L15/16, G10L15/187, G10L15/22, G10L15/30

CPC classification number: G10L15/063, G10L15/16, G10L15/22, G10L15/30, G10L15/187

Abstract: A computer-implemented method of training a streaming speech recognition model that includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.
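The "forward and backward probabilities between nodes of an alignment lattice" in the title refer to the standard transducer (RNN-T) lattice computation. The sketch below computes those variables from per-node blank and label probabilities, plus the posterior of each label emission. It illustrates only the lattice quantities: in the published FastEmit method the tuning parameter effectively up-weights the gradient contribution of label emissions relative to blanks, and that regularization step, as well as the exact form claimed here, is not reproduced. The function names and grid layout are assumptions.

```python
from typing import List, Tuple

Grid = List[List[float]]


def rnnt_forward_backward(blank: Grid, label: Grid) -> Tuple[Grid, Grid, float]:
    """Forward (alpha) and backward (beta) variables on a transducer alignment lattice.

    blank[t][u]: probability of emitting blank at node (t, u); shape T x (U + 1).
    label[t][u]: probability of emitting target label u + 1 at node (t, u); shape T x U.
    Returns (alpha, beta, total sequence probability).
    """
    T, U = len(blank), len(blank[0]) - 1
    alpha = [[0.0] * (U + 1) for _ in range(T)]
    alpha[0][0] = 1.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            from_blank = alpha[t - 1][u] * blank[t - 1][u] if t > 0 else 0.0
            from_label = alpha[t][u - 1] * label[t][u - 1] if u > 0 else 0.0
            alpha[t][u] = from_blank + from_label
    beta = [[0.0] * (U + 1) for _ in range(T)]
    beta[T - 1][U] = blank[T - 1][U]  # final blank ends every complete alignment
    for t in reversed(range(T)):
        for u in reversed(range(U + 1)):
            if t == T - 1 and u == U:
                continue
            via_blank = beta[t + 1][u] * blank[t][u] if t < T - 1 else 0.0
            via_label = beta[t][u + 1] * label[t][u] if u < U else 0.0
            beta[t][u] = via_blank + via_label
    return alpha, beta, beta[0][0]


def label_emission_occupancy(alpha: Grid, beta: Grid, label: Grid, total: float) -> Grid:
    """Posterior probability that label u + 1 is emitted from node (t, u)."""
    T, U = len(alpha), len(alpha[0]) - 1
    return [[alpha[t][u] * label[t][u] * beta[t][u + 1] / total for u in range(U)]
            for t in range(T)]
```

Because every complete alignment emits each target label exactly once, the occupancies for a given label sum to 1 over time; a regularizer that favors label emissions shifts that mass toward earlier frames.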

26.

Invention application (published)
MIXTURE-OF-EXPERT CONFORMER FOR STREAMING MULTILINGUAL ASR (Under examination, published)

Publication (announcement) No.: US20240304185A1

Publication (announcement) date: 2024-09-12

Application No.: US18598885

Application date: 2024-03-07

Applicant: Google LLC

Inventor: Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Francoise Beaufays

IPC: G10L15/197, G10L15/02, G10L15/06

CPC classification number: G10L15/197, G10L15/02, G10L15/063

Abstract: A method for a multilingual ASR model includes receiving a sequence of acoustic frames characterizing an utterance of speech. At a plurality of output steps, the method further includes generating a first higher order feature representation for an acoustic frame by a first encoder that includes a first plurality of multi-head attention layers; generating a second higher order feature representation for a corresponding first higher order feature representation by a second encoder that includes a second plurality of multi-head attention layers; and generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on the second higher order feature representation and a sequence of N previous non-blank symbols. A gating layer of each respective mixture-of-experts (MoE) layer is configured to dynamically route an output from a previous multi-head attention layer at each of the plurality of output steps to a respective pair of feed-forward expert networks.
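Routing each input to "a respective pair of feed-forward expert networks" can be sketched as top-2 mixture-of-experts gating. This is a hedged illustration: the gate logits are passed in directly (in a real model a learned gating layer would compute them from the attention output), the experts are arbitrary callables, and renormalizing the two selected gate probabilities is an assumed convention rather than something the abstract states.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe(x: Vector, gate_logits: Sequence[float],
             experts: Sequence[Callable[[Vector], Vector]]) -> Vector:
    """Route x to the two highest-scoring experts and mix their outputs."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = sum(probs[i] for i in top)  # renormalize over the selected pair (assumption)
    out = [0.0] * len(x)
    for i in top:
        weight = probs[i] / norm
        out = [o + weight * v for o, v in zip(out, experts[i](x))]
    return out


experts = [
    lambda v: [2.0 * a for a in v],
    lambda v: [a + 1.0 for a in v],
    lambda v: [0.0 for _ in v],
]
y = top2_moe([1.0, 2.0], [1.0, 1.0, -5.0], experts)
```

Only the two selected experts run for a given input, which is what keeps per-frame compute roughly constant as the number of experts grows.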

27.

Invention application (published)
Streaming End-to-end Multilingual Speech Recognition with Joint Language Identification (Under examination, published)

Publication (announcement) No.: US20230306958A1

Publication (announcement) date: 2023-09-28

Application No.: US18188632

Application date: 2023-03-23

Applicant: Google LLC

Inventor: Chao Zhang, Bo Li, Tara N. Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-yiin Chang, Parisa Haghani

IPC: G10L15/00, G10L15/16, G10L15/06

CPC classification number: G10L15/005, G10L15/16, G10L15/063

Abstract: A method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. The method also includes generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a language identification (ID) predictor, a language prediction representation based on a concatenation of the first higher order feature representation and the second higher order feature representation. The method also includes generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on a concatenation of the second higher order feature representation and the language prediction representation.
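The data flow in this abstract, specifically which representations are concatenated at which stage, can be sketched with placeholder layers. Every layer here is a hypothetical callable standing in for a neural network, and vectors are plain Python lists so that `+` performs concatenation; the real first decoder also conditions on previously emitted labels, which this sketch omits.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def joint_lid_step(frame: Vector, first_encoder: Layer, second_encoder: Layer,
                   lid_predictor: Layer, first_decoder: Layer) -> Tuple[Vector, Vector]:
    """One step of the described flow; all layers are hypothetical placeholders."""
    h1 = first_encoder(frame)        # first higher order feature representation
    h2 = second_encoder(h1)          # second higher order feature representation
    language = lid_predictor(h1 + h2)              # LID input: concatenation [h1; h2]
    hypothesis_dist = first_decoder(h2 + language)  # decoder input: [h2; language]
    return hypothesis_dist, language


dist, lang = joint_lid_step(
    [1.0],
    first_encoder=lambda v: [2.0 * a for a in v],
    second_encoder=lambda v: [a + 1.0 for a in v],
    lid_predictor=lambda v: [sum(v)],
    first_decoder=lambda v: v,
)
```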

28.

Invention application (published)
Systems and Methods for Training Dual-Mode Machine-Learned Speech Recognition Models (Under examination, published)

Publication (announcement) No.: US20230237993A1

Publication (announcement) date: 2023-07-27

Application No.: US18011571

Application date: 2021-10-01

Applicant: Google LLC

Inventor: Jiahui Yu, Ruoming Pang, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N. Sainath, Yonghui Hu

IPC: G10L15/16, G10L15/32, G10L15/22

CPC classification number: G10L15/16, G10L15/32, G10L15/22

Abstract: Systems and methods of the present disclosure are directed to a computing system, including one or more processors and a machine-learned multi-mode speech recognition model configured to operate in a streaming recognition mode or a contextual recognition mode. The computing system can perform operations including obtaining speech data and a ground truth label and processing the speech data using the contextual recognition mode to obtain contextual prediction data. The operations can include evaluating a difference between the contextual prediction data and the ground truth label and processing the speech data using the streaming recognition mode to obtain streaming prediction data. The operations can include evaluating a difference between the streaming prediction data and the ground truth label, and a difference between the contextual prediction data and the streaming prediction data. The operations can include adjusting parameters of the speech recognition model.
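The three differences evaluated in this abstract can be sketched as a single training loss. This is an illustrative assumption rather than the claimed method: it uses cross-entropy against the ground truth label for each mode and a KL divergence from the contextual (teacher) distribution to the streaming (student) distribution, which resembles in-place knowledge distillation. The `distill_weight` parameter is hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the ground truth label."""
    return -math.log(probs[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dual_mode_loss(contextual_probs: Sequence[float], streaming_probs: Sequence[float],
                   label: int, distill_weight: float = 1.0) -> float:
    """Contextual loss + streaming loss + (assumed) contextual-to-streaming distillation term."""
    contextual_loss = cross_entropy(contextual_probs, label)
    streaming_loss = cross_entropy(streaming_probs, label)
    distill = kl_divergence(contextual_probs, streaming_probs)
    return contextual_loss + streaming_loss + distill_weight * distill
```

When the two modes already agree, the distillation term vanishes and only the two label losses remain; in a framework with autodiff, the teacher distribution would typically be excluded from the gradient of that term.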

29.

Invention application (published)
EFFICIENT IMAGE DATA DELIVERY FOR AN ARRAY OF PIXEL MEMORY CELLS (Under examination, published)

Publication (announcement) No.: US20230147106A1

Publication (announcement) date: 2023-05-11

Application No.: US18150724

Application date: 2023-01-05

Applicant: GOOGLE LLC

Inventor: Bo Li, Kaushik Sheth, Edwin Lyle Hudson

IPC: G09G3/36

CPC classification number: G09G3/3688, G09G2360/12

Abstract: A backplane design for delivering image data in an efficient manner to a memory cell forming a part of a pixel driver comprises a word line design and a column data register release signal delivery design that are speed matched and a complementary bit line delivery design that is speed matched to a row decoder signal circuit operative to pull a word line driver to a state to enable the memory circuits of that row to receive data from the column drivers for each column. The speed matching is effective over a range of operating temperatures because the circuit designs are substantially identical.

30.

Invention patent (granted)
Automatic speech recognition using multi-dimensional models (In force)

Publication (announcement) No.: US09984683B2

Publication (announcement) date: 2018-05-29

Application No.: US15217457

Application date: 2016-07-22

Applicant: Google LLC

Inventor: Bo Li, Tara N. Sainath

IPC: G10L15/16, G10L15/02, G06N3/08, G10L15/26

CPC classification number: G10L15/16, G06N3/08, G10L15/02, G10L15/26, G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for automatic speech recognition using multi-dimensional models. In some implementations, audio data that describes an utterance is received. A transcription for the utterance is determined using an acoustic model that includes a neural network having first memory blocks for time information and second memory blocks for frequency information. The transcription for the utterance is provided as output of an automated speech recognizer.
