Patent search ap:("GOOGLE LLC") AND inv:"Bo Li" Page 5

41.

Invention application
Unified Endpointer Using Multitask and Multidomain Learning (in force)

Publication No.: US20210142174A1

Publication date: 2021-05-13

Application No.: US17152918

Application date: 2021-01-20

Applicant: Google LLC

Inventor: Shuo-yiin Chang, Bo Li, Gabor Simko, Maria Carolina Parada San Martin, Sean Matthew Shannon

IPC: G06N3/08, G06N3/04, G10L15/16, G06N20/20, G06K9/62, G06N5/04

Abstract: A method for training an endpointer model includes short-form speech utterances and long-form speech utterances. The method also includes providing a short-form speech utterance as input to a shared neural network, the shared neural network configured to learn shared hidden representations suitable for both voice activity detection (VAD) and end-of-query (EOQ) detection. The method also includes generating, using a VAD classifier, a sequence of predicted VAD labels and determining a VAD loss by comparing the sequence of predicted VAD labels to a corresponding sequence of reference VAD labels. The method also includes generating, using an EOQ classifier, a sequence of predicted EOQ labels and determining an EOQ loss by comparing the sequence of predicted EOQ labels to a corresponding sequence of reference EOQ labels. The method also includes training, using a cross-entropy criterion, the endpointer model based on the VAD loss and the EOQ loss.
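The training objective described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the single tanh layer, the linear classifier heads, the number of EOQ classes, and the equal weighting of the two losses are all assumptions made for illustration, and every function name is hypothetical.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(pred_probs, ref_labels):
    """Mean negative log-probability of the reference label at each frame."""
    return -sum(math.log(p[y]) for p, y in zip(pred_probs, ref_labels)) / len(ref_labels)

def shared_layer(frame, weights):
    """Hypothetical shared hidden layer: one tanh unit per weight row."""
    return [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]

def classify(hidden, head):
    """Linear classifier head followed by a softmax."""
    return softmax([sum(w * h for w, h in zip(row, hidden)) for row in head])

def multitask_loss(frames, vad_refs, eoq_refs, shared_w, vad_head, eoq_head):
    # Both classifiers read the same shared hidden representation.
    hidden = [shared_layer(f, shared_w) for f in frames]
    vad_loss = cross_entropy([classify(h, vad_head) for h in hidden], vad_refs)
    eoq_loss = cross_entropy([classify(h, eoq_head) for h in hidden], eoq_refs)
    # Assumption: the two losses are simply added with equal weight.
    return vad_loss + eoq_loss, vad_loss, eoq_loss
```

Because both heads share `shared_w`, a gradient step on the summed loss would update the shared layer from both tasks, which is the point of the multitask setup.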

42.

Invention grant
Adaptive audio enhancement for multichannel speech recognition (in force)

Publication No.: US10515626B2

Publication date: 2019-12-24

Application No.: US15848829

Application date: 2017-12-20

Applicant: Google LLC

Inventor: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson

IPC: G10L15/00, G10L15/16, G10L21/0224, G10L15/20, G10L15/26, G10L21/0216

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
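One common way to combine two filtered channels into a single channel is filter-and-sum, sketched below. The abstract does not state that this exact combination is used, so treat it as an assumption. In the patent the filter parameters are generated from both audio channels; here they are passed in as plain lists, and the function names are hypothetical.

```python
def fir_filter(signal, taps):
    """Causal FIR filtering: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out

def filter_and_sum(channel_1, channel_2, taps_1, taps_2):
    """Filter each channel with its own taps, then add sample by sample."""
    y1 = fir_filter(channel_1, taps_1)
    y2 = fir_filter(channel_2, taps_2)
    return [a + b for a, b in zip(y1, y2)]

# The second channel hears the same signal one sample later, so the first
# channel's filter delays it by one sample before the two are averaged.
combined = filter_and_sum([1.0, 2.0, 3.0], [0.0, 1.0, 2.0], [0.0, 0.5], [0.5])
```

The combined channel would then be the input to the recognition network.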

43.

Invention application
MEMORY CELL FOR A PIXEL OF A DISPLAY (in force)

Publication No.: US20250037644A1

Publication date: 2025-01-30

Application No.: US18655071

Application date: 2024-05-03

Applicant: GOOGLE LLC

Inventor: Bo Li, Kaushik Indravadan Sheth

IPC: G09G3/32, G11C11/419

Abstract: A memory cell for a display is disclosed. The memory cell has a current limiter on the power supply to reduce the power consumed by the memory cell during a write operation when the binary state of the memory cell is flipped. In a dense memory environment, in a display with a million or more memory cells, the incremental power reduction of each memory cell corresponds to a substantial reduction in the overall power consumed by the display.

44.

Invention grant
Fusion of acoustic and text representations in RNN-T (in force)

Publication No.: US12211509B2

Publication date: 2025-01-28

Application No.: US17821160

Application date: 2022-08-19

Applicant: Google LLC

Inventor: Chao Zhang, Bo Li, Zhiyun Lu, Tara N. Sainath, Shuo-yiin Chang

IPC: G10L15/30, G06N7/01

Abstract: A speech recognition model includes an encoder network, a prediction network, and a joint network. The encoder network is configured to receive a sequence of acoustic frames characterizing an input utterance; and generate, at each of a plurality of output steps, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The prediction network is configured to: receive a sequence of non-blank symbols output by a final Softmax layer; and generate, at each of the plurality of output steps, a dense representation. The joint network is configured to generate, at each of the plurality of output steps based on the higher order feature representation and the dense representation, a probability distribution over possible speech recognition hypotheses. The joint network includes a stack of gating and bilinear pooling to fuse the dense representation and the higher order feature representation.
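The two fusion operations named in this abstract can be sketched in isolation. How the patent stacks them is not given here, so adding the gated vector to the pooled vector is an assumption, as is passing the gate logits in directly rather than computing them from the inputs. All names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gating(acoustic, text, gate_logits):
    """Element-wise gate g in (0, 1): g * acoustic + (1 - g) * text."""
    return [sigmoid(z) * a + (1.0 - sigmoid(z)) * t
            for a, t, z in zip(acoustic, text, gate_logits)]

def bilinear_pooling(acoustic, text, tensor):
    """z[k] = sum over i, j of acoustic[i] * tensor[k][i][j] * text[j]."""
    return [sum(a * w * t
                for a, row in zip(acoustic, w_k)
                for w, t in zip(row, text))
            for w_k in tensor]

def fuse(acoustic, text, gate_logits, tensor):
    # Assumption: the gated and pooled vectors have equal length and are added.
    gated = gating(acoustic, text, gate_logits)
    pooled = bilinear_pooling(acoustic, text, tensor)
    return [g + p for g, p in zip(gated, pooled)]
```

Gating picks between the two representations per dimension, while bilinear pooling captures pairwise products between them, which is why the two are complementary.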

45.

Invention application
SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS (in force)

Publication No.: US20240420686A1

Publication date: 2024-12-19

Application No.: US18815200

Application date: 2024-08-26

Applicant: Google LLC

Inventor: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen

IPC: G10L15/16, G06N3/08, G10L15/02, G10L15/06, G10L15/22, G10L15/26, G10L25/30

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
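The attender step, which turns encoder outputs into a context vector, can be sketched as follows. Dot-product scoring is an assumption, since the abstract does not say which attention variant is used, and the function name is hypothetical.

```python
import math

def attend(encoder_outputs, decoder_state):
    """Return (context_vector, attention_weights) for one decoder step."""
    # Score each encoder frame against the current decoder state.
    scores = [sum(e * d for e, d in zip(enc, decoder_state))
              for enc in encoder_outputs]
    # Softmax over frames so the weights are positive and sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the attention-weighted sum of encoder frames.
    dim = len(encoder_outputs[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
               for i in range(dim)]
    return context, weights
```

The decoder would consume `context` at each step to produce the speech recognition scores mentioned in the abstract.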

46.

Invention publication
BACKPLANE FOR AN ARRAY OF EMISSIVE ELEMENTS (under examination, published)

Publication No.: US20240221627A1

Publication date: 2024-07-04

Application No.: US18544051

Application date: 2023-12-18

Applicant: GOOGLE LLC

Inventor: Edwin Lyle Hudson, Bo Li

IPC: G09G3/32, G11C11/412

CPC classification number: G09G3/32, G11C11/412, G09G2300/0842, G09G2310/0297

Abstract: A backplane operative to drive an array of emissive pixel elements is disclosed. A plurality of pixel drive circuits form part of an array of emissive elements. The plurality of pixel drive circuits are disposed to form a plurality of rows and a plurality of columns. The plurality of pixel drive circuits are organized into sets of pixel drive circuits, and each set comprises at least one pixel drive circuit.

47.

Invention publication
PARAMETER-EFFICIENT MODEL REPROGRAMMING FOR CROSS-LINGUAL SPEECH RECOGNITION (under examination, published)

Publication No.: US20240185841A1

Publication date: 2024-06-06

Application No.: US18490808

Application date: 2023-10-20

Applicant: Google LLC

Inventor: Bo Li, Yu Zhang, Nanxin Chen, Rohit Prakash Prabhavalkar, Chao-Han Huck Yang, Tara N. Sainath, Trevor Strohman

IPC: G10L15/065, G10L15/00

CPC classification number: G10L15/065, G10L15/005

Abstract: A method includes obtaining an ASR model trained to recognize speech in a first language and receiving transcribed training utterances in a second language. The method also includes integrating the ASR model with an input reprogramming module and a latent reprogramming module. The method also includes adapting the ASR model to learn how to recognize speech in the second language by training the input reprogramming module and the latent reprogramming module while parameters of the ASR model are frozen.
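The idea of adapting a model by training only an added module while the model's own parameters stay frozen can be shown with a toy example. This is a deliberately tiny stand-in, not the patented method: the "model" is a single fixed scalar weight, the input reprogramming module is a single trainable offset added to the input, and the latent reprogramming module is omitted. All names are hypothetical.

```python
def train_input_reprogramming(features, targets, frozen_weight, steps=200, lr=0.05):
    """Fit a trainable input offset by gradient descent on squared error.

    The frozen model computes frozen_weight * input and is never updated.
    """
    delta = 0.0  # the only trainable parameter
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(features, targets):
            pred = frozen_weight * (x + delta)
            # d/d(delta) of (pred - y)^2
            grad += 2.0 * (pred - y) * frozen_weight
        delta -= lr * grad / len(features)
    return delta

# Targets were generated with an input shift of 0.5, which training recovers
# without touching frozen_weight.
learned = train_input_reprogramming([1.0, 2.0, 3.0], [3.0, 5.0, 7.0], frozen_weight=2.0)
```

Only `delta` changes during training, so the number of trained parameters stays small regardless of the size of the frozen model.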

48.

Invention grant
Multi-dialect and multilingual speech recognition (in force)

Publication No.: US11900915B2

Publication date: 2024-02-13

Application No.: US17572238

Application date: 2022-01-10

Applicant: Google LLC

Inventor: Zhifeng Chen, Bo Li, Eugene Weinstein, Yonghui Wu, Pedro J. Moreno Mengibar, Ron J. Weiss, Khe Chai Sim, Tara N. Sainath, Patrick An Phu Nguyen

IPC: G10L15/00, G10L15/16, G10L15/07, G10L15/06

CPC classification number: G10L15/005, G10L15/07, G10L15/16, G10L2015/0631

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer-readable medium, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.

49.

Invention grant
Joint endpointing and automatic speech recognition (in force)

Publication No.: US11475880B2

Publication date: 2022-10-18

Application No.: US16809403

Application date: 2020-03-04

Applicant: Google LLC

Inventor: Shuo-yiin Chang, Rohit Prakash Prabhavalkar, Gabor Simko, Tara N. Sainath, Bo Li, Yanzhang He

IPC: G10L15/16, G10L15/02, G10L15/14, G10L15/28, G10L15/08

Abstract: A method includes receiving audio data of an utterance and processing the audio data to obtain, as output from a speech recognition model configured to jointly perform speech decoding and endpointing of utterances: partial speech recognition results for the utterance; and an endpoint indication indicating when the utterance has ended. While processing the audio data, the method also includes detecting, based on the endpoint indication, the end of the utterance. In response to detecting the end of the utterance, the method also includes terminating the processing of any subsequent audio data received after the end of the utterance was detected.
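The control flow in this abstract, where one model emits both partial results and an endpoint indication and processing stops once the endpoint is seen, can be sketched as a streaming loop. The `model_step` callable and the `END_OF_UTTERANCE` token are hypothetical stand-ins for the jointly trained model and its endpoint indication.

```python
END_OF_UTTERANCE = "</s>"  # hypothetical endpoint indication

def recognize_with_endpointing(audio_frames, model_step):
    """Consume frames until the model signals the end of the utterance.

    model_step(frame) returns a word, None (nothing emitted for this frame),
    or END_OF_UTTERANCE.
    """
    partial_result = []
    frames_processed = 0
    for frame in audio_frames:
        frames_processed += 1
        token = model_step(frame)
        if token == END_OF_UTTERANCE:
            # Endpoint detected: later audio is never processed.
            break
        if token is not None:
            partial_result.append(token)
    return partial_result, frames_processed
```

Since the endpoint decision comes from the recognizer itself, no separate endpointer has to run alongside it in this sketch.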

50.

Invention application
TWO-PASS END TO END SPEECH RECOGNITION (in force)

Publication No.: US20220238101A1

Publication date: 2022-07-28

Application No.: US17616135

Application date: 2020-12-03

Applicant: GOOGLE LLC

Inventor: Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Jean Bruguier, Shuo-yiin Chang, Wei Li

IPC: G10L15/16, G10L15/05, G10L15/22, G10L15/06, G06N3/08

Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
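The two-pass structure can be sketched as a small pipeline in which the encoder runs once and both passes reuse its output. Treating the second pass as rescoring of first-pass candidates is an assumption (the abstract only says the second pass revises them), and the three callables are hypothetical stand-ins for the shared encoder, the first-pass decoder, and the second-pass scorer.

```python
def two_pass_recognize(audio, shared_encoder, first_pass_decode, second_pass_score):
    """Pick the first-pass candidate that the second pass scores highest."""
    encoded = shared_encoder(audio)          # computed once, used by both passes
    candidates = first_pass_decode(encoded)  # streaming hypotheses
    # Assumption: the second pass rescores candidates and keeps the best one.
    return max(candidates, key=lambda hyp: second_pass_score(encoded, hyp))
```

Sharing the encoder means the second pass adds only decoder cost on top of the streaming first pass.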
