-
Publication number: US11948570B2
Publication date: 2024-04-02
Application number: US17654195
Filing date: 2022-03-09
Applicant: Google LLC
Inventor: Wei Li , Rohit Prakash Prabhavalkar , Kanury Kanishka Rao , Yanzhang He , Ian C. McGraw , Anton Bakhtin
CPC classification number: G10L15/22 , G10L15/02 , G10L15/063 , G10L15/18 , G10L19/00 , G10L2015/025 , G10L2015/088 , G10L15/142 , G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
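The attention step in this abstract can be illustrated with a minimal sketch that pools a series of encoder outputs into one attention output and maps it to a key-phrase score. The dot-product scoring, the query vector, the logistic output layer, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math


def attention_output(encodings, query):
    """Pool a series of encodings into one vector using softmax dot-product attention.

    `query` is a hypothetical learned vector; the abstract does not define it.
    """
    scores = [sum(q * e for q, e in zip(query, enc)) for enc in encodings]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(encodings[0])
    return [sum(w * enc[i] for w, enc in zip(weights, encodings)) for i in range(dim)]


def key_phrase_score(context, out_weights, bias=0.0):
    """Map the attention output to a value in (0, 1) with a logistic layer (assumed)."""
    z = sum(w * c for w, c in zip(out_weights, context)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

In a streaming setting, the pooling would be recomputed as new encodings arrive; this sketch shows only a single evaluation over the encodings received so far.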
-
Publication number: US20210072378A1
Publication date: 2021-03-11
Application number: US16951800
Filing date: 2020-11-18
Applicant: GOOGLE LLC
Inventor: Dongeek Shin , Rajeev Nongpiur , Wei Li , Jian Guo , Jennifer Yeelam Wong , Andrew Christopher Felch , James Paul Tobin , Lu Gao , Brian Silverstein
Abstract: An electronic device has memory, one or more processors, a speaker, and a microphone. The device sends a first set of ultrasound chirps at a first rate via the speaker. It receives, via the microphone, a first set of signals corresponding to the first set of ultrasound chirps and being reflected from a person. The device determines based on the first set of signals that the person is in proximity to the electronic device. In accordance with the determination that the person is in proximity to the electronic device, the device sends a second set of ultrasound chirps at a second rate, faster than the first rate. It receives, via the microphone, a second set of signals corresponding to the second set of ultrasound chirps, and identifies a gesture from the person based on the second set of signals.
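The two-rate behavior described in this abstract can be sketched as follows. The time-of-flight proximity test, the distance threshold, and the two chirp rates are hypothetical values chosen for illustration; the abstract states only that the second rate is faster than the first.

```python
def is_in_proximity(echo_delay_s, max_distance_m=1.5, speed_of_sound_m_s=343.0):
    """Estimate distance from a round-trip echo delay and compare it to a threshold.

    Both the time-of-flight method and the 1.5 m threshold are assumptions.
    """
    distance_m = speed_of_sound_m_s * echo_delay_s / 2.0
    return distance_m <= max_distance_m


def next_chirp_rate(person_in_proximity, slow_rate_hz=1.0, fast_rate_hz=10.0):
    """Chirp slowly while idle and faster once a person is nearby (rates are assumed)."""
    return fast_rate_hz if person_in_proximity else slow_rate_hz
```

Chirping slowly until someone is nearby is one plausible way to limit work while idle, with the faster rate reserved for the period in which a gesture is to be identified.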
-
Publication number: US12255292B2
Publication date: 2025-03-18
Application number: US18606897
Filing date: 2024-03-15
Applicant: Google LLC
Inventor: James Robert Lim , Wei Li , Brian Conner , Brett Wilson
Abstract: An example outdoor mounted device includes a first battery configured to operate at a low temperature range that at least includes negative 20 degrees Celsius; a second battery configured to operate at a high temperature range; a temperature sensor; and processing circuitry configured to: determine, based on data received from the temperature sensor, a current temperature; responsive to determining that the current temperature is within the low temperature range, cause one or more components of the computing device to operate using electrical energy sourced from the first battery; and responsive to determining that the current temperature is within the high temperature range, cause the one or more components of the computing device to operate using electrical energy sourced from the second battery.
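The selection logic in this abstract can be sketched as a simple range check. The range boundaries below are hypothetical; the abstract states only that the low range includes negative 20 degrees Celsius and does not give the limits of either range.

```python
# Hypothetical operating ranges in degrees Celsius (not taken from the patent).
LOW_RANGE_C = (-40.0, 0.0)
HIGH_RANGE_C = (0.0, 60.0)


def select_battery(current_temp_c):
    """Return which battery should power the device at the given temperature.

    Returns "first" for the low range, "second" for the high range, and None
    when the temperature falls outside both assumed ranges.
    """
    if LOW_RANGE_C[0] <= current_temp_c < LOW_RANGE_C[1]:
        return "first"
    if HIGH_RANGE_C[0] <= current_temp_c <= HIGH_RANGE_C[1]:
        return "second"
    return None
```

For example, `select_battery(-20.0)` returns `"first"` and `select_battery(25.0)` returns `"second"` under these assumed ranges.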
-
Publication number: US20240420687A1
Publication date: 2024-12-19
Application number: US18815537
Filing date: 2024-08-26
Applicant: GOOGLE LLC
Inventor: Tara N. Sainath , Yanzhang He , Bo Li , Arun Narayanan , Ruoming Pang , Antoine Jean Bruguier , Shuo-yiin Chang , Wei Li
Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
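The data flow of the two-pass arrangement can be sketched with placeholder callables. Treating the second pass as a rescoring of first-pass candidates is an assumption; the abstract says only that the second pass revises the candidates. All three callables are hypothetical stand-ins for the shared encoder, the RNN-T decoder, and the LAS decoder, and streaming is not modeled.

```python
def two_pass_recognize(audio_frames, shared_encoder, first_pass_decoder, second_pass_scorer):
    """Run a two-pass recognition over a complete utterance.

    shared_encoder(frame) -> encoding            (hypothetical)
    first_pass_decoder(encodings) -> list[str]   (candidate recognitions)
    second_pass_scorer(encodings, text) -> float (higher is better)
    """
    # The encoder runs once; both passes consume the same encodings.
    encodings = [shared_encoder(frame) for frame in audio_frames]
    candidates = first_pass_decoder(encodings)
    if not candidates:
        return ""
    # Second pass: keep the candidate that the scorer rates highest.
    return max(candidates, key=lambda text: second_pass_scorer(encodings, text))
```

Sharing the encoder means the second pass adds only decoder cost, since the audio is encoded once.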
-
Publication number: US20240222714A1
Publication date: 2024-07-04
Application number: US18606897
Filing date: 2024-03-15
Applicant: Google LLC
Inventor: James Robert Lim , Wei Li , Brian Conner , Brett Wilson
CPC classification number: H01M10/425 , H01M10/482 , H01M10/486 , H01M2010/4271
Abstract: An example outdoor mounted device includes a first battery configured to operate at a low temperature range that at least includes negative 20 degrees Celsius; a second battery configured to operate at a high temperature range; a temperature sensor; and processing circuitry configured to: determine, based on data received from the temperature sensor, a current temperature; responsive to determining that the current temperature is within the low temperature range, cause one or more components of the computing device to operate using electrical energy sourced from the first battery; and responsive to determining that the current temperature is within the high temperature range, cause the one or more components of the computing device to operate using electrical energy sourced from the second battery.
-
Publication number: US20240029413A1
Publication date: 2024-01-25
Application number: US18350845
Filing date: 2023-07-12
Applicant: Google LLC
Inventor: Anthony Jacob Piergiovanni , Weiching Kuo , Wei Li , Anelia Angelova
IPC: G06V10/774 , G06V10/25
CPC classification number: G06V10/774 , G06V10/25 , G06V2201/07
Abstract: A method involves training a model by dynamically adjusting the number of examples within each training batch. The dynamic adjustment is accomplished by adjusting the number of examples per task within each training batch according to the performance of the model on the tasks that the model is being trained on. In some embodiments, this method is applied to cross-modal vision-language tasks. This method may also be applied to the pre-training of a model that can later be fine-tuned for one or more specific tasks.
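One way to adjust the number of examples per task according to model performance is sketched below. Allocating slots in proportion to each task's current loss is an assumption made for illustration; the abstract does not state the allocation rule. The function name, the `min_per_task` floor, and the task names in the usage note are hypothetical.

```python
def allocate_batch(task_losses, batch_size, min_per_task=1):
    """Split `batch_size` examples across tasks in proportion to current loss.

    Tasks the model performs worse on (higher loss) receive more examples.
    Rounding uses the largest-remainder method so counts sum to batch_size.
    """
    tasks = list(task_losses)
    reserved = min_per_task * len(tasks)
    if batch_size < reserved:
        raise ValueError("batch_size is too small for min_per_task")
    remaining = batch_size - reserved
    total = sum(task_losses.values())
    if total > 0:
        raw = {t: remaining * task_losses[t] / total for t in tasks}
    else:
        raw = {t: remaining / len(tasks) for t in tasks}  # no signal: split evenly
    counts = {t: min_per_task + int(raw[t]) for t in tasks}
    leftover = batch_size - sum(counts.values())
    # Hand out the slots lost to rounding, largest fractional part first.
    by_fraction = sorted(tasks, key=lambda t: raw[t] - int(raw[t]), reverse=True)
    for t in by_fraction[:leftover]:
        counts[t] += 1
    return counts
```

Calling `allocate_batch({"vqa": 3.0, "captioning": 1.0}, 8)` gives the higher-loss task the larger share of the eight slots, and the allocation would be recomputed as the losses change during training.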
-
Publication number: US11610586B2
Publication date: 2023-03-21
Application number: US17182592
Filing date: 2021-02-23
Applicant: Google LLC
Inventor: David Qiu , Qiujia Li , Yanzhang He , Yu Zhang , Bo Li , Liangliang Cao , Rohit Prabhavalkar , Deepti Bhatia , Wei Li , Ke Hu , Tara Sainath , Ian McGraw
Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic context vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.
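The roll-up from sub-word scores to word-level and utterance-level confidence can be sketched as below. Three details are assumptions, since the abstract does not specify them: a leading "_" marks the first sub-word unit of a word, a word takes the score of its final sub-word unit, and the utterance score is the arithmetic mean of the word scores.

```python
def word_confidences(subwords, scores, boundary="_"):
    """Group sub-word units into words and assign each word a confidence.

    Assumption: a word's confidence is the score of its last sub-word unit.
    """
    words = []
    for piece, score in zip(subwords, scores):
        starts_word = piece.startswith(boundary)
        if starts_word or not words:
            text = piece[len(boundary):] if starts_word else piece
            words.append([text, score])
        else:
            words[-1][0] += piece
            words[-1][1] = score  # overwrite with the latest sub-word score
    return [(text, score) for text, score in words]


def utterance_confidence(word_scores):
    """Aggregate word-level scores with an arithmetic mean (assumed aggregator)."""
    if not word_scores:
        return 0.0
    return sum(score for _, score in word_scores) / len(word_scores)
```

With sub-words `["_good", "_morn", "ing"]` and scores `[0.9, 0.6, 0.8]`, the words are "good" and "morning", and "morning" takes the score of its final unit, "ing".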
-
Publication number: US20220310072A1
Publication date: 2022-09-29
Application number: US17616129
Filing date: 2020-06-03
Applicant: GOOGLE LLC
Inventor: Tara N. Sainath , Ruoming Pang , David Rybach , Yanzhang He , Rohit Prabhavalkar , Wei Li , Mirkó Visontai , Qiao Liang , Trevor Strohman , Yonghui Wu , Ian C. McGraw , Chung-Cheng Chiu
Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
-
Publication number: US20220199084A1
Publication date: 2022-06-23
Application number: US17654195
Filing date: 2022-03-09
Applicant: Google LLC
Inventor: Wei Li , Rohit Prakash Prabhavalkar , Kanury Kanishka Rao , Yanzhang He , Ian C. McGraw , Anton Bakhtin
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.