-
Publication No.: US12260875B2
Publication Date: 2025-03-25
Application No.: US18609362
Application Date: 2024-03-19
Applicant: Google LLC
Inventor: Ehsan Amid , Om Dipakbhai Thakkar , Rajiv Mathews , Francoise Beaufays
IPC: G10L21/0332 , G10L15/06 , G10L15/08 , G10L21/10
Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
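For illustration only, a minimal Python sketch of the leakage check this abstract describes, assuming a hypothetical asr_model.transcribe() API, obfuscation by silencing the phrase's sample span, and a known (start, end) index pair; the patent does not prescribe these details:

    import numpy as np

    def obfuscate_span(audio: np.ndarray, start: int, end: int) -> np.ndarray:
        """Replace the samples covering the target phrase with silence."""
        modified = audio.copy()
        modified[start:end] = 0.0
        return modified

    def phrase_leaked(asr_model, audio: np.ndarray, ground_truth: str,
                      phrase: str, span: tuple[int, int]) -> bool:
        """Return True if the model still reproduces the obfuscated phrase,
        suggesting it memorized the phrase from its training data.
        (The full method compares the prediction to the ground-truth
        transcription; the substring test below is a simplification.)"""
        assert phrase in ground_truth          # phrase must occur in the reference
        modified = obfuscate_span(audio, *span)
        prediction = asr_model.transcribe(modified)   # hypothetical API
        return phrase in prediction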
-
Publication No.: US12101106B2
Publication Date: 2024-09-24
Application No.: US18496120
Application Date: 2023-10-27
Applicant: Google LLC
Inventor: Giovanni Motta , Francoise Beaufays , Petr Zadrazil
Abstract: Systems and methods for compression of data that exhibits mixed compressibility, such as floating-point data, are provided. As one example, aspects of the present disclosure can be used to compress floating-point data that represents the values of parameters of a machine-learned model. Therefore, aspects of the present disclosure can be used to compress machine-learned models (e.g., for reducing storage requirements associated with the model, reducing the bandwidth expended to transmit the model, etc.).
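As a loose illustration of exploiting mixed compressibility (not the patented scheme), float32 parameters can be split into byte planes, since exponent-side bytes are highly repetitive while low mantissa bytes are noise-like, and each plane compressed separately; the zlib codec and plane split below are assumptions:

    import zlib
    import numpy as np

    def compress_params(params: np.ndarray) -> list[bytes]:
        raw = params.astype(np.float32).tobytes()
        planes = [raw[i::4] for i in range(4)]        # 4 byte planes of float32
        return [zlib.compress(p, level=9) for p in planes]

    def decompress_params(planes: list[bytes], count: int) -> np.ndarray:
        raw = bytearray(count * 4)
        for i, blob in enumerate(planes):
            raw[i::4] = zlib.decompress(blob)
        return np.frombuffer(bytes(raw), dtype=np.float32)

    weights = np.random.randn(10_000).astype(np.float32)
    blobs = compress_params(weights)
    assert np.array_equal(decompress_params(blobs, weights.size), weights)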
-
Publication No.: US20240304185A1
Publication Date: 2024-09-12
Application No.: US18598885
Application Date: 2024-03-07
Applicant: Google LLC
Inventor: Ke Hu , Bo Li , Tara N. Sainath , Yu Zhang , Francoise Beaufays
IPC: G10L15/197 , G10L15/02 , G10L15/06
CPC classification number: G10L15/197 , G10L15/02 , G10L15/063
Abstract: A method for a multilingual ASR model includes receiving a sequence of acoustic frames characterizing an utterance of speech. At each of a plurality of output steps, the method further includes: generating, by a first encoder that includes a first plurality of multi-head attention layers, a first higher order feature representation for a corresponding acoustic frame; generating, by a second encoder that includes a second plurality of multi-head attention layers, a second higher order feature representation for a corresponding first higher order feature representation; and generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on the second higher order feature representation and a sequence of N previous non-blank symbols. A gating layer of each respective mixture-of-experts (MoE) layer is configured to dynamically route an output from a previous multi-head attention layer, at each of the plurality of output steps, to a respective pair of feed-forward expert networks.
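A hedged sketch of the gating behavior in the final sentence: a learned gate scores the experts for each frame and routes the frame to a pair (top-2) of feed-forward experts; the shapes, softmax gate, and top-2 rule are illustrative assumptions rather than the patented design:

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_ff, num_experts = 256, 1024, 4

    gate_w = rng.standard_normal((d_model, num_experts)) * 0.02
    experts = [
        (rng.standard_normal((d_model, d_ff)) * 0.02,
         rng.standard_normal((d_ff, d_model)) * 0.02)
        for _ in range(num_experts)
    ]

    def moe_layer(x: np.ndarray) -> np.ndarray:
        """x: (frames, d_model) output of the previous multi-head attention layer."""
        logits = x @ gate_w                                   # (frames, num_experts)
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        top2 = np.argsort(-probs, axis=-1)[:, :2]             # pair of experts per frame
        out = np.zeros_like(x)
        for f in range(x.shape[0]):
            for e in top2[f]:
                w_in, w_out = experts[e]
                h = np.maximum(x[f] @ w_in, 0.0)              # ReLU feed-forward expert
                out[f] += probs[f, e] * (h @ w_out)
        return out

    frames = rng.standard_normal((10, d_model))
    routed = moe_layer(frames)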
-
Publication No.: US11842045B2
Publication Date: 2023-12-12
Application No.: US17823545
Application Date: 2022-08-31
Applicant: Google LLC
Inventor: Yu Ouyang , Diego Melendo Casado , Mohammadinamul Hasan Sheik , Francoise Beaufays , Dragan Zivkovic , Meltem Oktem
IPC: G06F3/04886 , G06F3/16 , G06F1/16 , G06F3/023 , G06F3/04883 , G06F40/166 , G06F40/289 , G10L15/22
CPC classification number: G06F3/04886 , G06F1/1626 , G06F3/0233 , G06F3/04883 , G06F3/167 , G06F40/166 , G06F40/289 , G06F2203/0381 , G10L15/22
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross input modality learning in a mobile device are disclosed. In one aspect, a method includes activating a first modality user input mode in which user inputs by way of a first modality are recognized using a first modality recognizer; and receiving a user input by way of the first modality. The method includes obtaining, as a result of the first modality recognizer recognizing the user input, a transcription that includes a particular term; and generating an input context data structure that references at least the particular term. The method further includes transmitting, by the first modality recognizer, the input context data structure to a second modality recognizer for use in updating a second modality recognition model associated with the second modality recognizer.
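An illustrative sketch of the described data flow, with hypothetical class and field names: a speech recognizer wraps recognized terms in an input-context record and hands it to a keyboard recognizer, which biases its model toward those terms:

    from dataclasses import dataclass

    @dataclass
    class InputContext:
        terms: list[str]
        source_modality: str
        app: str = "unknown"

    class KeyboardRecognizer:
        def __init__(self):
            self.lexicon_weights: dict[str, float] = {}

        def update_model(self, ctx: InputContext) -> None:
            # Boost the likelihood of terms learned from the other modality.
            for term in ctx.terms:
                self.lexicon_weights[term] = self.lexicon_weights.get(term, 0.0) + 1.0

    class SpeechRecognizer:
        def __init__(self, peer: KeyboardRecognizer):
            self.peer = peer

        def recognize(self, audio_transcript: str) -> str:
            # Stand-in for real decoding; assume the transcript is the result.
            ctx = InputContext(terms=audio_transcript.split(),
                               source_modality="voice")
            self.peer.update_model(ctx)          # cross-modality transfer
            return audio_transcript

    keyboard = KeyboardRecognizer()
    speech = SpeechRecognizer(keyboard)
    speech.recognize("brunch at Zazie tomorrow")
    print(keyboard.lexicon_weights["Zazie"])      # 1.0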
-
Publication No.: US11573698B2
Publication Date: 2023-02-07
Application No.: US17469622
Application Date: 2021-09-08
Applicant: Google LLC
Inventor: Shumin Zhai , Thomas Breuel , Ouais Alsharif , Yu Ouyang , Francoise Beaufays , Johan Schalkwyk
IPC: G06F3/02 , G06F3/04886 , G06N3/04 , G06F40/232 , G06F40/274 , G06F40/279 , G06F3/023 , G06F3/04895 , G06F3/0482 , G06F3/04883 , G06N3/08
Abstract: In some examples, a computing device includes at least one processor; and at least one module, operable by the at least one processor to: output, for display at an output device, a graphical keyboard; receive an indication of a gesture detected at a location of a presence-sensitive input device, wherein the location of the presence-sensitive input device corresponds to a location of the output device that outputs the graphical keyboard; determine, based on at least one spatial feature of the gesture that is processed by the computing device using a neural network, at least one character string, wherein the at least one spatial feature indicates at least one physical property of the gesture; and output, for display at the output device, based at least in part on the processing of the at least one spatial feature of the gesture using the neural network, the at least one character string.
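A toy sketch of the idea: spatial features of the touch gesture (mean position, speed, net displacement) are fed to a small neural network that scores candidate character strings; the feature set, network, and candidate list are illustrative assumptions:

    import numpy as np

    def spatial_features(points: np.ndarray) -> np.ndarray:
        """points: (n, 3) array of (x, y, t) samples from the touchscreen."""
        xy, t = points[:, :2], points[:, 2]
        deltas = np.diff(xy, axis=0)
        dt = np.diff(t) + 1e-6
        speed = np.linalg.norm(deltas, axis=1) / dt
        return np.array([xy[:, 0].mean(), xy[:, 1].mean(),
                         speed.mean(), speed.max(),
                         np.linalg.norm(xy[-1] - xy[0])])   # net displacement

    rng = np.random.default_rng(1)
    W1, b1 = rng.standard_normal((5, 32)) * 0.1, np.zeros(32)
    W2, b2 = rng.standard_normal((32, 3)) * 0.1, np.zeros(3)
    candidates = ["hello", "hells", "jello"]

    def score_candidates(points: np.ndarray) -> str:
        h = np.tanh(spatial_features(points) @ W1 + b1)
        logits = h @ W2 + b2
        return candidates[int(np.argmax(logits))]

    trace = np.column_stack([np.linspace(0, 100, 20),
                             np.linspace(0, 40, 20),
                             np.linspace(0.0, 0.5, 20)])
    print(score_candidates(trace))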
-
Publication No.: US20200371686A1
Publication Date: 2020-11-26
Application No.: US16989420
Application Date: 2020-08-10
Applicant: Google LLC
Inventor: Ouais Alsharif , Peter Ciccotto , Francoise Beaufays , Dragan Zivkovic
IPC: G06F3/0488 , G06F3/023 , G06F40/263 , G06F40/274
Abstract: A keyboard is described that determines, using a first decoder and based on a selection of keys of a graphical keyboard, text. Responsive to determining that a characteristic of the text satisfies a threshold, a model of the keyboard identifies the target language of the text, and determines whether the target language is different than a language associated with the first decoder. If the target language of the text is not different than the language associated with the first decoder, the keyboard outputs, for display, an indication of first candidate words determined by the first decoder from the text. If the target language of the text is different: the keyboard enables a second decoder, where a language associated with the second decoder matches the target language of the text, and outputs, for display, an indication of second candidate words determined by the second decoder from the text.
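A simplified sketch of the control flow: decode with the active decoder, and once the text satisfies a length threshold, run language identification; if the detected language differs, enable a matching decoder and surface its candidates. The word-list language ID, Decoder class, and threshold are toy stand-ins:

    MIN_CHARS = 12   # threshold the text characteristic must satisfy

    LANG_HINTS = {
        "en": {"the", "and", "is", "to"},
        "es": {"el", "la", "que", "de"},
    }

    class Decoder:
        def __init__(self, lang: str):
            self.lang = lang

        def candidates(self, text: str) -> list[str]:
            return [f"{w}[{self.lang}]" for w in text.split()[:3]]

    def identify_language(text: str) -> str:
        words = set(text.lower().split())
        return max(LANG_HINTS, key=lambda lang: len(words & LANG_HINTS[lang]))

    def decode(text: str, active: Decoder) -> tuple[Decoder, list[str]]:
        if len(text) >= MIN_CHARS:
            target = identify_language(text)
            if target != active.lang:
                active = Decoder(target)          # enable the second decoder
        return active, active.candidates(text)

    decoder = Decoder("en")
    decoder, words = decode("el que no arriesga no gana", decoder)
    print(decoder.lang, words)                    # es ...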
-
Publication No.: US10831366B2
Publication Date: 2020-11-10
Application No.: US15393676
Application Date: 2016-12-29
Applicant: Google LLC
Inventor: Yu Ouyang , Diego Melendo Casado , Mohammadinamul Hasan Sheik , Francoise Beaufays , Dragan Zivkovic , Meltem Oktem
IPC: G06F3/0488 , G06F3/16 , G06F1/16 , G06F3/023 , G06F40/166 , G06F40/289 , G10L15/22
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross input modality learning in a mobile device are disclosed. In one aspect, a method includes activating a first modality user input mode in which user inputs by way of a first modality are recognized using a first modality recognizer; and receiving a user input by way of the first modality. The method includes obtaining, as a result of the first modality recognizer recognizing the user input, a transcription that includes a particular term; and generating an input context data structure that references at least the particular term. The method further includes transmitting, by the first modality recognizer, the input context data structure to a second modality recognizer for use in updating a second modality recognition model associated with the second modality recognizer.
-
Publication No.: US20200302931A1
Publication Date: 2020-09-24
Application No.: US16896192
Application Date: 2020-06-08
Applicant: Google LLC
Inventor: Brian Strope , Francoise Beaufays , William J. Byrne
IPC: G10L15/22 , G10L15/26 , G06Q30/02 , G06F16/29 , G06F16/951 , G06F16/9535 , G06F16/9537 , G10L15/18 , G10L15/197 , G10L15/30
Abstract: A method of providing navigation directions includes receiving, at a user terminal, a query spoken by a user, wherein the query spoken by the user includes a speech utterance indicating (i) a category of business, (ii) a name of the business, and (iii) a location at which or near which the business is disposed; identifying, by processing hardware, the business based on the speech utterance; and providing navigation directions to the business via the user terminal.
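A toy sketch of the three-part query: the utterance supplies a business category, name, and location, which are matched against a business index before directions are returned; the in-memory index and string matching are illustrative only:

    BUSINESSES = [
        {"name": "Blue Bottle Coffee", "category": "coffee shop",
         "location": "Hayes Valley", "address": "315 Linden St"},
        {"name": "Tartine Bakery", "category": "bakery",
         "location": "Mission District", "address": "600 Guerrero St"},
    ]

    def identify_business(category: str, name: str, location: str):
        """Match the three query components against the index."""
        for b in BUSINESSES:
            if (category in b["category"]
                    and name.lower() in b["name"].lower()
                    and location.lower() in b["location"].lower()):
                return b
        return None

    # e.g. "directions to the coffee shop Blue Bottle near Hayes Valley"
    match = identify_business("coffee shop", "blue bottle", "hayes valley")
    if match:
        print(f"Navigating to {match['name']}, {match['address']}")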
-
Publication No.: US10671281B2
Publication Date: 2020-06-02
Application No.: US16261640
Application Date: 2019-01-30
Applicant: Google LLC
Inventor: Shumin Zhai , Thomas Breuel , Ouais Alsharif , Yu Ouyang , Francoise Beaufays , Johan Schalkwyk
IPC: G06F3/02 , G06F3/0488 , G06N3/04 , G06F40/232 , G06F40/274 , G06F40/279 , G06F3/023 , G06F3/0489 , G06F3/0482 , G06N3/08
Abstract: In some examples, a computing device includes at least one processor; and at least one module, operable by the at least one processor to: output, for display at an output device, a graphical keyboard; receive an indication of a gesture detected at a location of a presence-sensitive input device, wherein the location of the presence-sensitive input device corresponds to a location of the output device that outputs the graphical keyboard; determine, based on at least one spatial feature of the gesture that is processed by the computing device using a neural network, at least one character string, wherein the at least one spatial feature indicates at least one physical property of the gesture; and output, for display at the output device, based at least in part on the processing of the at least one spatial feature of the gesture using the neural network, the at least one character string.
-
Publication No.: US20180330735A1
Publication Date: 2018-11-15
Application No.: US16041434
Application Date: 2018-07-20
Applicant: Google LLC
Inventor: Brian Strope , Francoise Beaufays , Olivier Siohan
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in the correctness of the recognition result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meet a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated recognition results.
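A rough sketch of the race this abstract describes: several recognizers start in parallel, and once a result meets the confidence threshold the remaining tasks are cancelled and the best result so far is returned; the fake recognizers and threshold value are illustrative:

    import time
    from concurrent.futures import ThreadPoolExecutor, as_completed

    CONFIDENCE_THRESHOLD = 0.80

    def make_recognizer(name: str, delay: float, confidence: float):
        def recognize(audio: bytes) -> tuple[str, str, float]:
            time.sleep(delay)                      # stand-in for real decoding
            return name, f"<{name} hypothesis>", confidence
        return recognize

    recognizers = [make_recognizer("fast", 0.05, 0.85),
                   make_recognizer("slow", 0.50, 0.95)]

    def recognize_with_early_exit(audio: bytes) -> tuple[str, float]:
        best = ("", 0.0)
        with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
            futures = [pool.submit(r, audio) for r in recognizers]
            for fut in as_completed(futures):
                name, hypothesis, conf = fut.result()
                if conf > best[1]:
                    best = (hypothesis, conf)
                if conf >= CONFIDENCE_THRESHOLD:
                    # Cancel tasks that have not started; a real system would
                    # also signal in-flight recognizers to stop decoding.
                    for f in futures:
                        f.cancel()
                    break
        return best

    print(recognize_with_early_exit(b"raw-audio"))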