-
Publication Number: US10026402B2
Publication Date: 2018-07-17
Application Number: US15284323
Filing Date: 2016-10-03
Applicant: GOOGLE LLC
Inventor: Brian Strope , William J. Byrne , Francoise Beaufays
Abstract: A method of searching a business listing with voice commands includes receiving, over the Internet, from a user terminal, a query spoken by a user, which includes a speech utterance representing a category of merchandise, a speech utterance representing a merchandise item, and a speech utterance representing a geographic location. The method includes recognizing the geographic location with a speech recognition engine based on the speech utterance representing the geographic location, recognizing the category of merchandise with the speech recognition engine based on the speech utterance representing the category of merchandise, recognizing the merchandise item with the speech recognition engine based on the speech utterance representing the merchandise item, searching a business listing for businesses within or near the recognized geographic location to select businesses responsive to the query spoken by the user, and sending to the user terminal information related to at least some of the responsive businesses.
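As an illustration of the query flow the abstract describes, the following minimal Python sketch filters a small in-memory business listing by recognized merchandise category, merchandise item, and geographic location. All names are hypothetical and the speech recognition engine is reduced to a stub; this is not the patent's implementation.

# Hypothetical sketch: three recognized fields drive a lookup over a toy listing.
BUSINESS_LISTING = [
    {"name": "Mission Bikes", "category": "sporting goods", "items": ["bicycle", "helmet"], "city": "san francisco"},
    {"name": "Oak Street Books", "category": "books", "items": ["novel", "atlas"], "city": "san francisco"},
]

def recognize(utterance: str) -> str:
    """Stand-in for the speech recognition engine: assume text already decoded."""
    return utterance.strip().lower()

def search_listing(category_utt: str, item_utt: str, location_utt: str):
    category = recognize(category_utt)
    item = recognize(item_utt)
    location = recognize(location_utt)
    return [
        b for b in BUSINESS_LISTING
        if b["city"] == location and b["category"] == category and item in b["items"]
    ]

if __name__ == "__main__":
    print(search_listing("Sporting Goods", "Bicycle", "San Francisco"))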
-
Publication Number: US20250078805A1
Publication Date: 2025-03-06
Application Number: US18823661
Filing Date: 2024-09-03
Applicant: Google LLC
Inventor: Andrew M Rosenberg , Takaaki Saeki , Francoise Beaufays , Bhuvana Ramabhadran
Abstract: A method includes receiving training data that includes a plurality of sets of training utterances each associated with a respective language. Each training utterance includes a corresponding reference speech representation paired with a corresponding input text sequence. For each training utterance, the method includes generating a corresponding encoded textual representation for the corresponding input text sequence, generating a corresponding speech encoding for the corresponding reference speech representation, generating a shared encoder output, and determining a text-to-speech (TTS) loss based on the corresponding encoded textual representation, the corresponding speech encoding, and the shared encoder output. The method also includes training a TTS model based on the TTS losses determined for the training utterances in each set of the training utterances to teach the TTS model to learn how to synthesize speech in each of the respective languages.
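A toy numeric sketch of the per-utterance computation the abstract describes may help: a text encoding, a speech encoding, and a shared encoder output are combined into a single TTS loss, then averaged over every language's training set. The hash-based encoder and the averaging used for the shared encoder output are stand-in assumptions for illustration only, not the patent's models.

import math

def encode(x: str, dim: int = 4):
    # Deterministic pseudo-embedding so the example runs without an ML framework.
    return [((hash((x, i)) % 1000) / 1000.0) for i in range(dim)]

def l2(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def tts_loss(text_enc, speech_enc, shared_out):
    # Pull both modality encodings toward the shared encoder output.
    return l2(text_enc, shared_out) + l2(speech_enc, shared_out)

training_data = {
    "en": [("hello world", "ref_audio_en_0")],
    "es": [("hola mundo", "ref_audio_es_0")],
}

losses = []
for lang, utterances in training_data.items():
    for input_text, reference_speech in utterances:
        text_enc = encode(input_text)
        speech_enc = encode(reference_speech)
        shared_out = [(t + s) / 2 for t, s in zip(text_enc, speech_enc)]
        losses.append(tts_loss(text_enc, speech_enc, shared_out))

print(sum(losses) / len(losses))  # the quantity a trainer would minimize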
-
Publication Number: US12169522B2
Publication Date: 2024-12-17
Application Number: US18177747
Filing Date: 2023-03-02
Applicant: Google LLC
Inventor: Johan Schalkwyk , Francoise Beaufays
IPC: G06F16/783 , G06F16/738 , G06F40/169 , G06F40/30
Abstract: A method includes receiving a content feed that includes audio data corresponding to speech utterances and processing the content feed to generate a semantically-rich, structured document. The structured document includes a transcription of the speech utterances and includes a plurality of words each aligned with a corresponding audio segment of the audio data that indicates a time when the word was recognized in the audio data. During playback of the content feed, the method also includes receiving a query from a user requesting information contained in the content feed and processing, by a large language model, the query and the structured document to generate a response to the query. The response conveys the requested information contained in the content feed. The method also includes providing, for output from a user device associated with the user, the response to the query.
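The structured document in the abstract pairs each transcript word with the audio segment in which it was recognized. The sketch below shows one plausible shape for that structure, with a simple keyword lookup standing in for the large language model; field names are assumptions, not the patent's schema.

structured_doc = {
    "transcript": [
        {"word": "the", "start_s": 0.0, "end_s": 0.2},
        {"word": "budget", "start_s": 0.2, "end_s": 0.7},
        {"word": "doubled", "start_s": 0.7, "end_s": 1.2},
    ]
}

def answer_query(query: str, doc: dict) -> str:
    # Placeholder for the LLM call: return the aligned spans for any matched words.
    hits = [w for w in doc["transcript"] if w["word"] in query.lower()]
    if not hits:
        return "No matching content found in the feed."
    spans = ", ".join(f"'{w['word']}' at {w['start_s']:.1f}s" for w in hits)
    return f"The feed mentions {spans}."

print(answer_query("What happened to the budget?", structured_doc))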
-
Publication Number: US11955134B2
Publication Date: 2024-04-09
Application Number: US17643848
Filing Date: 2021-12-13
Applicant: Google LLC
Inventor: Ehsan Amid , Om Thakkar , Rajiv Mathews , Francoise Beaufays
IPC: G10L21/0332 , G10L15/06 , G10L15/08 , G10L21/10
CPC classification number: G10L21/0332 , G10L15/063 , G10L15/08 , G10L21/10
Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
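The leakage test in the abstract can be sketched in a few lines: obfuscate the target phrase in the input, run the trained ASR model on the modified data, and flag leakage if the phrase still appears in the prediction. The model below is a hypothetical stub that has memorized one phrase, and word tokens stand in for real audio.

TARGET_PHRASE = "account number 1234"

def obfuscate(audio_words, phrase):
    # Stand-in for modifying the audio: blank out the phrase's words.
    masked = set(phrase.split())
    return [w if w not in masked else "<noise>" for w in audio_words]

def trained_asr(audio_words):
    # Hypothetical over-fitted model that re-inserts a memorized phrase.
    text = " ".join(w for w in audio_words if w != "<noise>")
    return text + " account number 1234"

ground_truth = "my account number 1234 please"
audio = ground_truth.split()

predicted = trained_asr(obfuscate(audio, TARGET_PHRASE))
if TARGET_PHRASE in predicted and TARGET_PHRASE in ground_truth:
    print("Model leaked the phrase from its training data:", TARGET_PHRASE)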
-
Publication Number: US20240080038A1
Publication Date: 2024-03-07
Application Number: US18496120
Filing Date: 2023-10-27
Applicant: Google LLC
Inventor: Giovanni Motta , Francoise Beaufays , Petr Zadrazil
Abstract: Systems and methods for compression of data that exhibits mixed compressibility, such as floating-point data, are provided. As one example, aspects of the present disclosure can be used to compress floating-point data that represents the values of parameters of a machine-learned model. Therefore, aspects of the present disclosure can be used to compress machine-learned models (e.g., for reducing storage requirements associated with the model, reducing the bandwidth expended to transmit the model, etc.).
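One way such mixed compressibility can be exploited, purely as an illustrative assumption and not the patent's specific scheme, is to split each float32 parameter into its sign/exponent byte (highly repetitive across trained weights, so it entropy-codes well) and its mantissa bytes (nearly random, so they are stored raw):

import struct, zlib

def compress_floats(values):
    raw = b"".join(struct.pack("<f", v) for v in values)
    high = bytes(raw[i + 3] for i in range(0, len(raw), 4))                  # sign + exponent byte
    low = bytes(b for i in range(0, len(raw), 4) for b in raw[i:i + 3])      # mantissa bytes
    return zlib.compress(high), low  # entropy-code only the compressible stream

weights = [0.012, 0.013, 0.011, -0.012] * 100
packed_high, packed_low = compress_floats(weights)
print(len(packed_high) + len(packed_low), "bytes vs", 4 * len(weights), "raw")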
-
Publication Number: US20220413696A1
Publication Date: 2022-12-29
Application Number: US17823545
Filing Date: 2022-08-31
Applicant: Google LLC
Inventor: Yu Ouyang , Diego Melendo Casado , Mohammadinamul Hasan Sheik , Francoise Beaufays , Dragan Zivkovic , Meltem Oktem
IPC: G06F3/04886 , G06F3/16 , G06F1/16 , G06F3/023 , G06F3/04883 , G06F40/166 , G06F40/289
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross input modality learning in a mobile device are disclosed. In one aspect, a method includes activating a first modality user input mode in which user inputs by way of a first modality are recognized using a first modality recognizer; and receiving a user input by way of the first modality. The method includes obtaining, as a result of the first modality recognizer recognizing the user input, a transcription that includes a particular term; and generating an input context data structure that references at least the particular term. The method further includes transmitting, by the first modality recognizer, the input context data structure to a second modality recognizer for use in updating a second modality recognition model associated with the second modality recognizer.
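The hand-off the abstract describes can be sketched as follows: a term recognized by the first modality (voice, in this sketch) is packaged into an input context data structure and passed to a second modality recognizer (a keyboard decoder here) so its recognition model can be updated. Class and field names are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class InputContext:
    term: str
    source_modality: str

@dataclass
class KeyboardRecognizer:
    vocabulary: set = field(default_factory=set)

    def update_model(self, context: InputContext):
        # The second modality learns the term so it can later be typed or autocorrected.
        self.vocabulary.add(context.term)

def voice_recognizer(utterance: str) -> str:
    return utterance  # stand-in for the first modality recognizer

transcription = voice_recognizer("let's meet at Zermatt")
particular_term = "Zermatt"
context = InputContext(term=particular_term, source_modality="voice")

keyboard = KeyboardRecognizer()
keyboard.update_model(context)
print(particular_term in keyboard.vocabulary)  # True: the term crossed modalities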
-
Publication Number: US11527248B2
Publication Date: 2022-12-13
Application Number: US16885116
Filing Date: 2020-05-27
Applicant: Google LLC
Inventor: Brian Strope , Francoise Beaufays , Olivier Siohan
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the recognition result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meet a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more recognition results.
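A sketch of the early-abort strategy: several recognizers process the same audio in parallel, and as soon as one result clears the confidence threshold the remaining tasks are cancelled. The recognizers below are stubs with fixed latencies and confidences, and thread futures stand in for the SRS tasks; none of this is the patent's implementation.

from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def make_srs(name, delay_s, transcript, confidence):
    def run(audio):
        time.sleep(delay_s)  # simulate recognition latency
        return name, transcript, confidence
    return run

SRS_POOL = [
    make_srs("fast_small", 0.1, "call mom", 0.93),
    make_srs("slow_large", 0.5, "call mom", 0.99),
]
CONFIDENCE_THRESHOLD = 0.9

def recognize(audio):
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(srs, audio) for srs in SRS_POOL]
        for done in as_completed(futures):
            name, transcript, conf = done.result()
            if conf >= CONFIDENCE_THRESHOLD:
                for f in futures:
                    f.cancel()  # abort tasks that have not started yet
                return transcript, conf, name
    return None

print(recognize(b"raw-audio-bytes"))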
-
Publication Number: US20220229548A1
Publication Date: 2022-07-21
Application Number: US17658233
Filing Date: 2022-04-06
Applicant: Google LLC
Inventor: Ouais Alsharif , Peter Ciccotto , Francoise Beaufays , Dragan Zivkovic
IPC: G06F3/04886 , G06F3/023 , G06F40/263 , G06F40/274
Abstract: A keyboard is described that determines, using a first decoder and based on a selection of keys of a graphical keyboard, text. Responsive to determining that a characteristic of the text satisfies a threshold, a model of the keyboard identifies the target language of the text, and determines whether the target language is different than a language associated with the first decoder. If the target language of the text is not different than the language associated with the first decoder, the keyboard outputs, for display, an indication of first candidate words determined by the first decoder from the text. If the target language of the text is different: the keyboard enables a second decoder, where a language associated with the second decoder matches the target language of the text, and outputs, for display, an indication of second candidate words determined by the second decoder from the text.
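A minimal sketch of the decoder-switch logic, assuming a toy keyword heuristic in place of the keyboard's language-identification model and trivial decoders that just append a suggestion; the threshold, decoders, and marker words are all hypothetical.

MIN_CHARS = 12  # "characteristic of the text satisfies a threshold"

DECODERS = {
    "en": lambda text: [text + " suggestion"],
    "es": lambda text: [text + " sugerencia"],
}

def identify_language(text: str) -> str:
    # Toy language ID: look for Spanish-only characters or common Spanish words.
    spanish_markers = ("ñ", "¿", " que ", " gracias")
    return "es" if any(m in text.lower() for m in spanish_markers) else "en"

def candidate_words(text: str, active_lang: str):
    if len(text) < MIN_CHARS:
        return active_lang, DECODERS[active_lang](text)
    target = identify_language(text)
    if target != active_lang:
        active_lang = target  # enable the second decoder matching the target language
    return active_lang, DECODERS[active_lang](text)

print(candidate_words("muchas gracias por todo", "en"))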
-
Publication Number: US10679624B2
Publication Date: 2020-06-09
Application Number: US16036662
Filing Date: 2018-07-16
Applicant: GOOGLE LLC
Inventor: Brian Strope , Francoise Beaufays , William J. Byrne
IPC: G06F16/9535 , G10L15/22 , G10L15/26 , G06Q30/02 , G06F16/29 , G06F16/951 , G06F16/9537 , G10L15/18 , G10L15/197 , G10L15/30
Abstract: A method of providing a personal directory service includes receiving, over the Internet, from a user terminal, a query spoken by a user, where the query spoken by the user includes a speech utterance representing a category of persons. The method also includes determining a geographic location of the user terminal, recognizing the category of persons with a speech recognition engine based on the speech utterance representing the category of persons, searching a listing of persons within or near the determined geographic location matching the query to select persons responsive to the query spoken by the user, and sending to the user terminal information related to at least some of the responsive persons.
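The same pattern as the business-listing search applies here. The sketch below, with a stubbed recognizer, a stubbed terminal geolocation lookup, and hypothetical names throughout, selects people whose category matches the spoken query and whose location matches the terminal's determined location.

PERSON_LISTING = [
    {"name": "Dr. A. Rivera", "category": "dentist", "city": "Palo Alto"},
    {"name": "J. Chen, DDS", "category": "dentist", "city": "San Jose"},
    {"name": "M. Okafor", "category": "plumber", "city": "Palo Alto"},
]

def recognize_category(utterance: str) -> str:
    return utterance.strip().lower()                          # stand-in for the speech recognition engine

def terminal_location(user_terminal_id: str) -> str:
    return {"terminal-42": "Palo Alto"}[user_terminal_id]     # stand-in geolocation lookup

def personal_directory_search(utterance: str, user_terminal_id: str):
    category = recognize_category(utterance)
    location = terminal_location(user_terminal_id)
    return [p for p in PERSON_LISTING if p["category"] == category and p["city"] == location]

print(personal_directory_search("Dentist", "terminal-42"))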
-
Publication Number: US10152965B2
Publication Date: 2018-12-11
Application Number: US15014213
Filing Date: 2016-02-03
Applicant: Google LLC
Inventor: Antoine Jean Bruguier , Fuchun Peng , Francoise Beaufays
IPC: G10L15/00 , G10L15/06 , G10L15/065 , G10L15/26
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing a pronunciation dictionary that stores entity name pronunciations. In one aspect, a method includes actions of receiving audio data corresponding to an utterance that includes a command and an entity name. Additional actions may include generating, by an automated speech recognizer, an initial transcription for a portion of the audio data that is associated with the entity name, receiving a corrected transcription for the portion of the utterance that is associated with the entity name, obtaining a phonetic pronunciation that is associated with the portion of the audio data that is associated with the entity name, updating a pronunciation dictionary to associate the phonetic pronunciation with the entity name, receiving a subsequent utterance that includes the entity name, and transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.
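The correction loop in the abstract can be sketched as a small dictionary update, with pronunciations written as plain strings rather than real phoneme sequences and all names hypothetical:

pronunciation_dictionary = {}  # phonetic pronunciation -> entity name

def update_dictionary(phonetic: str, corrected_name: str):
    pronunciation_dictionary[phonetic] = corrected_name

def transcribe(command: str, phonetic: str) -> str:
    entity = pronunciation_dictionary.get(phonetic, "<unknown entity>")
    return f"{command} {entity}"

# Initial recognition of "call Zbigniew" fails; the user corrects the transcription,
# and the pronunciation heard in the audio is bound to the corrected entity name.
heard_phonetic = "z b IH g n eh f"
update_dictionary(heard_phonetic, "Zbigniew")

# A subsequent utterance with the same pronunciation now resolves correctly.
print(transcribe("call", heard_phonetic))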