Business or personal listing search

    Publication No.: US10026402B2

    Publication Date: 2018-07-17

    Application No.: US15284323

    Filing Date: 2016-10-03

    Applicant: GOOGLE LLC

    Abstract: A method of searching a business listing with voice commands includes receiving, over the Internet, from a user terminal, a query spoken by a user, which includes a speech utterance representing a category of merchandize, a speech utterance representing a merchandize item, and a speech utterance representing a geographic location. The method includes recognizing the geographic location with a speech recognition engine based on the speech utterance representing the geographic location, recognizing the category of merchandize with the speech recognition engine based on the speech utterance representing the category of merchandize, recognizing the merchandize item with a speech recognition engine based on the speech utterance representing the merchandize item, searching a business listing for businesses within or near the recognized geographic location to select businesses responsive to the query spoken by the user, and sending to the user terminal information related to at least some of the responsive businesses.
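    The final search step can be illustrated with a minimal sketch. It assumes the speech recognition engine has already turned the three utterances into text, that a hypothetical gazetteer maps the recognized location to coordinates, and that "within or near" means within a fixed great-circle radius. The names `Business`, `search_listing`, `GAZETTEER`, `LISTING` and the 5 km radius are illustrative only, not taken from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class Business:
    name: str
    category: str
    items: frozenset  # merchandize items the business offers
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def search_listing(listing, category, item, center, radius_km=5.0):
    """Select businesses matching the recognized category and item near `center`, nearest first."""
    lat, lon = center
    hits = []
    for b in listing:
        if b.category != category or item not in b.items:
            continue
        d = haversine_km(lat, lon, b.lat, b.lon)
        if d <= radius_km:
            hits.append((d, b.name, b))
    hits.sort(key=lambda h: h[0])
    return [b for _, _, b in hits]


# Hypothetical gazetteer and listing; the strings below stand in for ASR output.
GAZETTEER = {"palo alto": (37.4419, -122.1430)}
LISTING = [
    Business("Tony's", "restaurants", frozenset({"pizza", "pasta"}), 37.4440, -122.1600),
    Business("Slice House", "restaurants", frozenset({"pizza"}), 37.4600, -122.1000),
    Business("City Pizza", "restaurants", frozenset({"pizza"}), 37.7749, -122.4194),
    Business("Book Nook", "bookstores", frozenset({"novels"}), 37.4420, -122.1440),
]

results = search_listing(LISTING, "restaurants", "pizza", GAZETTEER["palo alto"])
print([b.name for b in results])
```

    Conditioning the search on all three recognized fields lets each one narrow the candidate set; a production system would more likely use a spatial index than a linear scan.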

    Scaling Multilingual Speech Synthesis with Zero Supervision of Found Data

    Publication No.: US20250078805A1

    Publication Date: 2025-03-06

    Application No.: US18823661

    Filing Date: 2024-09-03

    Applicant: Google LLC

    Abstract: A method includes receiving training data that includes a plurality of sets of training utterances each associated with a respective language. Each training utterance includes a corresponding reference speech representation paired with a corresponding input text sequence. For each training utterance, the method includes generating a corresponding encoded textual representation for the corresponding input text sequence, generating a corresponding speech encoding for the corresponding reference speech representation, generating a shared encoder output, and determining a text-to-speech (TTS) loss based on the corresponding encoded textual representation, the corresponding speech encoding, and the shared encoder output. The method also includes training a TTS model based on the TTS losses determined for the training utterances in each set of the training utterances to teach the TTS model to learn how to synthesize speech in each of the respective languages.

    Structured video documents
    Granted invention patent

    Publication No.: US12169522B2

    Publication Date: 2024-12-17

    Application No.: US18177747

    Filing Date: 2023-03-02

    Applicant: Google LLC

    Abstract: A method includes receiving a content feed that includes audio data corresponding to speech utterances and processing the content feed to generate a semantically-rich, structured document. The structured document includes a transcription of the speech utterances and includes a plurality of words each aligned with a corresponding audio segment of the audio data that indicates a time when the word was recognized in the audio data. During playback of the content feed, the method also includes receiving a query from a user requesting information contained in the content feed and processing, by a large language model, the query and the structured document to generate a response to the query. The response conveys the requested information contained in the content feed. The method also includes providing, for output from a user device associated with the user, the response to the query.
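    A minimal sketch of the word-aligned structured document, assuming each word carries the start and end times of its audio segment and assuming (not stated in the abstract) that only the portion already played back is passed to the model. `AlignedWord`, `words_until`, and `build_prompt` are hypothetical names, and the large language model call itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    word: str
    start_s: float  # start of the aligned audio segment, in seconds
    end_s: float    # end of the aligned audio segment, in seconds


def words_until(doc, playback_s):
    """Words whose audio segment has finished by the current playback position."""
    return [w for w in doc if w.end_s <= playback_s]


def build_prompt(doc, query, playback_s):
    """Assemble a timestamped text prompt for a language model (model call not shown)."""
    heard = words_until(doc, playback_s)
    lines = [f"[{w.start_s:.1f}s] {w.word}" for w in heard]
    return "Transcript so far:\n" + "\n".join(lines) + f"\n\nQuestion: {query}\nAnswer:"


doc = [
    AlignedWord("the", 0.0, 0.2),
    AlignedWord("launch", 0.2, 0.6),
    AlignedWord("is", 0.6, 0.7),
    AlignedWord("friday", 0.7, 1.2),
]
print(build_prompt(doc, "When is the launch?", playback_s=2.0))
```

    Keeping timestamps in the prompt lets a response point back to the moment in the feed where the information was spoken.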

    Phrase extraction for ASR models
    Granted invention patent

    Publication No.: US11955134B2

    Publication Date: 2024-04-09

    Application No.: US17643848

    Filing Date: 2021-12-13

    Applicant: Google LLC

    CPC classification number: G10L21/0332 G10L15/063 G10L15/08 G10L21/10

    Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
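    The leakage test can be sketched as follows, assuming the audio is a list of samples, that obfuscation means overwriting the phrase's sample span with silence, and that the trained ASR model is any callable from samples to text. The comparison with the ground truth is simplified here to a word-boundary containment check; `mask_span` and `phrase_leaked` are hypothetical names.

```python
def mask_span(samples, start, end, fill=0.0):
    """Obfuscate samples[start:end] by overwriting them with silence."""
    return samples[:start] + [fill] * (end - start) + samples[end:]


def _words(text):
    # Normalize case and whitespace, padding with spaces so matches respect word boundaries.
    return " " + " ".join(text.lower().split()) + " "


def phrase_leaked(asr, samples, phrase_span, ground_truth, phrase):
    """Return True if the ASR model still transcribes `phrase` after it was masked out of the audio."""
    if _words(phrase) not in _words(ground_truth):
        raise ValueError("phrase must occur in the ground-truth transcription")
    start, end = phrase_span
    predicted = asr(mask_span(list(samples), start, end))
    # The masked audio carries no acoustic evidence for the phrase, so reproducing it
    # suggests the model memorized it from its training data.
    return _words(phrase) in _words(predicted)


gt = "call me at five five five one two three four"
phrase = "five five five one two three four"
samples = [0.1] * 100


def memorizing_asr(audio):  # hypothetical model that ignores the masked audio
    return gt


print(phrase_leaked(memorizing_asr, samples, (40, 100), gt, phrase))
```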

    Modality Learning On Mobile Devices

    Publication No.: US20220413696A1

    Publication Date: 2022-12-29

    Application No.: US17823545

    Filing Date: 2022-08-31

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross input modality learning in a mobile device are disclosed. In one aspect, a method includes activating a first modality user input mode in which user inputs by way of a first modality are recognized using a first modality recognizer; and receiving a user input by way of the first modality. The method includes obtaining, as a result of the first modality recognizer recognizing the user input, a transcription that includes a particular term; and generating an input context data structure that references at least the particular term. The method further includes transmitting, by the first modality recognizer, the input context data structure to a second modality recognizer for use in updating a second modality recognition model associated with the second modality recognizer.
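    A toy illustration of the hand-off between recognizers, assuming voice is the first modality, keyboard is the second, and the keyboard's recognition model is a simple unigram vocabulary. `InputContext`, `VoiceRecognizer`, and `KeyboardRecognizer` are hypothetical names; the caller supplies the particular term rather than the sketch detecting it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class InputContext:
    source_modality: str
    terms: list  # at least the particular term from the transcription


class KeyboardRecognizer:
    """Toy second-modality recognizer whose 'model' is a unigram vocabulary."""

    def __init__(self, vocabulary):
        self.vocab = Counter(w.lower() for w in vocabulary)

    def update(self, context):
        # Update the recognition model with terms learned from the other modality.
        for term in context.terms:
            self.vocab[term.lower()] += 1

    def suggest(self, prefix, k=3):
        p = prefix.lower()
        matches = sorted((w for w in self.vocab if w.startswith(p)), key=lambda w: (-self.vocab[w], w))
        return matches[:k]


class VoiceRecognizer:
    """Toy first-modality recognizer that forwards new terms to a peer recognizer."""

    def __init__(self, peer):
        self.peer = peer

    def on_transcription(self, transcription, particular_term):
        ctx = InputContext("voice", [particular_term])
        self.peer.update(ctx)  # transmit the input context data structure
        return ctx


kb = KeyboardRecognizer(["hello", "help"])
voice = VoiceRecognizer(kb)
voice.on_transcription("meet me at Zurich station", "Zurich")
print(kb.suggest("zu"))
```

    The point of the design is that a word first learned through speech becomes available to typing without the user having to enter it manually.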

    Speech recognition with parallel recognition tasks

    Publication No.: US11527248B2

    Publication Date: 2022-12-13

    Application No.: US16885116

    Filing Date: 2020-05-27

    Applicant: Google LLC

    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meet a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
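    The early-abort control flow can be sketched with a thread pool and a shared abort flag. The recognizers here are hypothetical stand-ins that sleep and then return a fixed `(text, confidence)` pair; picking the highest-confidence result as the final output is an assumption, since the abstract only requires the output to be based on at least one generated result.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_fake_srs(text, confidence, delay_s):
    """Hypothetical stand-in for a speech recognition system."""
    def recognize(audio, abort):
        deadline = time.monotonic() + delay_s
        while time.monotonic() < deadline:
            if abort.is_set():  # another SRS already met the threshold
                return None
            time.sleep(0.005)
        return text, confidence
    return recognize


def recognize_in_parallel(audio, systems, threshold):
    abort = threading.Event()
    results = []
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = [pool.submit(srs, audio, abort) for srs in systems]
        for fut in as_completed(futures):
            result = fut.result()
            if result is None:
                continue
            results.append(result)
            if result[1] >= threshold:
                abort.set()  # abort the tasks that have not produced a result yet
                break
    # Leaving the `with` block waits for aborted workers, which exit within one poll interval.
    return max(results, key=lambda r: r[1]) if results else None


slow = make_fake_srs("call tom", 0.60, delay_s=2.0)
fast = make_fake_srs("call mom", 0.92, delay_s=0.0)
start = time.monotonic()
final = recognize_in_parallel(b"", [slow, fast], threshold=0.9)
elapsed = time.monotonic() - start
print(final, f"{elapsed:.2f}s")
```

    Cooperative cancellation through an `Event` is used because `Future.cancel()` cannot stop a task that is already running.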

    Keyboard Automatic Language Identification and Reconfiguration

    Publication No.: US20220229548A1

    Publication Date: 2022-07-21

    Application No.: US17658233

    Filing Date: 2022-04-06

    Applicant: Google LLC

    Abstract: A keyboard is described that determines, using a first decoder and based on a selection of keys of a graphical keyboard, text. Responsive to determining that a characteristic of the text satisfies a threshold, a model of the keyboard identifies the target language of the text, and determines whether the target language is different than a language associated with the first decoder. If the target language of the text is not different than the language associated with the first decoder, the keyboard outputs, for display, an indication of first candidate words determined by the first decoder from the text. If the target language of the text is different: the keyboard enables a second decoder, where a language associated with the second decoder matches the target language of the text, and outputs, for display, an indication of second candidate words determined by the second decoder from the text.
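    A toy version of the decoder switch, assuming the "characteristic of the text" is its word count and that language identification is a simple stopword count. A real keyboard would use a trained language-identification model; `identify_language`, `Decoder`, `Keyboard`, and `min_words` are hypothetical names.

```python
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "a"},
    "es": {"el", "la", "y", "es", "de", "que"},
}


def identify_language(text):
    """Pick the language whose stopwords occur most often (ties go to the first listed)."""
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    return max(scores, key=scores.get)


class Decoder:
    def __init__(self, language, lexicon):
        self.language = language
        self.lexicon = lexicon

    def candidates(self, text, k=3):
        tokens = text.split()
        prefix = tokens[-1].lower() if tokens else ""
        return [w for w in self.lexicon if w.startswith(prefix)][:k]


class Keyboard:
    def __init__(self, decoders, active, min_words=3):
        self.decoders = decoders
        self.active = decoders[active]  # the first decoder
        self.min_words = min_words

    def on_text(self, text):
        if len(text.split()) >= self.min_words:  # threshold on a characteristic of the text
            target = identify_language(text)
            if target != self.active.language and target in self.decoders:
                self.active = self.decoders[target]  # enable the second decoder
        return self.active.candidates(text)


decoders = {
    "en": Decoder("en", ["house", "hello", "help"]),
    "es": Decoder("es", ["hola", "hoy", "hermano"]),
}
kb = Keyboard(decoders, "en")
print(kb.on_text("ho"), kb.on_text("el perro de la ho"))
```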

    Personal directory service
    Granted invention patent

    Publication No.: US10679624B2

    Publication Date: 2020-06-09

    Application No.: US16036662

    Filing Date: 2018-07-16

    Applicant: GOOGLE LLC

    Abstract: A method of providing a personal directory service includes receiving, over the Internet, from a user terminal, a query spoken by a user, where the query spoken by the user includes a speech utterance representing a category of persons. The method also includes determining a geographic location of the user terminal, recognizing the category of persons with a speech recognition engine based on the speech utterance representing the category of persons, searching a listing of persons within or near the determined geographic location matching the query to select persons responsive to the query spoken by the user, and sending to the user terminal information related to at least some of the responsive persons.

    Learning personalized entity pronunciations

    Publication No.: US10152965B2

    Publication Date: 2018-12-11

    Application No.: US15014213

    Filing Date: 2016-02-03

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage medium, for implementing a pronunciation dictionary that stores entity name pronunciations. In one aspect, a method includes actions of receiving audio data corresponding to an utterance that includes a command and an entity name. Additional actions may include generating, by an automated speech recognizer, an initial transcription for a portion of the audio data that is associated with the entity name, receiving a corrected transcription for the portion of the utterance that is associated with the entity name, obtaining a phonetic pronunciation that is associated with the portion of the audio data that is associated with the entity name, updating a pronunciation dictionary to associate the phonetic pronunciation with the entity name, receiving a subsequent utterance that includes the entity name, and transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.
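    A minimal sketch of the correction loop, assuming the audio portion for the entity name has already been converted to a phonetic string and that a generic recognizer maps such strings to a default spelling. The phoneme notation, `PronunciationDictionary`, and `transcribe_command` are hypothetical.

```python
class PronunciationDictionary:
    """Maps a user's phonetic pronunciation to the entity name they meant."""

    def __init__(self):
        self._by_pron = {}

    def update(self, phonetic, entity_name):
        self._by_pron[phonetic] = entity_name

    def lookup(self, phonetic):
        return self._by_pron.get(phonetic)


def transcribe_command(command, phonetic, generic_recognizer, pron_dict):
    """Prefer a personalized pronunciation; otherwise fall back to the generic recognizer."""
    entity = pron_dict.lookup(phonetic) or generic_recognizer(phonetic)
    return f"{command} {entity}"


def generic(phonetic):  # hypothetical recognizer that picks the most common spelling
    return {"SH AA N": "Shawn"}.get(phonetic, phonetic)


d = PronunciationDictionary()
initial = transcribe_command("call", "SH AA N", generic, d)
d.update("SH AA N", "Sean")  # the user corrected the entity name to "Sean"
subsequent = transcribe_command("call", "SH AA N", generic, d)
print(initial, "->", subsequent)
```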
