Scalable dynamic class language modeling

    公开(公告)号:US11804218B2

    公开(公告)日:2023-10-31

    申请号:US17172600

    申请日:2021-02-10

    申请人: Google LLC

    摘要: This document generally describes systems and methods for dynamically adapting speech recognition for individual voice queries of a user using class-based language models. The method may include receiving a voice query from a user that includes audio data corresponding to an utterance of the user, and context data associated with the user. One or more class models are then generated that collectively identify a first set of terms determined based on the context data, and a respective class to which the respective term is assigned for each respective term in the first set of terms. A language model that includes a residual unigram may then be accessed and processed for each respective class to insert a respective class symbol at each instance of the residual unigram that occurs within the language model. A transcription of the utterance of the user is then generated using the modified language model.

    Contextual denormalization for automatic speech recognition

    公开(公告)号:US11676607B2

    公开(公告)日:2023-06-13

    申请号:US17652923

    申请日:2022-02-28

    申请人: Google LLC

    IPC分类号: G10L15/26 G06F40/56 G10L15/22

    摘要: A method for denormalizing raw speech recognition results. The method includes receiving a speech input from a user and obtaining context metadata associated with the speech input. The context metadata indicates that the speech input includes dictated speech directed to a messaging application that is currently executing on a user device for inclusion in an electronic message. The method further includes generating, using a speech recognizer, a raw speech recognition result including an explicit punctuation term spoken by the user and corresponding to the speech input. Based on the context metadata, the method includes denormalizing the generated raw speech recognition result into denormalized text by applying an explicit punctuation denormalizer to convert the explicit punctuation term in the raw speech recognition result into a corresponding punctuation symbol and displaying the denormalized text including the corresponding punctuation symbol on a display screen of the user device.

    LANGUAGE MODEL BIASING MODULATION

    公开(公告)号:US20230109903A1

    公开(公告)日:2023-04-13

    申请号:US18064917

    申请日:2022-12-12

    申请人: Google LLC

    摘要: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for modulating language model biasing. In some implementations, context data is received. A likely context associated with a user is determined based on at least a portion of the context data. One or more language model biasing parameters based at least on the likely context associated with the user is selected. A context confidence score associated with the likely context based on at least a portion of the context data is determined. One or more language model biasing parameters based at least on the context confidence score is adjusted. A baseline language model based at least on the one or more of the adjusted language model biasing parameters is biased. The baseline language model is provided for use by an automated speech recognizer (ASR).

    VOICE RECOGNITION SYSTEM
    4.
    发明申请

    公开(公告)号:US20220343915A1

    公开(公告)日:2022-10-27

    申请号:US17811605

    申请日:2022-07-11

    申请人: Google LLC

    摘要: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.

    CONTEXTUAL TAGGING AND BIASING OF GRAMMARS INSIDE WORD LATTICES

    公开(公告)号:US20220310082A1

    公开(公告)日:2022-09-29

    申请号:US17807208

    申请日:2022-06-16

    申请人: Google LLC

    摘要: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing contextual grammar selection are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance. The actions include generating a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores. The actions include determining a context of the computing device. The actions include based on the context of the computing device, identifying grammars that correspond to the multiple candidate transcriptions. The actions include determining, for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription. The actions include selecting, from among the candidate transcriptions, a candidate transcription. The actions further include providing, for output, the selected candidate transcription as a transcription of the utterance.

    VOICE TO TEXT CONVERSION BASED ON THIRD-PARTY AGENT CONTENT

    公开(公告)号:US20220148596A1

    公开(公告)日:2022-05-12

    申请号:US17582926

    申请日:2022-01-24

    申请人: Google LLC

    摘要: Implementations relate to dynamically, and in a context-sensitive manner, biasing voice to text conversion. In some implementations, the biasing of voice to text conversions is performed by a voice to text engine of a local agent, and the biasing is based at least in part on content provided to the local agent by a third-party (3P) agent that is in network communication with the local agent. In some of those implementations, the content includes contextual parameters that are provided by the 3P agent in combination with responsive content generated by the 3P agent during a dialog that: is between the 3P agent, and a user of a voice-enabled electronic device; and is facilitated by the local agent. The contextual parameters indicate potential feature(s) of further voice input that is to be provided in response to the responsive content generated by the 3P agent.

    SERVER SIDE HOTWORDING
    7.
    发明申请

    公开(公告)号:US20210287678A1

    公开(公告)日:2021-09-16

    申请号:US17337182

    申请日:2021-06-02

    申请人: GOOGLE LLC

    IPC分类号: G10L15/30 G10L15/32 G10L15/26

    摘要: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.

    SPEECH RECOGNITION FOR KEYWORDS
    8.
    发明申请

    公开(公告)号:US20210256567A1

    公开(公告)日:2021-08-19

    申请号:US17308624

    申请日:2021-05-05

    申请人: Google LLC

    摘要: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition are disclosed. In one aspect, a method includes receiving a candidate adword from an advertiser. The method further includes generating a score for the candidate adword based on a likelihood of a speech recognizer generating, based on an utterance of the candidate adword, a transcription that includes a word that is associated with an expected pronunciation of the candidate adword. The method further includes classifying, based at least on the score, the candidate adword as an appropriate adword for use in a bidding process for advertisements that are selected based on a transcription of a speech query or as not an appropriate adword for use in the bidding process for advertisements that are selected based on the transcription of the speech query.

    Speech recognition for keywords
    9.
    发明授权

    公开(公告)号:US11030658B2

    公开(公告)日:2021-06-08

    申请号:US16047723

    申请日:2018-07-27

    申请人: Google LLC

    摘要: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition are disclosed. In one aspect, a method includes receiving a candidate adword from an advertiser. The method further includes generating a score for the candidate adword based on a likelihood of a speech recognizer generating, based on an utterance of the candidate adword, a transcription that includes a word that is associated with an expected pronunciation of the candidate adword. The method further includes classifying, based at least on the score, the candidate adword as an appropriate adword for use in a bidding process for advertisements that are selected based on a transcription of a speech query or as not an appropriate adword for use in the bidding process for advertisements that are selected based on the transcription of the speech query.

    Scalable dynamic class language modeling

    公开(公告)号:US10957312B2

    公开(公告)日:2021-03-23

    申请号:US16731753

    申请日:2019-12-31

    申请人: Google LLC

    摘要: This document generally describes systems and methods for dynamically adapting speech recognition for individual voice queries of a user using class-based language models. The method may include receiving a voice query from a user that includes audio data corresponding to an utterance of the user, and context data associated with the user. One or more class models are then generated that collectively identify a first set of terms determined based on the context data, and a respective class to which the respective term is assigned for each respective term in the first set of terms. A language model that includes a residual unigram may then be accessed and processed for each respective class to insert a respective class symbol at each instance of the residual unigram that occurs within the language model. A transcription of the utterance of the user is then generated using the modified language model.