Textual echo cancellation
    63.
    发明授权

    公开(公告)号:US11482244B2

    公开(公告)日:2022-10-25

    申请号:US17199347

    申请日:2021-03-11

    Applicant: Google LLC

    Inventor: Quan Wang

    Abstract: A method includes receiving an overlapped audio signal that includes audio spoken by a speaker that overlaps a segment of synthesized playback audio. The method also includes encoding a sequence of characters that correspond to the synthesized playback audio into a text embedding representation. For each character in the sequence of characters, the method also includes generating a respective cancelation probability using the text embedding representation. The cancelation probability indicates a likelihood that the corresponding character is associated with the segment of the synthesized playback audio overlapped by the audio spoken by the speaker in the overlapped audio signal.

    TRAINING AND/OR USING A LANGUAGE SELECTION MODEL FOR AUTOMATICALLY DETERMINING LANGUAGE FOR SPEECH RECOGNITION OF SPOKEN UTTERANCE

    公开(公告)号:US20220328035A1

    公开(公告)日:2022-10-13

    申请号:US17846287

    申请日:2022-06-22

    Applicant: Google LLC

    Abstract: Methods and systems for training and/or using a language selection model for use in determining a particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language selected based on the generated probabilities. Speech recognition results for the particular language can be utilized responsive to selecting the particular language of the spoken utterance. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses. Training the language selection model utilizing the tuple losses can result in more efficient training and/or can result in a more accurate and/or robust model—thereby mitigating erroneous language selections for spoken utterances.

    Hybrid Multilingual Text-Dependent and Text-Independent Speaker Verification

    公开(公告)号:US20220310098A1

    公开(公告)日:2022-09-29

    申请号:US17211791

    申请日:2021-03-24

    Applicant: Google LLC

    Abstract: A speaker verification method includes receiving audio data corresponding to an utterance, processing a first portion of the audio data that characterizes a predetermined hotword to generate a text-dependent evaluation vector, and generating one or more text-dependent confidence scores. When one of the text-dependent confidence scores satisfies a threshold, the operations include identifying a speaker of the utterance as a respective enrolled user associated with the text-dependent confidence score that satisfies the threshold and initiating performance of an action without performing speaker verification. When none of the text-dependent confidence scores satisfy the threshold, the operations include processing a second portion of the audio data that characterizes a query to generate a text-independent evaluation vector, generating one or more text-independent confidence scores, and determining whether the identity of the speaker of the utterance includes any of the enrolled users.

    Training and/or using a language selection model for automatically determining language for speech recognition of spoken utterance

    公开(公告)号:US11410641B2

    公开(公告)日:2022-08-09

    申请号:US16959037

    申请日:2019-11-27

    Applicant: Google LLC

    Abstract: Methods and systems for training and/or using a language selection model for use in determining a particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language selected based on the generated probabilities. Speech recognition results for the particular language can be utilized responsive to selecting the particular language of the spoken utterance. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses. Training the language selection model utilizing the tuple losses can result in more efficient training and/or can result in a more accurate and/or robust model—thereby mitigating erroneous language selections for spoken utterances.

    Textual Echo Cancellation
    67.
    发明申请

    公开(公告)号:US20210390975A1

    公开(公告)日:2021-12-16

    申请号:US17199347

    申请日:2021-03-11

    Applicant: Google LLC

    Inventor: Quan Wang

    Abstract: A method includes receiving an overlapped audio signal that includes audio spoken by a speaker that overlaps a segment of synthesized playback audio. The method also includes encoding a sequence of characters that correspond to the synthesized playback audio into a text embedding representation. For each character in the sequence of characters, the method also includes generating a respective cancelation probability using the text embedding representation. The cancelation probability indicates a likelihood that the corresponding character is associated with the segment of the synthesized playback audio overlapped by the audio spoken by the speaker in the overlapped audio signal.

    SPEAKER VERIFICATION
    68.
    发明申请

    公开(公告)号:US20210256981A1

    公开(公告)日:2021-08-19

    申请号:US17307704

    申请日:2021-05-04

    Applicant: Google LLC

    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate language independent-speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.

    Speaker verification across locations, languages, and/or dialects

    公开(公告)号:US11017784B2

    公开(公告)日:2021-05-25

    申请号:US16557390

    申请日:2019-08-30

    Applicant: Google LLC

    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate language independent-speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.

    SPEAKER VERIFICATION
    70.
    发明申请

    公开(公告)号:US20190385619A1

    公开(公告)日:2019-12-19

    申请号:US16557390

    申请日:2019-08-30

    Applicant: Google LLC

    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate language independent-speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.

Patent Agency Ranking