Hybrid Multilingual Text-Dependent and Text-Independent Speaker Verification

    公开(公告)号:US20220310098A1

    公开(公告)日:2022-09-29

    申请号:US17211791

    申请日:2021-03-24

    Applicant: Google LLC

    Abstract: A speaker verification method includes receiving audio data corresponding to an utterance, processing a first portion of the audio data that characterizes a predetermined hotword to generate a text-dependent evaluation vector, and generating one or more text-dependent confidence scores. When one of the text-dependent confidence scores satisfies a threshold, the operations include identifying a speaker of the utterance as a respective enrolled user associated with the text-dependent confidence score that satisfies the threshold and initiating performance of an action without performing speaker verification. When none of the text-dependent confidence scores satisfy the threshold, the operations include processing a second portion of the audio data that characterizes a query to generate a text-independent evaluation vector, generating one or more text-independent confidence scores, and determining whether the identity of the speaker of the utterance includes any of the enrolled users.

    Training and/or using a language selection model for automatically determining language for speech recognition of spoken utterance

    公开(公告)号:US11410641B2

    公开(公告)日:2022-08-09

    申请号:US16959037

    申请日:2019-11-27

    Applicant: Google LLC

    Abstract: Methods and systems for training and/or using a language selection model for use in determining a particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language selected based on the generated probabilities. Speech recognition results for the particular language can be utilized responsive to selecting the particular language of the spoken utterance. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses. Training the language selection model utilizing the tuple losses can result in more efficient training and/or can result in a more accurate and/or robust model—thereby mitigating erroneous language selections for spoken utterances.

    SPEAKER VERIFICATION
    83.
    发明申请

    公开(公告)号:US20210256981A1

    公开(公告)日:2021-08-19

    申请号:US17307704

    申请日:2021-05-04

    Applicant: Google LLC

    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate language independent-speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.

    Speaker verification across locations, languages, and/or dialects

    公开(公告)号:US11017784B2

    公开(公告)日:2021-05-25

    申请号:US16557390

    申请日:2019-08-30

    Applicant: Google LLC

    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate language independent-speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.

    CONTEXTUAL HOTWORDS
    85.
    发明申请

    公开(公告)号:US20210043210A1

    公开(公告)日:2021-02-11

    申请号:US17068681

    申请日:2020-10-12

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextual hotwords are disclosed. In one aspect, a method, during a boot process of a computing device, includes the actions of determining, by a computing device, a context associated with the computing device. The actions further include, based on the context associated with the computing device, determining a hotword. The actions further include, after determining the hotword, receiving audio data that corresponds to an utterance. The actions further include determining that the audio data includes the hotword. The actions further include, in response to determining that the audio data includes the hotword, performing an operation associated with the hotword.

    MULTI-USER AUTHENTICATION ON A DEVICE
    87.
    发明申请

    公开(公告)号:US20200082812A1

    公开(公告)日:2020-03-12

    申请号:US16682716

    申请日:2019-11-13

    Applicant: Google LLC

    Abstract: In some implementations, an utterance is determined to include a particular user speaking a hotword based at least on a first set of samples of the particular user speaking the hotword. In response to determining that an utterance includes a particular user speaking a hotword based at least on a first set of samples of the particular user speaking the hotword, at least a portion of the utterance is stored as a new sample. A second set of samples of the particular user speaking the utterance is obtained, where the second set of samples includes the new sample and less than all the samples in the first set of samples. A second utterance is determined to include the particular user speaking the hotword based at least on the second set of samples of the user speaking the hotword.

    SPEAKER VERIFICATION
    88.
    发明申请

    公开(公告)号:US20190385619A1

    公开(公告)日:2019-12-19

    申请号:US16557390

    申请日:2019-08-30

    Applicant: Google LLC

    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate language independent-speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.

    SUB-MATRIX INPUT FOR NEURAL NETWORK LAYERS
    89.
    发明申请

    公开(公告)号:US20190318727A1

    公开(公告)日:2019-10-17

    申请号:US16452959

    申请日:2019-06-26

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network. One of the methods includes generating, by a speech recognition system, a matrix from a predetermined quantity of vectors that each represent input for a layer of a neural network, generating a plurality of sub-matrices from the matrix, using, for each of the sub-matrices, the respective sub-matrix as input to a node in the layer of the neural network to determine whether an utterance encoded in an audio signal comprises a keyword for which the neural network is trained.

    Neural networks for speaker verification

    公开(公告)号:US10325602B2

    公开(公告)日:2019-06-18

    申请号:US15666806

    申请日:2017-08-02

    Applicant: Google LLC

    Abstract: Systems, methods, devices, and other techniques for training and using a speaker verification neural network. A computing device may receive data that characterizes a first utterance. The computing device provides the data that characterizes the utterance to a speaker verification neural network. Subsequently, the computing device obtains, from the speaker verification neural network, a speaker representation that indicates speaking characteristics of a speaker of the first utterance. The computing device determines whether the first utterance is classified as an utterance of a registered user of the computing device. In response to determining that the first utterance is classified as an utterance of the registered user of the computing device, the device may perform an action for the registered user of the computing device.

Patent Agency Ranking