-
Publication number: US11983903B2
Publication date: 2024-05-14
Application number: US18500034
Application date: 2023-11-01
Applicant: Google LLC
Inventor: Neil Matthew Tinmouth Houlsby , Sylvain Gelly , Jakob D. Uszkoreit , Xiaohua Zhai , Georg Heigold , Lucas Klaus Beyer , Alexander Kolesnikov , Matthias Johannes Lorenz Minderer , Dirk Weissenborn , Mostafa Dehghani , Alexey Dosovitskiy , Thomas Unterthiner
CPC classification number: G06T7/97 , G06F18/24 , G06N3/045 , G06N3/08 , G06T2207/20081 , G06T2207/20084
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
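The patch-based input sequence described in the abstract above can be illustrated with a short NumPy sketch. It is not the claimed implementation, and it rests on several assumptions:
- square, non-overlapping patches;
- a linear patch embedding;
- a learned classification token prepended at position 0, as in the published Vision Transformer;
- learned position embeddings;
- a single self-attention layer with a residual connection and no MLP or layer normalization.

All parameter names (`embed`, `cls`, `pos`, `w_q`, `w_k`, `w_v`, `head`) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping square patches.

    Returns shape (num_patches, patch_size * patch_size * C), one flattened
    patch per row, ordered left-to-right, top-to-bottom.
    """
    h, w, c = image.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = image.reshape(h // p, p, w // p, p, c)  # (rows, p, cols, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4)        # (rows, cols, p, p, C)
    return patches.reshape(-1, p * p * c)


def self_attention(x: np.ndarray, w_q, w_k, w_v) -> np.ndarray:
    """Single-head scaled dot-product self-attention over a sequence x of shape (N, D)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over input positions
    return weights @ v


def classify_image(image: np.ndarray, patch_size: int, params: dict) -> np.ndarray:
    """Build the input sequence from patches and return class logits (hypothetical params)."""
    tokens = image_to_patches(image, patch_size) @ params["embed"]  # linear patch embedding
    seq = np.concatenate([params["cls"][None, :], tokens], axis=0)  # assumed class token at position 0
    seq = seq + params["pos"]                                       # learned position embeddings
    seq = seq + self_attention(seq, params["w_q"], params["w_k"], params["w_v"])  # residual
    return seq[0] @ params["head"]  # network output read from the class-token position


# Usage with random (untrained) weights: a 32x32 RGB image split into 8x8 patches.
rng = np.random.default_rng(0)
H = W = 32
C, P, D, NUM_CLASSES = 3, 8, 16, 10
num_patches = (H // P) * (W // P)
params = {
    "embed": rng.normal(size=(P * P * C, D)) * 0.02,
    "cls": np.zeros(D),
    "pos": rng.normal(size=(num_patches + 1, D)) * 0.02,
    "w_q": rng.normal(size=(D, D)) * 0.02,
    "w_k": rng.normal(size=(D, D)) * 0.02,
    "w_v": rng.normal(size=(D, D)) * 0.02,
    "head": rng.normal(size=(D, NUM_CLASSES)) * 0.02,
}
image = rng.random((H, W, C))
logits = classify_image(image, P, params)
print(image_to_patches(image, P).shape, logits.shape)  # (16, 192) (10,)
```

The reshape/transpose pair avoids explicit loops. Each output row is exactly one patch, so the row index doubles as the patch's input position.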
-
Publication number: US20240038245A1
Publication date: 2024-02-01
Application number: US18485069
Application date: 2023-10-11
Applicant: Google LLC
Inventor: Georg Heigold , Samuel Bengio , Ignacio Lopez Moreno
Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
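The training samples, enrollment, and verification steps in the abstract above can be sketched in plain Python. The sketch is hypothetical, and none of the following choices is stated in the abstract:
- the on-device neural network is replaced by precomputed speaker-representation vectors;
- enrollment averages a user's enrollment representations;
- verification thresholds cosine similarity;
- the training loss is a logistic loss on a scaled cosine score, with hypothetical scale `w` and bias `b`.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two speaker representations."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def make_training_sample(
    utts_by_speaker: Dict[str, List[Vector]], num_second: int, matching: bool, rng: random.Random
) -> Tuple[Vector, List[Vector], int]:
    """Return (first utterance, second utterances, label); label 1 = matching speakers."""
    speakers = sorted(utts_by_speaker)
    first_speaker = rng.choice(speakers)
    first = rng.choice(utts_by_speaker[first_speaker])
    if matching:
        second_speaker = first_speaker
    else:
        second_speaker = rng.choice([s for s in speakers if s != first_speaker])
    second = [rng.choice(utts_by_speaker[second_speaker]) for _ in range(num_second)]
    return first, second, 1 if matching else 0


def sample_loss(first: Vector, second: List[Vector], label: int,
                w: float = 10.0, b: float = -5.0) -> float:
    """Assumed logistic loss on a scaled cosine score (w and b assumed learnable)."""
    score = w * cosine(first, mean_vector(second)) + b
    prob_match = 1.0 / (1.0 + math.exp(-score))
    return -math.log(prob_match) if label == 1 else -math.log(1.0 - prob_match)


def enroll(enrollment_representations: List[Vector]) -> Vector:
    """Assumed enrollment: average the user's enrollment representations into one model."""
    return mean_vector(enrollment_representations)


def verify(speaker_model: Vector, representation: Vector, threshold: float) -> bool:
    """Accept the utterance if its representation is close enough to the enrolled model."""
    return cosine(speaker_model, representation) >= threshold


# Usage with made-up 2-D representations standing in for network outputs.
alice = enroll([[1.0, 0.1], [0.9, 0.0]])
print(verify(alice, [1.0, 0.05], threshold=0.9))  # True
print(verify(alice, [0.0, 1.0], threshold=0.9))   # False
```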
-
Publication number: US11557277B2
Publication date: 2023-01-17
Application number: US17644362
Application date: 2021-12-15
Applicant: Google LLC
Inventor: Georg Heigold , Erik McDermott , Vincent O. VanHoucke , Andrew W. Senior , Michiel A. U. Bacchiani
IPC: G10L15/06 , G10L15/16 , G10L15/183 , G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
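The two-model training flow in the abstract above can be simulated in a few lines of Python. This is a toy sketch, and it makes several assumptions:
- a linear model with mean-squared error stands in for the sequence-level training criterion;
- each "frame" is a (feature vector, target) pair;
- the two models share parameters through a hypothetical `ParameterServer`;
- each model pushes the difference between its optimized and fetched parameters.

Both models read the same parameter snapshot in every round, so the second update is computed from slightly stale parameters. This loosely mimics asynchronous training.

```python
from typing import List, Tuple

Frame = Tuple[List[float], float]  # toy (feature vector, target) pair standing in for a speech frame


def mse_gradient(params: List[float], batch: List[Frame]) -> List[float]:
    """Gradient of mean squared error for a linear model (stand-in for a sequence-level loss)."""
    grad = [0.0] * len(params)
    for features, target in batch:
        err = sum(p * f for p, f in zip(params, features)) - target
        for i, f in enumerate(features):
            grad[i] += 2.0 * err * f / len(batch)
    return grad


class ParameterServer:
    """Hypothetical shared store that every model replica reads from and writes to."""

    def __init__(self, dim: int) -> None:
        self.params = [0.0] * dim

    def fetch(self) -> List[float]:
        return list(self.params)

    def push_delta(self, delta: List[float]) -> None:
        self.params = [p + d for p, d in zip(self.params, delta)]


class ModelReplica:
    """One training model: turns (batch, parameters) into optimized parameters."""

    def __init__(self, lr: float) -> None:
        self.lr = lr

    def optimize(self, params: List[float], batch: List[Frame]) -> List[float]:
        grad = mse_gradient(params, batch)
        return [p - self.lr * g for p, g in zip(params, grad)]


# Toy data generated by target = 2 * f0 - 1 * f1, split into one batch per replica.
frames: List[Frame] = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
batches = [frames[:2], frames[2:]]
server = ParameterServer(dim=2)
replicas = [ModelReplica(lr=0.1), ModelReplica(lr=0.1)]

for _ in range(200):
    snapshots = [server.fetch() for _ in replicas]  # both replicas read the same parameters
    for replica, params, batch in zip(replicas, snapshots, batches):
        optimized = replica.optimize(params, batch)
        # Push only the change, so the second replica's update lands on top of the first.
        server.push_delta([o - p for o, p in zip(optimized, params)])

print([round(p, 3) for p in server.params])  # [2.0, -1.0]
```

Both toy batches share an exact linear solution, so the shared parameters converge to it even though each update uses a stale snapshot.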
-
Publication number: US09978374B2
Publication date: 2018-05-22
Application number: US14846187
Application date: 2015-09-04
Applicant: Google LLC
Inventor: Georg Heigold , Samy Bengio , Ignacio Lopez Moreno
Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
-
Publication number: US11961525B2
Publication date: 2024-04-16
Application number: US17444384
Application date: 2021-08-03
Applicant: Google LLC
Inventor: Georg Heigold , Samuel Bengio , Ignacio Lopez Moreno
Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
-
Publication number: US20220108478A1
Publication date: 2022-04-07
Application number: US17492537
Application date: 2021-10-01
Applicant: Google LLC
Inventor: Neil Matthew Tinmouth Houlsby , Sylvain Gelly , Jakob D. Uszkoreit , Xiaohua Zhai , Georg Heigold , Lucas Klaus Beyer , Alexander Kolesnikov , Matthias Johannes Lorenz Minderer , Dirk Weissenborn , Mostafa Dehghani , Alexey Dosovitskiy , Thomas Unterthiner
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
-
Publication number: US20210125601A1
Publication date: 2021-04-29
Application number: US17143140
Application date: 2021-01-06
Applicant: Google LLC
Inventor: Georg Heigold , Erik Mcdermott , Vincent O. Vanhoucke , Andrew W. Senior , Michiel A.U. Bacchiani
IPC: G10L15/06 , G10L15/16 , G10L15/183 , G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
-
Publication number: US20200118549A1
Publication date: 2020-04-16
Application number: US16573323
Application date: 2019-09-17
Applicant: Google LLC
Inventor: Georg Heigold , Erik McDermott , Vincent O. Vanhoucke , Andrew W. Senior , Michiel A.U. Bacchiani
IPC: G10L15/06 , G06N3/04 , G10L15/183 , G10L15/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
-
Publication number: US10482873B2
Publication date: 2019-11-19
Application number: US15910720
Application date: 2018-03-02
Applicant: Google LLC
Inventor: Georg Heigold , Erik McDermott , Vincent O. Vanhoucke , Andrew W. Senior , Michiel A. U. Bacchiani
IPC: G10L15/06 , G10L15/16 , G10L15/183 , G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
-
Publication number: US12125247B2
Publication date: 2024-10-22
Application number: US17492537
Application date: 2021-10-01
Applicant: Google LLC
Inventor: Neil Matthew Tinmouth Houlsby , Sylvain Gelly , Jakob D. Uszkoreit , Xiaohua Zhai , Georg Heigold , Lucas Klaus Beyer , Alexander Kolesnikov , Matthias Johannes Lorenz Minderer , Dirk Weissenborn , Mostafa Dehghani , Alexey Dosovitskiy , Thomas Unterthiner
CPC classification number: G06T7/97 , G06F18/24 , G06N3/045 , G06N3/08 , G06T2207/20081 , G06T2207/20084
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.