-
Publication No.: US20200380023A1
Publication Date: 2020-12-03
Application No.: US16998891
Filing Date: 2020-08-20
Applicant: Google LLC
Inventor: Gregory Sean Corrado , Tomas Mikolov , Samy Bengio , Yoram Singer , Jonathon Shlens , Andrea L. Frome , Jeffrey Adgate Dean , Mohammad Norouzi
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying data objects. One of the methods includes obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term; obtaining classification data for a data object, wherein the classification data includes a respective score for each of a plurality of categories, and wherein each of the categories is associated with a respective category label; computing an aggregate high-dimensional representation for the data object from high-dimensional representations for the category labels associated with the categories and the respective scores; identifying a first term in the vocabulary of terms having a high-dimensional representation that is closest to the aggregate high-dimensional representation; and selecting the first term as a category label for the data object.
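The labeling step in this abstract can be illustrated with a short sketch. The abstract does not give the aggregation formula or the distance measure. This sketch therefore makes three assumptions. First, the aggregate is a score-weighted average of the label embeddings of the top `top_t` categories. Second, "closest" means highest cosine similarity. Third, the function names (`aggregate_embedding`, `select_label`, `cosine`) and the toy 2-D vectors are hypothetical and exist only for illustration. It is not the patented implementation.

```python
import math
from typing import Dict, Iterable, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def aggregate_embedding(
    scores: Dict[str, float], label_vecs: Dict[str, Vector], top_t: int = 2
) -> Vector:
    """Hypothetical helper: score-weighted average of the label embeddings
    of the top_t categories. The top_t cut-off and the normalization by the
    summed scores are assumptions, not taken from the abstract."""
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_t]
    total = sum(s for _, s in top)
    if total <= 0:
        raise ValueError("top scores must sum to a positive value")
    dim = len(label_vecs[top[0][0]])
    agg = [0.0] * dim
    for label, s in top:
        weight = s / total
        for i, x in enumerate(label_vecs[label]):
            agg[i] += weight * x
    return agg


def select_label(
    agg: Vector, vocab_vecs: Dict[str, Vector], exclude: Iterable[str] = ()
) -> str:
    """Hypothetical helper: return the vocabulary term whose embedding has
    the highest cosine similarity to the aggregate representation."""
    skip = set(exclude)
    candidates = [t for t in vocab_vecs if t not in skip]
    return max(candidates, key=lambda t: cosine(agg, vocab_vecs[t]))


if __name__ == "__main__":
    # Toy vocabulary; the category labels are themselves vocabulary terms.
    vocab = {"tiger": [1.0, 0.0], "lion": [0.0, 1.0],
             "big cat": [0.7, 0.7], "car": [-1.0, 0.0]}
    scores = {"tiger": 0.5, "lion": 0.4, "car": 0.1}
    agg = aggregate_embedding(scores, vocab, top_t=2)
    print(select_label(agg, vocab))
```

With these toy vectors, the classifier splits its score between "tiger" and "lion". The selected term is "big cat", which is not one of the classifier's categories. Under these assumptions, this shows how the method can propose a label outside the original category set.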
-
Publication No.: US10922488B1
Publication Date: 2021-02-16
Application No.: US16363460
Filing Date: 2019-03-25
Applicant: Google LLC
Inventor: Tomas Mikolov , Kai Chen , Gregory S. Corrado , Jeffrey A. Dean
IPC: G10L15/00 , G06F40/279 , G10L15/06 , G06N20/00 , G06F40/30
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing numeric representations of words. One of the methods includes obtaining a set of training data, wherein the set of training data comprises sequences of words; training a classifier and an embedding function on the set of training data, wherein training the embedding function comprises obtaining trained values of the embedding function parameters; processing each word in the vocabulary using the embedding function in accordance with the trained values of the embedding function parameters to generate a respective numerical representation of each word in the vocabulary in the high-dimensional space; and associating each word in the vocabulary with the respective numeric representation of the word in the high-dimensional space.
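The training flow in this abstract can be pictured as follows. An embedding function and a classifier are trained together on word sequences. The trained embedding is then applied to every vocabulary word, and each word is stored with its vector. The abstract does not name a training objective. This sketch substitutes skip-gram with negative sampling, using uniform negatives for simplicity. The embedding function is assumed to be a plain lookup table. The name `train_embeddings` and all hyperparameter defaults are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_embeddings(
    sentences: List[List[str]],
    dim: int = 8,
    window: int = 2,
    negatives: int = 3,
    epochs: int = 50,
    lr: float = 0.05,
    seed: int = 0,
) -> Dict[str, Vector]:
    """Hypothetical sketch: jointly train a lookup-table embedding function
    and a logistic (word, context) classifier. The skip-gram
    negative-sampling objective is an assumed stand-in."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    # Embedding-function parameters: one input vector per vocabulary word.
    emb = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    # Classifier parameters: one output vector per possible context word.
    out = [[0.0] * dim for _ in vocab]

    for _ in range(epochs):
        for sent in sentences:
            for pos, word in enumerate(sent):
                w = index[word]
                lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
                for ctx in range(lo, hi):
                    if ctx == pos:
                        continue
                    # One observed context word (label 1) plus uniformly drawn negatives (label 0).
                    pairs = [(index[sent[ctx]], 1.0)]
                    pairs += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                    grad_w = [0.0] * dim
                    for c, label in pairs:
                        # Gradient of the log-likelihood w.r.t. the logit is (label - sigmoid).
                        g = lr * (label - _sigmoid(_dot(emb[w], out[c])))
                        for i in range(dim):
                            grad_w[i] += g * out[c][i]  # uses out[c] before its update
                            out[c][i] += g * emb[w][i]  # classifier update
                    for i in range(dim):
                        emb[w][i] += grad_w[i]  # embedding-function update
    # Apply the trained embedding function to each vocabulary word and
    # associate the word with its numeric representation.
    return {word: list(emb[index[word]]) for word in vocab}


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
    vectors = train_embeddings(corpus, dim=4)
    print(vectors["cat"])
```

In this sketch, the classifier's parameters (`out`) are discarded after training, and only the embedding table is kept. That matches the abstract's focus on the per-word representations. A production system would more likely draw negatives from a smoothed unigram distribution than uniformly.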
-
Publication No.: US10769191B2
Publication Date: 2020-09-08
Application No.: US14576907
Filing Date: 2014-12-19
Applicant: Google LLC
Inventor: Gregory Sean Corrado , Tomas Mikolov , Samy Bengio , Yoram Singer , Jonathon Shlens , Andrea L. Frome , Jeffrey Adgate Dean , Mohammad Norouzi
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying data objects. One of the methods includes obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term; obtaining classification data for a data object, wherein the classification data includes a respective score for each of a plurality of categories, and wherein each of the categories is associated with a respective category label; computing an aggregate high-dimensional representation for the data object from high-dimensional representations for the category labels associated with the categories and the respective scores; identifying a first term in the vocabulary of terms having a high-dimensional representation that is closest to the aggregate high-dimensional representation; and selecting the first term as a category label for the data object.
-
Publication No.: US11960519B2
Publication Date: 2024-04-16
Application No.: US16998891
Filing Date: 2020-08-20
Applicant: Google LLC
Inventor: Gregory Sean Corrado , Tomas Mikolov , Samy Bengio , Yoram Singer , Jonathon Shlens , Andrea L Frome , Jeffrey Adgate Dean , Mohammad Norouzi
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying data objects. One of the methods includes obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term; obtaining classification data for a data object, wherein the classification data includes a respective score for each of a plurality of categories, and wherein each of the categories is associated with a respective category label; computing an aggregate high-dimensional representation for the data object from high-dimensional representations for the category labels associated with the categories and the respective scores; identifying a first term in the vocabulary of terms having a high-dimensional representation that is closest to the aggregate high-dimensional representation; and selecting the first term as a category label for the data object.
-
Publication No.: US10241997B1
Publication Date: 2019-03-26
Application No.: US15682374
Filing Date: 2017-08-21
Applicant: Google LLC
Inventor: Tomas Mikolov , Kai Chen , Gregory S. Corrado , Jeffrey A. Dean
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing numeric representations of words. One of the methods includes obtaining a set of training data, wherein the set of training data comprises sequences of words; training a classifier and an embedding function on the set of training data, wherein training the embedding function comprises obtaining trained values of the embedding function parameters; processing each word in the vocabulary using the embedding function in accordance with the trained values of the embedding function parameters to generate a respective numerical representation of each word in the vocabulary in the high-dimensional space; and associating each word in the vocabulary with the respective numeric representation of the word in the high-dimensional space.
-
Publication No.: US20240220527A1
Publication Date: 2024-07-04
Application No.: US18606458
Filing Date: 2024-03-15
Applicant: Google LLC
Inventor: Gregory Sean Corrado , Tomas Mikolov , Samuel Bengio , Yoram Singer , Jonathon Shlens , Andrea L. Frome , Jeffrey Adgate Dean , Mohammad Norouzi
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying data objects. One of the methods includes obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term; obtaining classification data for a data object, wherein the classification data includes a respective score for each of a plurality of categories, and wherein each of the categories is associated with a respective category label; computing an aggregate high-dimensional representation for the data object from high-dimensional representations for the category labels associated with the categories and the respective scores; identifying a first term in the vocabulary of terms having a high-dimensional representation that is closest to the aggregate high-dimensional representation; and selecting the first term as a category label for the data object.
-
Publication No.: US20240070392A1
Publication Date: 2024-02-29
Application No.: US18503051
Filing Date: 2023-11-06
Applicant: Google LLC
Inventor: Tomas Mikolov , Kai Chen , Gregory S. Corrado , Jeffrey A. Dean
IPC: G06F40/279 , G06F40/30 , G06N20/00 , G10L15/06
CPC classification number: G06F40/279 , G06F40/30 , G06N20/00 , G10L15/06
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing numeric representations of words. One of the methods includes obtaining a set of training data, wherein the set of training data comprises sequences of words; training a classifier and an embedding function on the set of training data, wherein training the embedding function comprises obtaining trained values of the embedding function parameters; processing each word in the vocabulary using the embedding function in accordance with the trained values of the embedding function parameters to generate a respective numerical representation of each word in the vocabulary in the high-dimensional space; and associating each word in the vocabulary with the respective numeric representation of the word in the high-dimensional space.
-
Publication No.: US11809824B1
Publication Date: 2023-11-07
Application No.: US17175550
Filing Date: 2021-02-12
Applicant: Google LLC
Inventor: Tomas Mikolov , Kai Chen , Gregory S. Corrado , Jeffrey A. Dean
IPC: G06F40/30 , G06F40/279 , G06N20/00 , G10L15/06
CPC classification number: G06F40/279 , G06F40/30 , G06N20/00 , G10L15/06
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing numeric representations of words. One of the methods includes obtaining a set of training data, wherein the set of training data comprises sequences of words; training a classifier and an embedding function on the set of training data, wherein training the embedding function comprises obtaining trained values of the embedding function parameters; processing each word in the vocabulary using the embedding function in accordance with the trained values of the embedding function parameters to generate a respective numerical representation of each word in the vocabulary in the high-dimensional space; and associating each word in the vocabulary with the respective numeric representation of the word in the high-dimensional space.
-