Publication number: US12079574B1
Publication date: 2024-09-03
Application number: US17541833
Application date: 2021-12-03
Applicant: Amazon Technologies, Inc.
Inventor: Brendan Cruz Colon, Jason L. Thalken, Aaron Boswell, Matthew Michael Sommer, Kellen K. Axten
IPC: G06V30/40, G06F18/211, G06F18/214, G06F40/279, G06N7/01
CPC classification number: G06F40/279, G06F18/211, G06F18/214, G06N7/01
Abstract: Devices and techniques are generally described for evaluation of text data using large n-grams. In various examples, a first vector may be generated for first text data, wherein each element of the vector comprises a value indicating whether the first text data includes a respective n-gram included in a corpus of text data. First label data indicating that a user associated with the first text data has connected to a first computer-implemented service more than a threshold number of times during a past time period may be determined. A first machine learning model may be trained based at least in part on the first vector and the first label data. The first machine learning model may be used to determine a first probability associated with a first n-gram of the first vector. In some examples, at least a first user associated with the first n-gram may be determined.
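The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not name a model type, so a Laplace-smoothed estimate of P(label = 1 | n-gram present) stands in for the trained machine learning model. The word-level tokenization, the value `N = 3`, the threshold of 5, the function names, and the sample users and texts are all assumptions made for illustration.

```python
def word_ngrams(text, n):
    """Return the set of word-level n-grams found in text (assumed tokenization)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_vocabulary(corpus, n):
    """Sorted list of every n-gram that appears anywhere in the corpus."""
    vocab = set()
    for text in corpus:
        vocab |= word_ngrams(text, n)
    return sorted(vocab)


def presence_vector(text, vocab, n):
    """Binary vector: element i is 1 if text contains vocab[i], else 0."""
    grams = word_ngrams(text, n)
    return [1 if g in grams else 0 for g in vocab]


def label_for(connection_count, threshold):
    """1 if the user connected more than `threshold` times in the period."""
    return 1 if connection_count > threshold else 0


def ngram_probabilities(vectors, labels, vocab):
    """Stand-in model: Laplace-smoothed P(label = 1 | n-gram present)."""
    present = [0] * len(vocab)
    positive = [0] * len(vocab)
    for vec, y in zip(vectors, labels):
        for i, v in enumerate(vec):
            if v:
                present[i] += 1
                positive[i] += y
    return {g: (positive[i] + 1) / (present[i] + 2) for i, g in enumerate(vocab)}


def users_with_ngram(ngram, user_texts, n):
    """Users whose text data contains the given n-gram."""
    return sorted(u for u, t in user_texts.items() if ngram in word_ngrams(t, n))


# Hypothetical sample data; none of it comes from the patent.
N = 3
THRESHOLD = 5
user_texts = {
    "user_a": "senior data engineer building streaming pipelines",
    "user_b": "data engineer building batch pipelines",
    "user_c": "retail associate stocking shelves",
}
connections = {"user_a": 12, "user_b": 9, "user_c": 1}

users = list(user_texts)
vocab = build_vocabulary(user_texts.values(), N)
vectors = [presence_vector(user_texts[u], vocab, N) for u in users]
labels = [label_for(connections[u], THRESHOLD) for u in users]
probs = ngram_probabilities(vectors, labels, vocab)
matched = users_with_ngram("data engineer building", user_texts, N)
```

In this toy data, `probs` maps each n-gram to its smoothed probability and `matched` lists the users whose text contains the chosen n-gram. A real system would likely replace the frequency estimate with a trained classifier and use much larger values of `N`, in line with the "large n-grams" the abstract refers to.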