Inference Methods For Word Or Wordpiece Tokenization

    Publication Number: US20240054288A1

    Publication Date: 2024-02-15

    Application Number: US18205609

    Application Date: 2023-06-05

    Applicant: Google LLC

    CPC classification number: G06F40/284 G06F16/322 G06F40/40

    Abstract: Systems and methods for performing inference for word or wordpiece tokenization are disclosed using a left-to-right longest-match-first greedy process. In some examples, the vocabulary may be organized into a trie structure in which each node includes a precomputed token or token_ID and a fail link, so that the tokenizer can parse the trie in a single pass to generate a list of only those tokens or token_IDs that correspond to the longest matching vocabulary entries in the sample string, without the need for backtracking. In some examples, the vocabulary may be organized into a trie in which each node has a fail link, and any node that would share token(s) or token_ID(s) of a preceding node is instead given a prev_match link that points back to a chain of nodes with those token(s) or token_ID(s).
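    The trie-based greedy matching described above can be illustrated compactly. Below is a minimal Python sketch of left-to-right, longest-match-first wordpiece tokenization over a plain trie whose nodes carry precomputed token IDs; the example vocabulary, the "##" continuation prefix, and the handling of unmatchable words are illustrative assumptions, and the patented single-pass fail-link traversal that avoids backtracking is not reproduced here.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.token_id = None  # precomputed ID if a vocabulary entry ends here

def build_trie(vocab):
    root = TrieNode()
    for token_id, token in enumerate(vocab):
        node = root
        for ch in token:
            node = node.children.setdefault(ch, TrieNode())
        node.token_id = token_id
    return root

def tokenize_word(word, root):
    """Greedy longest-match-first: repeatedly take the longest vocabulary
    entry that matches a prefix of the remaining suffix of the word."""
    ids, start = [], 0
    while start < len(word):
        # Continuation pieces are looked up with a "##" prefix.
        piece = word[start:] if start == 0 else "##" + word[start:]
        node, best_end, best_id = root, None, None
        for i, ch in enumerate(piece):
            node = node.children.get(ch)
            if node is None:
                break
            if node.token_id is not None:
                best_end, best_id = i + 1, node.token_id
        if best_id is None:
            return None  # no vocabulary entry matches (would map to <unk>)
        ids.append(best_id)
        start += best_end - (0 if start == 0 else 2)  # discount the "##"
    return ids

vocab = ["un", "##aff", "##able", "##a", "##ble"]
root = build_trie(vocab)
print([vocab[i] for i in tokenize_word("unaffable", root)])  # ['un', '##aff', '##able']
```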

    Extreme language model compression with optimal sub-words and shared projections

    Publication Number: US11797862B2

    Publication Date: 2023-10-24

    Application Number: US16749570

    Application Date: 2020-01-22

    Applicant: Google LLC

    CPC classification number: G06N3/088 G06F40/284 G06N3/045

    Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERT_BASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
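    As a rough illustration of the shared-projection idea in the abstract, the following PyTorch sketch uses a single trainable projection, shared across layers, to map student hidden states into the teacher's hidden dimension and penalizes the layer-wise mismatch with an MSE loss. The dimensions, the MSE objective, and the direction of projection are assumptions for illustration; the dual-training mechanism and the optimal sub-word embedding learning are not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher_dim, student_dim, num_layers = 768, 192, 4

# One projection shared by every layer (hence "shared projections").
shared_proj = nn.Linear(student_dim, teacher_dim, bias=False)

def layerwise_distillation_loss(student_states, teacher_states):
    """student_states / teacher_states: lists of [batch, seq, dim] tensors,
    one per aligned layer; teacher states are treated as fixed targets."""
    loss = 0.0
    for s, t in zip(student_states, teacher_states):
        loss = loss + F.mse_loss(shared_proj(s), t.detach())
    return loss / len(student_states)

# Toy activations standing in for real teacher/student hidden states.
batch, seq = 2, 8
student = [torch.randn(batch, seq, student_dim) for _ in range(num_layers)]
teacher = [torch.randn(batch, seq, teacher_dim) for _ in range(num_layers)]
print(layerwise_distillation_loss(student, teacher).item())
```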

    Fine-grained image similarity
    Invention Grant

    Publication Number: US10949708B2

    Publication Date: 2021-03-16

    Application Number: US16420154

    Application Date: 2019-05-22

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus for determining fine-grained image similarity. In one aspect, a method includes training an image embedding function on image triplets by selecting image triplets of first, second, and third images; generating, by the image embedding function, first, second, and third representations of the features of the first, second, and third images; determining, based on the first representation of features and the second representation of features, a first similarity measure for the first image to the second image; determining, based on the first representation of features and the third representation of features, a second similarity measure for the first image to the third image; determining, based on the first and second similarity measures, a performance measure of the image embedding function for the image triplet; and adjusting the parameter weights of the image embedding function based on the performance measures for the image triplets.
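    To make the triplet-based performance measure concrete, here is a minimal NumPy sketch of a hinge-style loss over one image triplet, in which the first (anchor) image is expected to be closer to the second image than to the third. The use of squared Euclidean distance as the (dis)similarity measure and the margin value are assumptions, not details taken from the patent.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """anchor/positive/negative: embedding vectors for the first, second,
    and third images of a triplet."""
    d_pos = np.sum((anchor - positive) ** 2)  # first vs. second image
    d_neg = np.sum((anchor - negative) ** 2)  # first vs. third image
    # Zero loss once the positive pair is closer by at least the margin.
    return max(0.0, margin + d_pos - d_neg)

rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=8) for _ in range(3))
print(triplet_loss(a, p, n))
```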

    Fine-Grained Image Similarity
    Invention Application

    Publication Number: US20190279030A1

    Publication Date: 2019-09-12

    Application Number: US16420154

    Application Date: 2019-05-22

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus for determining fine-grained image similarity. In one aspect, a method includes training an image embedding function on image triplets by selecting image triplets of first, second, and third images; generating, by the image embedding function, first, second, and third representations of the features of the first, second, and third images; determining, based on the first representation of features and the second representation of features, a first similarity measure for the first image to the second image; determining, based on the first representation of features and the third representation of features, a second similarity measure for the first image to the third image; determining, based on the first and second similarity measures, a performance measure of the image embedding function for the image triplet; and adjusting the parameter weights of the image embedding function based on the performance measures for the image triplets.

    LEARNING UNIFIED EMBEDDING
    Invention Application

    Publication Number: US20200090039A1

    Publication Date: 2020-03-19

    Application Number: US16494842

    Application Date: 2017-11-17

    Applicant: Google LLC

    Abstract: A computer-implemented method for generating a unified machine learning model using a neural network on a data processing apparatus is described. The method includes the data processing apparatus determining respective learning targets for each of a plurality of object verticals. The data processing apparatus determines the respective learning targets based on two or more embedding outputs of the neural network. The method also includes the data processing apparatus training the neural network to identify data associated with each of the plurality of object verticals. The data processing apparatus trains the neural network using the respective learning targets and based on a first loss function. The data processing apparatus uses the trained neural network to generate a unified machine learning model, where the model is configured to identify particular data items associated with each of the plurality of object verticals.
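    The sketch below illustrates one way a single ("unified") model can serve several object verticals: a shared embedding trunk feeds per-vertical heads, and the per-vertical losses are summed. This is a PyTorch sketch under assumptions; the patent's embedding-derived learning targets and its specific loss function are not reproduced, and all dimensions and the classification heads are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim, num_verticals, classes_per_vertical = 64, 3, 10

# Shared embedding trunk plus one illustrative head per object vertical.
trunk = nn.Sequential(nn.Linear(128, embed_dim), nn.ReLU())
heads = nn.ModuleList(
    [nn.Linear(embed_dim, classes_per_vertical) for _ in range(num_verticals)]
)

def unified_loss(batches):
    """batches: list of (features, labels) pairs, one per object vertical."""
    loss = 0.0
    for head, (x, y) in zip(heads, batches):
        loss = loss + F.cross_entropy(head(trunk(x)), y)
    return loss

# Toy batches standing in for per-vertical training data.
batches = [
    (torch.randn(4, 128), torch.randint(0, classes_per_vertical, (4,)))
    for _ in range(num_verticals)
]
print(unified_loss(batches).item())
```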

    Extreme Language Model Compression with Optimal Sub-Words and Shared Projections

    Publication Number: US20240013059A1

    Publication Date: 2024-01-11

    Application Number: US18471866

    Application Date: 2023-09-21

    Applicant: Google LLC

    CPC classification number: G06N3/0455 G06F40/40 G06N3/08 G06F40/284

    Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERT_BASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.

    Extreme language model compression with optimal sub-words and shared projections

    Publication Number: US12260340B2

    Publication Date: 2025-03-25

    Application Number: US18471866

    Application Date: 2023-09-21

    Applicant: Google LLC

    Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERT_BASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.

    Inference methods for word or wordpiece tokenization

    Publication Number: US11763083B2

    Publication Date: 2023-09-19

    Application Number: US17798638

    Application Date: 2020-05-18

    Applicant: Google LLC

    CPC classification number: G06F40/284 G06F16/322 G06F40/40

    Abstract: Systems and methods for performing inference for word or wordpiece tokenization are disclosed using a left-to-right longest-match-first greedy process. In some examples, the vocabulary may be organized into a trie structure in which each node includes a precomputed token or token_ID and a fail link, so that the tokenizer can parse the trie in a single pass to generate a list of only those tokens or token_IDs that correspond to the longest matching vocabulary entries in the sample string, without the need for backtracking. In some examples, the vocabulary may be organized into a trie in which each node has a fail link, and any node that would share token(s) or token_ID(s) of a preceding node is instead given a prev_match link that points back to a chain of nodes with those token(s) or token_ID(s).
