System for Information Extraction from Form-Like Documents

    Publication number: US20210374395A1

    Publication date: 2021-12-02

    Application number: US16890287

    Application date: 2020-06-02

    Applicant: Google LLC

    Abstract: The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. The computing system can extract one or more candidate text portions for each field type included in a target schema. The computing system can generate a respective input feature vector for each candidate for the field type. The computing system can generate a respective candidate embedding for the candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for the candidate text portion. The computing system can assign one or more of the candidate text portions to the field type based on the respective scores.
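
The scoring and assignment steps in the abstract can be illustrated with a minimal sketch. It assumes each candidate already has an embedding and that the score is a rescaled cosine similarity against a per-field embedding. The abstract does not specify the scoring function, so `Candidate`, `field_embedding`, `score_candidates`, `assign_field`, and the `threshold` parameter are all hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Candidate:
    """A candidate text portion with a (hypothetical) learned embedding."""
    text: str
    embedding: List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_candidates(
    field_embedding: Sequence[float], candidates: Sequence[Candidate]
) -> List[Tuple[Candidate, float]]:
    """Score each candidate in [0, 1]. Assumption: rescaled cosine similarity."""
    return [
        (c, (1.0 + cosine(field_embedding, c.embedding)) / 2.0)
        for c in candidates
    ]


def assign_field(
    field_embedding: Sequence[float],
    candidates: Sequence[Candidate],
    threshold: float = 0.5,
) -> Optional[str]:
    """Assign the best-scoring candidate to the field, or None if none qualifies."""
    scored = score_candidates(field_embedding, candidates)
    if not scored:
        return None
    best, best_score = max(scored, key=lambda pair: pair[1])
    return best.text if best_score >= threshold else None
```

For example, `assign_field([1.0, 0.0], [Candidate("2020-06-02", [0.9, 0.1]), Candidate("$42.00", [0.0, 1.0])])` picks the first candidate because its embedding points almost in the same direction as the field embedding.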

    ITERATIVE NEURAL CODE TRANSLATION
    Type: Published patent application

    Publication number: US20240184555A1

    Publication date: 2024-06-06

    Application number: US18076189

    Application date: 2022-12-06

    Applicant: Google LLC

    CPC classification number: G06F8/51 G06F8/42 G06F11/3616 G06N3/0455 G06N3/08

    Abstract: Techniques are described herein for iterative code generation using neural language models. In various implementations, an original source code snippet in a first programming language may be processed using a translation machine learning model to generate a first translation of the original source code snippet in a second programming language. The first translation of the original source code snippet may be evaluated to identify error(s) in the first translation. Based on the error(s), respective mask(s) may be inserted to generate a masked first translation of the original source code snippet in the second programming language. The masked first translation of the original source code snippet may be processed using the translation machine learning model to generate a second translation of the original source code snippet in the second language. The second translation may include infill(s) of corrected source code in place of one or more of the masks.
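
The translate, evaluate, mask, and infill loop described in this abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the patented implementation: the model and the error checker are passed in as callables, masking is done per line, and `toy_model` and `toy_find_errors` are hypothetical stand-ins that fake a Python-to-JavaScript translation so the loop can be run end to end.

```python
from typing import Callable, List, Optional, Set

MASK = "<MASK>"  # hypothetical mask token

# A "model" takes the original source plus an optional masked draft and
# returns a translation as a list of lines.
Model = Callable[[str, Optional[List[str]]], List[str]]
ErrorFinder = Callable[[List[str]], Set[int]]


def mask_lines(translation: List[str], bad: Set[int]) -> List[str]:
    """Replace every line flagged as erroneous with the mask token."""
    return [MASK if i in bad else line for i, line in enumerate(translation)]


def iterative_translate(
    source: str, model: Model, find_errors: ErrorFinder, max_rounds: int = 3
) -> List[str]:
    """Translate, then repeatedly mask erroneous lines and ask for infills."""
    translation = model(source, None)
    for _ in range(max_rounds):
        bad = find_errors(translation)
        if not bad:
            break
        translation = model(source, mask_lines(translation, bad))
    return translation


def toy_model(source: str, masked: Optional[List[str]]) -> List[str]:
    """Stand-in model: the first pass contains an error, the second infills it."""
    if masked is None:
        return ["let x = 1;", "print(x);"]  # 'print' is not valid JavaScript here
    return ["console.log(x);" if line == MASK else line for line in masked]


def toy_find_errors(translation: List[str]) -> Set[int]:
    """Stand-in evaluator: flags lines that still use Python's print()."""
    return {i for i, line in enumerate(translation) if line.startswith("print(")}
```

Calling `iterative_translate("x = 1\nprint(x)", toy_model, toy_find_errors)` runs one repair round: the second line is masked and then infilled. In practice the evaluator could be a compiler, linter, or test run, which is an assumption beyond what the abstract states.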

    Iterative neural code translation

    Publication number: US12093672B2

    Publication date: 2024-09-17

    Application number: US18076189

    Application date: 2022-12-06

    Applicant: Google LLC

    CPC classification number: G06F8/51 G06F8/42 G06F11/3616 G06N3/0455 G06N3/08

    Abstract: Techniques are described herein for iterative code generation using neural language models. In various implementations, an original source code snippet in a first programming language may be processed using a translation machine learning model to generate a first translation of the original source code snippet in a second programming language. The first translation of the original source code snippet may be evaluated to identify error(s) in the first translation. Based on the error(s), respective mask(s) may be inserted to generate a masked first translation of the original source code snippet in the second programming language. The masked first translation of the original source code snippet may be processed using the translation machine learning model to generate a second translation of the original source code snippet in the second language. The second translation may include infill(s) of corrected source code in place of one or more of the masks.

    SYNTACTICALLY COHERENT CODE SEGMENTATION
    Type: Published patent application

    Publication number: US20240256235A1

    Publication date: 2024-08-01

    Application number: US18102039

    Application date: 2023-01-26

    Applicant: GOOGLE LLC

    CPC classification number: G06F8/433 G06F8/425 G06F8/427

    Abstract: Techniques are described herein for segmenting source code into syntactically coherent sequences of tokens that satisfy constraints inherent in sequence-to-sequence networks. In various implementations, source code may be processed to generate one or more graphs representing the source code. One or more of the graphs may then be traversed to identify one or more sequences of tokens within the source code that satisfy an input constraint of a sequence-to-sequence network. The source code may be segmented into the identified one or more sequences of tokens. The one or more sequences of tokens may then be processed using the sequence-to-sequence network.
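
One way to picture this segmentation is a traversal of a syntax tree that emits the largest statements fitting a token budget. The sketch below uses Python's standard `ast` module as the graph and a whitespace split as a crude token count. Both are assumptions for illustration, as are the names `segment_source`, `count_tokens`, and `max_tokens`. It is also simplified: when a compound statement is split, its header line (for example `def f(a):`) and any `except` handlers are dropped.

```python
import ast
from typing import List


def count_tokens(text: str) -> int:
    """Crude token count (assumption): whitespace-separated pieces."""
    return len(text.split())


def segment_source(source: str, max_tokens: int) -> List[str]:
    """Split source into whole statements that each fit within max_tokens.

    A statement that is too long is replaced by its child statements, so every
    emitted segment is still a complete statement. A simple statement that
    exceeds the budget is emitted unchanged as a fallback.
    """
    tree = ast.parse(source)
    segments: List[str] = []

    def visit(statements: List[ast.stmt]) -> None:
        for stmt in statements:
            text = ast.get_source_segment(source, stmt) or ""
            # Child statement lists of compound statements (if/for/def/try ...).
            children = [
                child
                for field in ("body", "orelse", "finalbody")
                for child in getattr(stmt, field, [])
            ]
            if count_tokens(text) <= max_tokens or not children:
                segments.append(text)
            else:
                visit(children)

    visit(tree.body)
    return segments
```

With a budget of 5 tokens, a short assignment is kept whole while a longer function is broken into its individual body statements. With a larger budget the function is emitted as a single segment.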

    Syntactically coherent code segmentation

    Publication number: US12265805B2

    Publication date: 2025-04-01

    Application number: US18102039

    Application date: 2023-01-26

    Applicant: GOOGLE LLC

    Abstract: Techniques are described herein for segmenting source code into syntactically coherent sequences of tokens that satisfy constraints inherent in sequence-to-sequence networks. In various implementations, source code may be processed to generate one or more graphs representing the source code. One or more of the graphs may then be traversed to identify one or more sequences of tokens within the source code that satisfy an input constraint of a sequence-to-sequence network. The source code may be segmented into the identified one or more sequences of tokens. The one or more sequences of tokens may then be processed using the sequence-to-sequence network.

    ITERATIVE NEURAL CODE TRANSLATION

    Publication number: US20240394025A1

    Publication date: 2024-11-28

    Application number: US18792153

    Application date: 2024-08-01

    Applicant: GOOGLE LLC

    Abstract: Techniques are described herein for iterative code generation using neural language models. In various implementations, an original source code snippet in a first programming language may be processed using a translation machine learning model to generate a first translation of the original source code snippet in a second programming language. The first translation of the original source code snippet may be evaluated to identify error(s) in the first translation. Based on the error(s), respective mask(s) may be inserted to generate a masked first translation of the original source code snippet in the second programming language. The masked first translation of the original source code snippet may be processed using the translation machine learning model to generate a second translation of the original source code snippet in the second language. The second translation may include infill(s) of corrected source code in place of one or more of the masks.
