Patent search ap:("Google LLC") AND inv:"Marc Najork" Page 1

1.

Granted invention patent
Systems and methods for machine-learned prediction of semantic similarity between documents (Status: Active)

Publication (announcement) number: US11694034B2

Publication (announcement) date: 2023-07-04

Application number: US17078569

Application date: 2020-10-23

Applicant: Google LLC

Inventor: Liu Yang , Marc Najork , Michael Bendersky , Mingyang Zhang , Cheng Li

IPC: G06F40/30 , G06N3/08 , G06F40/205 , G06N3/045

CPC classification number: G06F40/30 , G06F40/205 , G06N3/045 , G06N3/08

Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.
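The pipeline in this abstract (parse each document into textual blocks, encode the blocks, combine them into a document encoding, compare the two encodings) can be illustrated with a minimal sketch. This is not the patented model: the hashed bag-of-words encoder stands in for the machine-learned encoder, and the sentence splitter, the mean-pooling step, the cosine metric, and every name below (`DIM`, `parse_blocks`, `encode_block`, and so on) are assumptions made for illustration only.

```python
import math
import re
import zlib

DIM = 64  # hypothetical encoding size, chosen only for this sketch


def parse_blocks(document):
    """Split a document into textual blocks (assumed here to be sentences)."""
    return [b.strip() for b in re.split(r"[.!?\n]+", document) if b.strip()]


def encode_block(block):
    """Stand-in for the learned block encoder: a hashed bag-of-words vector."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", block.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    return vec


def encode_document(document):
    """Mean-pool the block encodings into one document encoding."""
    blocks = parse_blocks(document)
    if not blocks:
        return [0.0] * DIM
    encodings = [encode_block(b) for b in blocks]
    return [sum(column) / len(encodings) for column in zip(*encodings)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_similarity(first_document, second_document):
    """Similarity metric computed from the two document encodings."""
    return cosine_similarity(
        encode_document(first_document), encode_document(second_document)
    )
```

Calling `document_similarity(doc_a, doc_b)` returns a value between 0 and 1 for this stand-in encoder, because all vector components are non-negative. A learned encoder would replace `encode_block` and possibly the pooling step.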

2.

Invention application
Systems and Methods for Machine-Learned Prediction of Semantic Similarity Between Documents (Status: Active)

Publication (announcement) number: US20220129638A1

Publication (announcement) date: 2022-04-28

Application number: US17078569

Application date: 2020-10-23

Applicant: Google LLC

Inventor: Liu Yang , Marc Najork , Michael Bendersky , Mingyang Zhang , Cheng Li

IPC: G06F40/30 , G06F40/205 , G06N3/04 , G06N3/08

Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.

3.

Invention application
System for Information Extraction from Form-Like Documents (Status: Active)

Publication (announcement) number: US20210374395A1

Publication (announcement) date: 2021-12-02

Application number: US16890287

Application date: 2020-06-02

Applicant: Google LLC

Inventor: Sandeep Tata , Bodhisattwa Prasad Majumder , Qi Zhao , James Bradley Wendt , Marc Najork , Navneet Potti

IPC: G06K9/00 , G06K9/20 , G06K9/62 , G06T7/70 , G06K9/72 , G06N20/00 , G06N5/04

Abstract: The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. The computing system can extract one or more candidate text portions for each field type included in a target schema. The computing system can generate a respective input feature vector for each candidate for the field type. The computing system can generate a respective candidate embedding for the candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for the candidate text portion. The computing system can assign one or more of the candidate text portions to the field type based on the respective scores.
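The candidate-then-score flow described in this abstract can be sketched as follows. Everything here is a hypothetical illustration rather than the disclosed system: the two-field schema, the regex candidate generators, the position-based features, the `tanh` placeholder for a learned embedding, and the hand-set `FIELD_WEIGHTS` are all assumptions. The sketch also assigns only the single top-scoring candidate per field, whereas the abstract allows one or more.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class TextPortion:
    text: str
    x: float  # normalized horizontal position on the page, 0..1 (assumed)
    y: float  # normalized vertical position on the page, 0..1 (assumed)


# Hypothetical target schema: field type -> pattern used to generate candidates.
SCHEMA = {
    "invoice_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "total_amount": re.compile(r"^\$?\d+\.\d{2}$"),
}

# Hand-set weights standing in for learned parameters (assumption).
FIELD_WEIGHTS = {
    "invoice_date": [0.5, -1.0, 0.0],  # favors candidates near the top right
    "total_amount": [0.5, 1.0, 0.0],   # favors candidates near the bottom right
}


def extract_candidates(portions, field_type):
    """Keep the text portions whose text matches the field type's pattern."""
    pattern = SCHEMA[field_type]
    return [p for p in portions if pattern.match(p.text)]


def feature_vector(candidate):
    """Input features for one candidate: position and capped text length."""
    return [candidate.x, candidate.y, min(len(candidate.text), 20) / 20.0]


def embed(features):
    """Placeholder for a learned candidate embedding."""
    return [math.tanh(f) for f in features]


def score(field_type, embedding):
    """Score a candidate embedding against a field type (dot product)."""
    return sum(w * e for w, e in zip(FIELD_WEIGHTS[field_type], embedding))


def extract_fields(portions):
    """Assign the best-scoring candidate text to each field type."""
    result = {}
    for field_type in SCHEMA:
        candidates = extract_candidates(portions, field_type)
        if not candidates:
            continue  # no candidate of this type was found on the page
        best = max(
            candidates,
            key=lambda c: score(field_type, embed(feature_vector(c))),
        )
        result[field_type] = best.text
    return result
```

In practice the `TextPortion` list would come from an OCR step over the document image, which this sketch leaves out.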

4.

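Invention application
Systems and Methods for Active Learning (Status: Under examination, published)

Publication (announcement) number: US20200250527A1

Publication (announcement) date: 2020-08-06

Application number: US16750053

Application date: 2020-01-23

Applicant: Google LLC

Inventor: Qi Zhao , Abbas Kazerouni , Sandeep Tata , Jing Xie , Marc Najork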

IPC: G06N3/08 , G06N3/04

Abstract: The present disclosure provides computing systems and methods directed to active learning and may provide advantages or improvements to active learning applications for skewed data sets. A challenge in training and developing high-quality models for many supervised learning scenarios is obtaining labeled training examples. This disclosure provides systems and methods for active learning on a training dataset that includes both labeled and unlabeled datapoints. In particular, the systems and methods described herein can select (e.g., at each of a number of iterations) a number of the unlabeled datapoints for which labels should be obtained to gain additional labeled datapoints on which to train a machine-learned model (e.g., machine-learned classifier model). Generally, the disclosure provides cost-effective methods and systems for selecting data to improve machine-learned models in applications such as the identification of content items in text, images, and/or audio.
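The iterative selection loop in this abstract can be illustrated with a small sketch. The abstract does not state the selection criterion, so this example swaps in plain uncertainty sampling, a standard active-learning baseline, and does nothing specific for skewed data sets. The one-dimensional threshold classifier, the `oracle` labeling callback, and all function names are assumptions for illustration.

```python
import math


def train_threshold(labeled):
    """Fit a toy 1-D classifier: the midpoint between the two class means."""
    positives = [x for x, y in labeled if y == 1]
    negatives = [x for x, y in labeled if y == 0]
    if not positives or not negatives:
        return 0.0  # fallback when one class has no labeled examples yet
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return (mean_pos + mean_neg) / 2.0


def predict_proba(threshold, x):
    """Probability of the positive class under a logistic curve."""
    return 1.0 / (1.0 + math.exp(-(x - threshold)))


def select_batch(threshold, unlabeled, k):
    """Uncertainty sampling: the k points whose probability is nearest 0.5."""
    ranked = sorted(unlabeled, key=lambda x: abs(predict_proba(threshold, x) - 0.5))
    return ranked[:k]


def active_learning(labeled, unlabeled, oracle, iterations, k):
    """Retrain, select k unlabeled points, obtain labels, and repeat."""
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(iterations):
        if not unlabeled:
            break
        threshold = train_threshold(labeled)
        for x in select_batch(threshold, unlabeled, k):
            unlabeled.remove(x)
            labeled.append((x, oracle(x)))  # oracle stands in for a human labeler
    return train_threshold(labeled), labeled
```

Here `oracle` is any function that returns the true label for a datapoint; each iteration spends the labeling budget `k` on the points the current model is least sure about.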

5.

Granted invention patent
Systems and methods for machine-learned prediction of semantic similarity between documents (Status: Active)

Publication (announcement) number: US12210837B2

Publication (announcement) date: 2025-01-28

Application number: US18321424

Application date: 2023-05-22

Applicant: Google LLC

Inventor: Liu Yang , Marc Najork , Michael Bendersky , Mingyang Zhang , Cheng Li

IPC: G06F40/30 , G06F40/205 , G06N3/045 , G06N3/08

Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.

6.

Invention publication
System for Information Extraction from Form-Like Documents (Status: Under examination, published)

Publication (announcement) number: US20240046684A1

Publication (announcement) date: 2024-02-08

Application number: US18490652

Application date: 2023-10-19

Applicant: Google LLC

Inventor: Sandeep Tata , Bodhisattwa Prasad Majumder , Qi Zhao , James Bradley Wendt , Marc Najork , Navneet Potti

IPC: G06V30/413 , G06T7/70 , G06N20/00 , G06N5/04 , G06V30/412 , G06V30/262 , G06V30/416 , G06F18/21 , G06F18/22

CPC classification number: G06V30/413 , G06T7/70 , G06N20/00 , G06N5/04 , G06V30/412 , G06V30/274 , G06V30/416 , G06F18/21 , G06F18/22 , G06T2207/30176

Abstract: The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. The computing system can extract one or more candidate text portions for each field type included in a target schema. The computing system can generate a respective input feature vector for each candidate for the field type. The computing system can generate a respective candidate embedding for the candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for the candidate text portion. The computing system can assign one or more of the candidate text portions to the field type based on the respective scores.

7.

Invention publication
Systems and Methods for Machine-Learned Prediction of Semantic Similarity Between Documents (Status: Under examination, published)

Publication (announcement) number: US20230297783A1

Publication (announcement) date: 2023-09-21

Application number: US18321424

Application date: 2023-05-22

Applicant: Google LLC

Inventor: Liu Yang , Marc Najork , Michael Bendersky , Mingyang Zhang , Cheng Li

IPC: G06F40/30 , G06N3/08 , G06F40/205 , G06N3/045

CPC classification number: G06F40/30 , G06N3/08 , G06F40/205 , G06N3/045

Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.

8.

Granted invention patent
System for information extraction from form-like documents (Status: Active)

Publication (announcement) number: US11830269B2

Publication (announcement) date: 2023-11-28

Application number: US17867300

Application date: 2022-07-18

Applicant: Google LLC

Inventor: Sandeep Tata , Bodhisattwa Prasad Majumder , Qi Zhao , James Bradley Wendt , Marc Najork , Navneet Potti

IPC: G06T7/70 , G06V30/413 , G06N20/00 , G06N5/04 , G06V30/412 , G06V30/262 , G06V30/416 , G06F18/21 , G06F18/22

CPC classification number: G06V30/413 , G06F18/21 , G06F18/22 , G06N5/04 , G06N20/00 , G06T7/70 , G06V30/274 , G06V30/412 , G06V30/416 , G06T2207/30176

Abstract: The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. The computing system can extract one or more candidate text portions for each field type included in a target schema. The computing system can generate a respective input feature vector for each candidate for the field type. The computing system can generate a respective candidate embedding for the candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for the candidate text portion. The computing system can assign one or more of the candidate text portions to the field type based on the respective scores.

9.

Invention application
System for Information Extraction from Form-Like Documents (Status: Active)

Publication (announcement) number: US20220375245A1

Publication (announcement) date: 2022-11-24

Application number: US17867300

Application date: 2022-07-18

Applicant: Google LLC

Inventor: Sandeep Tata , Bodhisattwa Prasad Majumder , Qi Zhao , James Bradley Wendt , Marc Najork , Navneet Potti

IPC: G06V30/412 , G06K9/62 , G06T7/70 , G06N20/00 , G06N5/04 , G06V10/22 , G06V30/262 , G06V30/413 , G06V30/416

Abstract: The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. The computing system can extract one or more candidate text portions for each field type included in a target schema. The computing system can generate a respective input feature vector for each candidate for the field type. The computing system can generate a respective candidate embedding for the candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for the candidate text portion. The computing system can assign one or more of the candidate text portions to the field type based on the respective scores.

10.

Invention publication
SYSTEMS AND METHODS FOR USING DOCUMENT ACTIVITY LOGS TO TRAIN MACHINE-LEARNED MODELS FOR DETERMINING DOCUMENT RELEVANCE (Status: Under examination, published)

Publication (announcement) number: US20230267277A1

Publication (announcement) date: 2023-08-24

Application number: US18010727

Application date: 2020-06-15

Applicant: Google LLC

Inventor: Weize Kong , Michael Bendersky , Marc Najork , Rama Kumar Pasumarthi , Zhen Qin , Rolf Jagerman

IPC: G06F40/30 , G06N20/00 , G06F16/9538 , G06F16/9535

CPC classification number: G06F40/30 , G06N20/00 , G06F16/9538 , G06F16/9535

Abstract: Systems and methods of the present disclosure are directed to a method for training a machine-learned semantic matching model. The method can include obtaining a first and second document and a first and second activity log. The method can include determining, based on the first document activity log and the second document activity log, a relation label indicative of whether the documents are related. The method can include inputting the documents into the model to receive a semantic similarity value representing an estimated semantic similarity between the first document and the second document. The method can include evaluating a loss function that evaluates a difference between the relation label and the semantic similarity value. The method can include modifying values of parameters of the model based on the loss function.
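The training step described in this abstract (derive a relation label from two activity logs, obtain a similarity value from the model, evaluate a loss, update the parameters) can be sketched in a few lines. None of the specifics come from the filing: the rule that two documents are related when the same user touched both within a time window, the token-overlap feature, the two-parameter logistic model, the squared-error loss, and the learning rate are all assumptions chosen to keep the example self-contained.

```python
import math


def relation_label(log_a, log_b, window=60.0):
    """Hypothetical rule: related (1.0) if one user touched both documents
    within `window` seconds. Each log is a list of (user, timestamp) pairs."""
    for user_a, time_a in log_a:
        for user_b, time_b in log_b:
            if user_a == user_b and abs(time_a - time_b) <= window:
                return 1.0
    return 0.0


def token_overlap(doc_a, doc_b):
    """Jaccard overlap of the two documents' lowercase token sets."""
    tokens_a = set(doc_a.lower().split())
    tokens_b = set(doc_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def similarity(params, doc_a, doc_b):
    """Toy semantic matching model: logistic function of the overlap feature."""
    weight, bias = params
    return 1.0 / (1.0 + math.exp(-(weight * token_overlap(doc_a, doc_b) + bias)))


def loss(label, value):
    """Squared difference between relation label and similarity value."""
    return (label - value) ** 2


def train_step(params, doc_a, doc_b, log_a, log_b, learning_rate=0.5):
    """One gradient-descent update; returns (new_params, loss_before_update)."""
    weight, bias = params
    label = relation_label(log_a, log_b)
    feature = token_overlap(doc_a, doc_b)
    value = similarity(params, doc_a, doc_b)
    # Derivative of (label - sigmoid(z))^2 with respect to z.
    grad_z = -2.0 * (label - value) * value * (1.0 - value)
    new_params = (
        weight - learning_rate * grad_z * feature,
        bias - learning_rate * grad_z,
    )
    return new_params, loss(label, value)
```

Repeated calls to `train_step` on related pairs push the similarity value toward 1, and on unrelated pairs toward 0. A real system would replace the overlap feature and logistic model with a learned text encoder.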
