Patent search: ap:("salesforce.com inc.") AND inv:"Zeyuan Chen"

1.

Granted invention patent
System and method for unsupervised density based table structure identification (in force)

Publication number: US11347708B2

Publication date: 2022-05-31

Application number: US16680302

Filing date: 2019-11-11

Applicant: salesforce.com, inc.

Inventors: Ankit Chadha, Zeyuan Chen, Caiming Xiong, Ran Xu, Richard Socher

IPC: G06F16/30, G06F16/22, G06F16/28

Abstract: Embodiments described herein provide unsupervised density-based clustering to infer table structure from a document. Specifically, a number of words are identified from a block of text in a non-editable document, and the spatial coordinates of each word relative to the rectangular region are identified. Based on the word density of the rectangular region, the words are grouped into clusters using a heuristic radius search method. Words that are grouped into the same cluster are determined to be elements that belong to the same cell. In this way, the cells of the table structure can be identified. Once the cells are identified based on the word density of the block of text, the identified cells can be expanded horizontally or grouped vertically to identify the rows or columns of the table structure.
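The clustering step in the abstract above can be illustrated with a small Python sketch. It is not the patented method: the radius heuristic (a fraction of the mean word spacing implied by the region's word density), the single-linkage radius search, and the row tolerance are assumptions chosen for illustration, and `Word`, `heuristic_radius`, `cluster_words`, and `group_rows` are hypothetical names. Word positions are assumed to be box centers relative to the region's top-left corner.

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # box-center x, relative to the region's left edge
    y: float  # box-center y, relative to the region's top edge


def heuristic_radius(words, width, height, scale=0.5):
    """Assumed heuristic: a fraction of the mean spacing sqrt(area / n_words)."""
    return scale * math.sqrt(width * height / len(words))


def cluster_words(words, radius):
    """Label words so that words chained within `radius` share a cluster (cell)."""
    labels = [-1] * len(words)
    next_label = 0
    for start in range(len(words)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first radius search from the seed word
            i = stack.pop()
            for j, other in enumerate(words):
                near = math.dist((words[i].x, words[i].y), (other.x, other.y)) <= radius
                if labels[j] == -1 and near:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def group_rows(words, labels, tol):
    """Group cells into rows when their mean y-coordinates differ by at most `tol`."""
    cells = {}
    for word, label in zip(words, labels):
        cells.setdefault(label, []).append(word)

    def cell_text(label):
        return " ".join(w.text for w in sorted(cells[label], key=lambda w: w.x))

    def cell_left(label):
        return min(w.x for w in cells[label])

    centers = sorted((sum(w.y for w in ws) / len(ws), label) for label, ws in cells.items())
    rows = []  # each entry: [reference y, list of cell labels]
    for y, label in centers:
        if rows and abs(y - rows[-1][0]) <= tol:
            rows[-1][1].append(label)
        else:
            rows.append([y, [label]])
    return [[cell_text(c) for c in sorted(cs, key=cell_left)] for _, cs in rows]


words = [Word("Item", 10, 10), Word("name", 30, 10), Word("Unit", 110, 10),
         Word("price", 130, 10), Word("Blue", 10, 50), Word("pen", 30, 50),
         Word("1.50", 110, 50)]
r = heuristic_radius(words, width=200, height=60)  # about 20.7
print(group_rows(words, cluster_words(words, r), tol=5))
# [['Item name', 'Unit price'], ['Blue pen', '1.50']]
```

An isotropic radius only separates cells here because the gaps between cells are wider than the gaps between words inside a cell; real tables may need separate horizontal and vertical radii.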

2.

Published invention application
SYSTEMS AND METHODS FOR ONLINE ADAPTATION FOR CROSS-DOMAIN STREAMING DATA (pending, published)

Publication number: US20230153307A1

Publication date: 2023-05-18

Application number: US17588022

Filing date: 2022-01-28

Applicant: salesforce.com, inc.

Inventors: Luyu Yang, Mingfei Gao, Zeyuan Chen, Ran Xu, Chetan Ramaiah

IPC: G06F16/2455, G06F16/242, G06N20/00

CPC classification number: G06F16/24568, G06F16/2425, G06N20/00

Abstract: Embodiments described herein provide an online domain adaptation framework based on cross-domain bootstrapping, in which target-domain streaming data is deleted immediately after it has been used for adaptation. At each online query, data diversity across domains is increased by bootstrapping the source domain to form diverse combinations with the current target query. To take full advantage of the valuable discrepancies among these diverse combinations, a set of independent learners is trained to preserve the differences. The knowledge of the learners is then integrated by exchanging their predicted pseudo-labels on the current target query to co-supervise learning on the target domain, without sharing weights, so that the learners remain divergent.
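The loop described in the abstract above can be sketched in Python with toy components. The learners here are nearest-centroid classifiers rather than neural networks, each learner receives the pseudo-labels of the next learner in a ring, and predictions are a majority vote; all of these are assumptions for illustration, and `CentroidLearner` and `adapt_online` are hypothetical names. What the sketch does mirror is the structure: a fresh source bootstrap per learner combined with the current target query, co-supervision through exchanged pseudo-labels, private parameters per learner, and no storage of past queries.

```python
import random
from collections import Counter, defaultdict


class CentroidLearner:
    """Toy nearest-centroid classifier standing in for a neural learner."""

    def __init__(self):
        self.centroids = {}  # label -> (x, y); private to this learner (no weight sharing)

    def fit(self, xs, ys):
        sums = defaultdict(lambda: [0.0, 0.0, 0])
        for (a, b), y in zip(xs, ys):
            sums[y][0] += a
            sums[y][1] += b
            sums[y][2] += 1
        # Labels absent from this batch keep their previous centroid.
        self.centroids.update({y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items()})

    def predict(self, point):
        return min(
            self.centroids,
            key=lambda y: (point[0] - self.centroids[y][0]) ** 2
            + (point[1] - self.centroids[y][1]) ** 2,
        )


def adapt_online(source, target_stream, n_learners=3, seed=0):
    """Adapt to a stream of target queries; each query is discarded after use."""
    rng = random.Random(seed)
    src_x = [x for x, _ in source]
    src_y = [y for _, y in source]
    learners = [CentroidLearner() for _ in range(n_learners)]
    for learner in learners:
        learner.fit(src_x, src_y)  # warm start on the labeled source domain

    outputs = []
    for query in target_stream:
        # Each learner's pseudo-labels on the current target query.
        pseudo = [[learner.predict(p) for p in query] for learner in learners]
        for i, learner in enumerate(learners):
            # Cross-domain bootstrapping: a fresh source resample for this learner,
            # combined with the current target query.
            idx = [rng.randrange(len(source)) for _ in source]
            # Co-supervision: learner i is trained on a peer's pseudo-labels.
            peer_labels = pseudo[(i + 1) % n_learners]
            learner.fit(
                [src_x[k] for k in idx] + list(query),
                [src_y[k] for k in idx] + peer_labels,
            )
        # Majority vote of the adapted learners; the query is not stored afterwards.
        outputs.append(
            [Counter(m.predict(p) for m in learners).most_common(1)[0][0] for p in query]
        )
    return outputs


source = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((1, 1), 0),
          ((10, 10), 1), ((11, 10), 1), ((10, 11), 1), ((11, 11), 1)]
stream = [[(2, 2), (12, 12)], [(3, 2), (12, 13)]]  # target domain shifted by about +2
print(adapt_online(source, stream))  # [[0, 1], [0, 1]]
```

Giving each learner its own bootstrap resample is what keeps the learners different from one another; without it, every learner would fit the same combined data and the exchanged pseudo-labels would carry no extra information.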

3.

Invention application
SYSTEMS AND METHODS FOR FIELD EXTRACTION FROM UNLABELED DATA (in force)

Publication number: US20220374631A1

Publication date: 2022-11-24

Application number: US17484618

Filing date: 2021-09-24

Applicant: salesforce.com, inc.

Inventors: Mingfei Gao, Zeyuan Chen, Ran Xu

IPC: G06K9/00, G06N3/08

Abstract: Embodiments described herein provide a field extraction system that does not require field-level annotations for training. Specifically, the training process is bootstrapped by mining pseudo-labels from unlabeled forms using simple rules. Then, a transformer-based structure is used to model interactions between text tokens in the input form and predict a field tag for each token accordingly. The pseudo-labels are used to supervise the transformer training. Because the pseudo-labels are noisy, a refinement module that contains a sequence of branches is used to refine them. Each refinement branch conducts field tagging and generates refined labels. At each stage, a branch is optimized by the labels ensembled from all previous branches to reduce label noise.
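The control flow of the refinement module in the abstract above can be sketched in Python. The sketch swaps the transformer for a deliberately trivial branch model (the majority label seen after each preceding key phrase) and uses made-up rules, so only the loop structure, in which each branch is trained on the ensemble of every earlier label set, reflects the abstract. `rule_pseudo_labels`, `train_branch`, `ensemble`, and `refine` are hypothetical names, and treating the rule-based labels as stage 0 of the ensemble is an assumption.

```python
from collections import Counter, defaultdict


def rule_pseudo_labels(tokens):
    """Hypothetical rules: '$' marks an amount, '/' marks a date, else 'other'."""
    labels = []
    for _, text in tokens:
        if "$" in text:
            labels.append("amount")
        elif "/" in text:
            labels.append("date")
        else:
            labels.append("other")
    return labels


def train_branch(tokens, labels):
    """Toy branch: predict the majority label seen after each preceding key phrase."""
    by_key = defaultdict(Counter)
    for (key, _), label in zip(tokens, labels):
        by_key[key][label] += 1
    return [by_key[key].most_common(1)[0][0] for key, _ in tokens]


def ensemble(label_sets):
    """Per-token majority vote; ties go to the earliest label set."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*label_sets)]


def refine(tokens, n_branches=3):
    label_sets = [rule_pseudo_labels(tokens)]  # stage 0: noisy rule-based labels
    for _ in range(n_branches):
        # Each branch is supervised by the ensemble of all earlier label sets.
        targets = ensemble(label_sets)
        label_sets.append(train_branch(tokens, targets))
    return label_sets[-1]


# Each token is (preceding key phrase, token text).
tokens = [("Total:", "$12.00"), ("Total:", "15.50"), ("Total:", "$9.99"),
          ("Date:", "01/02/2021"), ("Date:", "2021-03-04"), ("Date:", "05/06/2021"),
          ("Name:", "Acme")]
print(rule_pseudo_labels(tokens))  # "15.50" and "2021-03-04" are mislabeled 'other'
print(refine(tokens))
# ['amount', 'amount', 'amount', 'date', 'date', 'date', 'other']
```

Because ties favor earlier label sets, the second branch is still supervised by the rule labels; from the third branch on, the agreeing votes of the trained branches outweigh the rules.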

4.

Granted invention patent
Image analysis based document processing for inference of key-value pairs in non-fixed digital documents (in force)

Publication number: US11699297B2

Publication date: 2023-07-11

Application number: US17140987

Filing date: 2021-01-04

Applicant: salesforce.com, inc.

Inventors: Mingfei Gao, Zeyuan Chen, Le Xue, Ran Xu, Caiming Xiong

IPC: G06V30/413, G06F40/186, G06F40/289, G06V30/412, G06F40/295, G06V30/10, G06V10/40

CPC classification number: G06V30/413, G06F40/186, G06F40/289, G06V30/412, G06F40/295, G06V10/40, G06V30/10

Abstract: An online system extracts information from non-fixed form documents. The online system receives an image of a form document and obtains a set of phrases and locations of the set of phrases on the form image. For at least one field, the online system determines key scores for the set of phrases. The online system identifies a set of candidate values for the field from the set of identified phrases and identifies a set of neighbors for each candidate value from the set of identified phrases. The online system determines neighbor scores, where a neighbor score for a candidate value and a respective neighbor is determined based on the key score for the neighbor and a spatial relationship of the neighbor to the candidate value. The online system selects a candidate value and a respective neighbor based on the neighbor score as the value and key for the field.
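The neighbor-scoring step in the abstract above can be sketched in Python under several assumptions the abstract does not state: phrases come with box-center coordinates, a key score is the fraction of a field's keywords present in a phrase, candidate values are phrases matching a field-specific regular expression, the neighbors of a candidate are its `k` nearest phrases, and the spatial term is an exponential distance decay that favors keys to the left of or above the value. `Phrase`, `key_score`, `spatial_score`, and `extract_field` are hypothetical names.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Phrase:
    text: str
    x: float  # box-center x on the form image
    y: float  # box-center y (grows downward)


def key_score(phrase, keywords):
    """Assumed key score: fraction of the field's keywords present in the phrase."""
    words = set(re.findall(r"[a-z]+", phrase.text.lower()))
    return sum(k in words for k in keywords) / len(keywords)


def spatial_score(neighbor, candidate, scale=100.0):
    """Assumed spatial term: distance decay, halved unless the neighbor is left of or above."""
    dx, dy = candidate.x - neighbor.x, candidate.y - neighbor.y
    direction = 1.0 if dx >= 0 or dy >= 0 else 0.5
    return direction * math.exp(-math.hypot(dx, dy) / scale)


def extract_field(phrases, keywords, value_pattern, k=3):
    """Return the (key, value) pair with the highest neighbor score, or None."""
    best = None
    for cand in phrases:
        if not re.fullmatch(value_pattern, cand.text):
            continue  # not a candidate value for this field
        neighbors = sorted(
            (p for p in phrases if p is not cand),
            key=lambda p: math.hypot(p.x - cand.x, p.y - cand.y),
        )[:k]
        for nb in neighbors:
            # Neighbor score = key score of the neighbor x spatial relationship.
            score = key_score(nb, keywords) * spatial_score(nb, cand)
            if best is None or score > best[0]:
                best = (score, nb.text, cand.text)
    return (best[1], best[2]) if best and best[0] > 0 else None


form = [Phrase("Invoice Date:", 10, 10), Phrase("03/15/2021", 120, 10),
        Phrase("Due Date:", 10, 50), Phrase("04/15/2021", 120, 50),
        Phrase("Invoice Number:", 10, 90), Phrase("INV-001", 120, 90)]
date = r"\d{2}/\d{2}/\d{4}"
print(extract_field(form, ["invoice", "date"], date))  # ('Invoice Date:', '03/15/2021')
print(extract_field(form, ["due", "date"], date))      # ('Due Date:', '04/15/2021')
```

Both dates match the same value pattern, so the choice between them is made entirely by the neighbor score: the key score decides which nearby label fits the field, and the spatial term decides how much that label counts for each candidate.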

5.

Invention application
SYSTEMS AND METHODS FOR FIELD EXTRACTION FROM UNLABELED DATA (in force)

Publication number: US20220366317A1

Publication date: 2022-11-17

Application number: US17484623

Filing date: 2021-09-24

Applicant: salesforce.com, inc.

Inventors: Mingfei Gao, Zeyuan Chen, Ran Xu

IPC: G06N20/20, G06N5/00, G06N5/04

Abstract: Embodiments described herein provide a field extraction system that does not require field-level annotations for training. Specifically, the training process is bootstrapped by mining pseudo-labels from unlabeled forms using simple rules. Then, a transformer-based structure is used to model interactions between text tokens in the input form and predict a field tag for each token accordingly. The pseudo-labels are used to supervise the transformer training. Because the pseudo-labels are noisy, a refinement module that contains a sequence of branches is used to refine them. Each refinement branch conducts field tagging and generates refined labels. At each stage, a branch is optimized by the labels ensembled from all previous branches to reduce label noise.

6.

Invention application
IMAGE ANALYSIS BASED DOCUMENT PROCESSING FOR INFERENCE OF KEY-VALUE PAIRS IN NON-FIXED DIGITAL DOCUMENTS (in force)

Publication number: US20220215195A1

Publication date: 2022-07-07

Application number: US17140987

Filing date: 2021-01-04

Applicant: salesforce.com, inc.

Inventors: Mingfei Gao, Zeyuan Chen, Le Xue, Ran Xu, Caiming Xiong

IPC: G06K9/00, G06F40/289, G06F40/186

Abstract: An online system extracts information from non-fixed form documents. The online system receives an image of a form document and obtains a set of phrases and locations of the set of phrases on the form image. For at least one field, the online system determines key scores for the set of phrases. The online system identifies a set of candidate values for the field from the set of identified phrases and identifies a set of neighbors for each candidate value from the set of identified phrases. The online system determines neighbor scores, where a neighbor score for a candidate value and a respective neighbor is determined based on the key score for the neighbor and a spatial relationship of the neighbor to the candidate value. The online system selects a candidate value and a respective neighbor based on the neighbor score as the value and key for the field.
