-
Publication No.: US11244203B2
Publication date: 2022-02-08
Application No.: US16784726
Filing date: 2020-02-07
Applicant: International Business Machines Corporation
Inventor: Peter Zhong , Antonio Jose Jimeno Yepes , Jianbin Tang
IPC: G06K9/62 , G06F16/332 , G06F16/35 , G06F40/205 , G06K9/32 , G06F16/93
Abstract: Methods, systems and computer program products for automatically generating structured training data based on an unstructured document are provided. Aspects include receiving an unstructured document and a corresponding structured document that includes labeled portions. Aspects also include generating a parsed document that has one or more extracted objects by applying a parsing tool to the unstructured document. Aspects also include identifying one or more matching extracted objects by applying a matching algorithm to the structured document and the parsed document. Each matching extracted object is an extracted object of the parsed document that corresponds to a labeled portion of the structured document. Aspects also include annotating a region of the unstructured document that corresponds to the bounding box of the respective matching extracted object with a respective label of the corresponding labeled portion of the unstructured document.
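The matching and annotation steps in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not name the parsing tool or the matching algorithm, so the sketch assumes the parsed objects are already available as text plus bounding box, and it swaps in fuzzy string similarity from Python's `difflib` as the matcher. The names `ExtractedObject`, `annotate_regions`, and the 0.9 threshold are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ExtractedObject:
    """One object from the parsed document (hypothetical shape)."""
    text: str
    bbox: tuple  # (x0, y0, x1, y1) on the page


def annotate_regions(extracted, labeled, threshold=0.9):
    """Attach each label to the bounding box of its best-matching object.

    `labeled` is a list of (label, text) pairs taken from the structured
    document. Fuzzy text similarity is an assumed stand-in for the
    unspecified matching algorithm.
    """
    annotations = []
    for label, text in labeled:
        best, best_score = None, 0.0
        for obj in extracted:
            score = SequenceMatcher(None, text.lower(), obj.text.lower()).ratio()
            if score > best_score:
                best, best_score = obj, score
        # Labeled portions without a sufficiently close match are skipped.
        if best is not None and best_score >= threshold:
            annotations.append({"label": label, "bbox": best.bbox})
    return annotations


if __name__ == "__main__":
    extracted = [
        ExtractedObject("Deep Learning for Tables", (50, 40, 500, 70)),
        ExtractedObject("We propose a method.", (50, 100, 500, 160)),
    ]
    labeled = [("title", "Deep learning for tables"),
               ("abstract", "We propose a method.")]
    for annotation in annotate_regions(extracted, labeled):
        print(annotation)
```

The resulting label and bounding-box pairs are the kind of structured training data the abstract describes, for example as targets for a layout-detection model.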
-
Publication No.: US20210049982A1
Publication date: 2021-02-18
Application No.: US16541214
Filing date: 2019-08-15
Applicant: International Business Machines Corporation
Inventor: Elaheh ShafieiBavani , Peter Zhong , Rahil Garnavi , Michael Raghib
IPC: G09G5/37
Abstract: A user profile associated with a first user is received. A user prescription associated with the first user is received. A historical interaction of the first user with a display is received. A global vision model is received. One or more display settings to be used on the display is determined based on at least the user profile, the user prescription, the global vision model, and the historical interaction.
-
Publication No.: US11734445B2
Publication date: 2023-08-22
Application No.: US17109454
Filing date: 2020-12-02
Applicant: International Business Machines Corporation
Inventor: Peter Zhong , Antonio Jose Jimeno Yepes , Lenin Mehedy
IPC: G06F21/00 , G06F21/62 , G06V30/414
CPC classification number: G06F21/6227 , G06V30/414
Abstract: In an approach for providing a document access control based on document component layouts, a processor detects a layout of a document, the layout including one or more components of the document. A processor defines an access policy to access the one or more components based on the layout. A processor authorizes a request to access the one or more components based on the access policy and the layout. A processor retrieves the one or more components based on the access policy and the authorized request.
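The four steps in this abstract (detect layout, define policy, authorize, retrieve) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: layout detection is taken as already done, so the document arrives as a list of typed components, and the policy is assumed to be a simple mapping from component type to permitted roles. `Component`, `authorize`, `retrieve`, and the role names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One layout component of a document (hypothetical shape)."""
    kind: str      # layout type, e.g. "title", "text", "table"
    content: str


def authorize(policy, role, kind):
    """Return True if `role` may access components of layout type `kind`.

    Component types missing from the policy are denied by default
    (an assumed design choice, not stated in the abstract).
    """
    return role in policy.get(kind, set())


def retrieve(components, policy, role):
    """Return only the components the requesting role is authorized to see."""
    return [c for c in components if authorize(policy, role, c.kind)]


if __name__ == "__main__":
    document = [
        Component("title", "Quarterly report"),
        Component("text", "Revenue grew this quarter."),
        Component("table", "region,revenue"),
    ]
    # Assumed policy: layout type -> roles allowed to read it.
    policy = {"title": {"reader", "analyst"},
              "text": {"reader", "analyst"},
              "table": {"analyst"}}
    for component in retrieve(document, policy, "reader"):
        print(component.kind, "->", component.content)
```

Keying the policy on layout type rather than on whole documents is what lets a single file expose, say, its narrative text to one role while withholding its tables.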
-
Publication No.: US11380116B2
Publication date: 2022-07-05
Application No.: US16659977
Filing date: 2019-10-22
Applicant: International Business Machines Corporation
Inventor: Peter Zhong , Antonio Jose Jimeno Yepes , Elaheh ShafieiBavani
IPC: G06V30/414 , G06N3/04 , G06N20/00
Abstract: A computer-implemented method for using a machine learning model to automatically extract tabular data from an image includes receiving a set of images of tabular data and a set of markup data corresponding respectively to the images of tabular data. The method further includes training a first neural network to delineate the tabular data into cells using the markup data, and training a second neural network to determine content of the cells in the tabular data using the markup data. The method further includes, upon receiving an input image containing a first tabular data without any markup data, generating an electronic output corresponding to the first tabular data by determining the structure of the first tabular data using the first neural network and extracting content of the first tabular data using the second neural network.
-
Publication No.: US11222615B2
Publication date: 2022-01-11
Application No.: US16541214
Filing date: 2019-08-15
Applicant: International Business Machines Corporation
Inventor: Elaheh ShafieiBavani , Peter Zhong , Rahil Garnavi , Michael Raghib
IPC: G09G5/37
Abstract: A user profile associated with a first user is received. A user prescription associated with the first user is received. A historical interaction of the first user with a display is received. A global vision model is received. One or more display settings to be used on the display is determined based on at least the user profile, the user prescription, the global vision model, and the historical interaction.
-
Publication No.: US11599711B2
Publication date: 2023-03-07
Application No.: US17111392
Filing date: 2020-12-03
Applicant: International Business Machines Corporation
Inventor: Peter Zhong , Antonio Jose Jimeno Yepes
IPC: G06F40/157 , G06N3/04 , G06N3/08 , G06V30/413 , G06V30/414
Abstract: Aspects of the present invention disclose a method for automatic delineation and extraction of tabular data in portable document format (PDF). The method includes one or more processors extracting metadata corresponding to tabular data in a text-based portable document format (PDF), wherein the metadata is associated with characters and border lines of the tabular data. The method further includes generating a graph structure corresponding to the tabular data in the text-based PDF based at least in part on the metadata. The method further includes generating a vector representation of the graph structure. The method further includes constructing a tree structure corresponding to the tabular data based at least in part on the vector representation.
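The first step of this abstract, turning PDF metadata into a graph, can be illustrated with a small sketch. It covers only graph construction and does not attempt the vector representation or the tree construction, which the abstract does not specify. The sketch assumes the character metadata has already been grouped into cells with bounding boxes, and it assumes a simple adjacency rule: two cells are linked when their vertical extents overlap (same row) or their horizontal extents overlap (same column). `build_graph` and the relation names are hypothetical.

```python
from itertools import combinations


def overlaps(a0, a1, b0, b1):
    """Return True if the open intervals (a0, a1) and (b0, b1) intersect."""
    return min(a1, b1) > max(a0, b0)


def build_graph(cells):
    """Build a set of labeled edges between table cells.

    `cells` maps a cell name to its bounding box (x0, y0, x1, y1).
    The overlap-based adjacency rule is an assumption for illustration.
    """
    edges = set()
    for (u, a), (v, b) in combinations(sorted(cells.items()), 2):
        if overlaps(a[1], a[3], b[1], b[3]):   # vertical extents overlap
            edges.add((u, v, "same_row"))
        if overlaps(a[0], a[2], b[0], b[2]):   # horizontal extents overlap
            edges.add((u, v, "same_column"))
    return edges


if __name__ == "__main__":
    cells = {
        "A": (0, 0, 10, 5),     # top-left cell
        "B": (20, 0, 30, 5),    # top-right cell
        "C": (0, 10, 10, 15),   # bottom-left cell
    }
    for edge in sorted(build_graph(cells)):
        print(edge)
```

In a fuller pipeline, border-line metadata could refine these edges, for instance by dropping a link when a ruling line separates two cells.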
-
Publication No.: US20220171871A1
Publication date: 2022-06-02
Application No.: US17109454
Filing date: 2020-12-02
Applicant: International Business Machines Corporation
Inventor: Peter Zhong , Antonio Jose Jimeno Yepes , Lenin Mehedy
Abstract: In an approach for providing a document access control based on document component layouts, a processor detects a layout of a document, the layout including one or more components of the document. A processor defines an access policy to access the one or more components based on the layout. A processor authorizes a request to access the one or more components based on the access policy and the layout. A processor retrieves the one or more components based on the access policy and the authorized request.
-
Publication No.: US20210286989A1
Publication date: 2021-09-16
Application No.: US16815391
Filing date: 2020-03-11
Applicant: International Business Machines Corporation
Inventor: Peter Zhong , Antonio Jose Jimeno Yepes , Elaheh ShafieiBavani
IPC: G06K9/00 , G06F16/93 , G06N20/00 , G06N5/04 , G06F40/205
Abstract: Embodiments of the invention describe a computer-implemented method of analyzing an electronic version of a document. The computer-implemented method can include an architecture of machine learning sub-models that performs the global task of translating unstructured and semi-structured inputs into numerical representations that can be recognized and manipulated by a content-analysis (CA) sub-model without relying on brute force analysis. Embodiments of the invention achieve these results by separating the global task into auxiliary tasks and assigning each sub-model to at least one of the auxiliary tasks. The auxiliary tasks can include parsing the unstructured or semi-structured inputs into format types (e.g., lists, tables, figures, text, etc. of a PDF document), extracting features of the parsed document, and performing a computer-based CA on the extracted features. The sub-models are trained in stages and in groups, wherein both the stages and the groupings are based on the complexity of the sub-model's assigned task.
-
Publication No.: US20220180044A1
Publication date: 2022-06-09
Application No.: US17111392
Filing date: 2020-12-03
Applicant: International Business Machines Corporation
Inventor: Peter Zhong , Antonio Jose Jimeno Yepes
IPC: G06F40/157 , G06K9/00 , G06N3/08 , G06N3/04
Abstract: Aspects of the present invention disclose a method for automatic delineation and extraction of tabular data in portable document format (PDF). The method includes one or more processors extracting metadata corresponding to tabular data in a text-based portable document format (PDF), wherein the metadata is associated with characters and border lines of the tabular data. The method further includes generating a graph structure corresponding to the tabular data in the text-based PDF based at least in part on the metadata. The method further includes generating a vector representation of the graph structure. The method further includes constructing a tree structure corresponding to the tabular data based at least in part on the vector representation.
-
Publication No.: US20210248420A1
Publication date: 2021-08-12
Application No.: US16784726
Filing date: 2020-02-07
Applicant: International Business Machines Corporation
Inventor: Peter Zhong , Antonio Jose Jimeno Yepes , Jianbin Tang
IPC: G06K9/62 , G06F16/332 , G06F16/35 , G06F40/205 , G06F16/93 , G06K9/32
Abstract: Methods, systems and computer program products for automatically generating structured training data based on an unstructured document are provided. Aspects include receiving an unstructured document and a corresponding structured document that includes labeled portions. Aspects also include generating a parsed document that has one or more extracted objects by applying a parsing tool to the unstructured document. Aspects also include identifying one or more matching extracted objects by applying a matching algorithm to the structured document and the parsed document. Each matching extracted object is an extracted object of the parsed document that corresponds to a labeled portion of the structured document. Aspects also include annotating a region of the unstructured document that corresponds to the bounding box of the respective matching extracted object with a respective label of the corresponding labeled portion of the unstructured document.