Invention Grant
- Patent Title: Detecting the bounds of borderless tables in fixed-format structured documents using machine learning
- Application No.: US15675873
- Application Date: 2017-08-14
- Publication No.: US10339212B2
- Publication Date: 2019-07-02
- Inventors: Ram Bhushan Agrawal, Himanshu Mittal
- Applicant: Adobe Inc.
- Applicant Address: US CA San Jose
- Assignee: Adobe Inc.
- Current Assignee: Adobe Inc.
- Current Assignee Address: US CA San Jose
- Agency: Finch & Maloney PLLC
- Main IPC: G06F17/24
- IPC: G06F17/24; G06F17/22; G06N3/04

Abstract:
Techniques are disclosed for detecting the bounds of borderless open tables in fixed-format structured documents, such as PDF documents, and grouping text lines into predicted borderless tables. The target document comprises a set of text lines each having a respective vertical and horizontal position in the target document. A sorted list of the text lines is generated based upon a vertical and horizontal position of each text line in the target document. For each text line in the sorted list, a respective probability that the text line in the sorted list belongs to a borderless table is then determined. According to one embodiment, the probability may be determined using a classifier that may employ a logistic regression algorithm.
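The two steps the abstract describes (sort the text lines by position, then score each line with a classifier such as logistic regression) can be illustrated with a minimal sketch. It is not the patented implementation. The `TextLine` fields, the two features (`gap_count` and a numeric-token ratio), and the weights are all hypothetical, chosen only to show the shape of the computation. The sketch also assumes top-down page coordinates (larger `y` is lower on the page). Native PDF user space puts the origin at the lower left with `y` increasing upward, so a real extractor would either flip `y` or sort on descending `y`.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TextLine:
    text: str
    x: float  # left edge, in hypothetical top-down page coordinates
    y: float  # top edge; larger y means lower on the page (assumption)
    gap_count: int  # hypothetical feature: wide horizontal gaps within the line


def sort_lines(lines: Sequence[TextLine]) -> List[TextLine]:
    """Order lines top-to-bottom, then left-to-right."""
    return sorted(lines, key=lambda ln: (ln.y, ln.x))


def features(line: TextLine) -> List[float]:
    """Hypothetical per-line feature vector: [gap count, numeric-token ratio]."""
    tokens = line.text.split()
    numeric = sum(tok.replace(".", "", 1).replace(",", "").isdigit() for tok in tokens)
    numeric_ratio = numeric / len(tokens) if tokens else 0.0
    return [float(line.gap_count), numeric_ratio]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def table_probabilities(
    lines: Sequence[TextLine], weights: Sequence[float], bias: float
) -> List[Tuple[TextLine, float]]:
    """Logistic regression score p = sigmoid(w . f + b) for each sorted line."""
    scored = []
    for line in sort_lines(lines):
        z = bias + sum(w * f for w, f in zip(weights, features(line)))
        scored.append((line, sigmoid(z)))
    return scored


if __name__ == "__main__":
    lines = [
        TextLine("Q1 120 340", x=72, y=200, gap_count=2),
        TextLine("Quarterly results", x=72, y=100, gap_count=0),
        TextLine("Q2 135 360", x=72, y=220, gap_count=2),
    ]
    # Weights and bias are illustrative, not trained values.
    for line, p in table_probabilities(lines, weights=[1.5, 2.0], bias=-2.0):
        print(f"{p:.2f}  {line.text}")
    # 0.12  Quarterly results
    # 0.91  Q1 120 340
    # 0.91  Q2 135 360
```

In practice the weights would be learned from labeled documents. Consecutive high-probability lines could then be grouped into a candidate borderless table. The grouping rule is not specified in the abstract, so it is left out here.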
Public/Granted literature
- US20190050381A1: DETECTING THE BOUNDS OF BORDERLESS TABLES IN FIXED-FORMAT STRUCTURED DOCUMENTS USING MACHINE LEARNING (Public/Granted day: 2019-02-14)