-
Publication No.: US20200311571A1
Publication date: 2020-10-01
Application No.: US16370724
Filing date: 2019-03-29
Inventor: Darrell Bellert
IPC: G06N5/04
Abstract: A method for processing an electronic document (ED) to infer a sequence of section headings in the ED. The method includes generating, by a computer processor, based on regular expression matching of a predetermined section heading pattern and a plurality of characters in the ED, a list of candidate headings in the ED; generating, by the computer processor and based on the list of candidate headings, a list of chain fragments for inferring a portion of the sequence of section headings; and generating, by the computer processor and based on predetermined criteria, the sequence of section headings by merging at least two chain fragments in the list of chain fragments.
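The three steps named in the abstract (regex-based candidate detection, chain fragments, merging) can be sketched as follows. This is a minimal illustration only, not the patented method: the heading pattern `HEADING_RE`, the adjacency rule in `follows`, and all function names are assumptions introduced here, and the patent's "predetermined criteria" for merging are not given in the abstract.

```python
import re

# Hypothetical heading pattern (an assumption): dotted decimal numbering
# such as "1.", "2.1" or "2.1.3" at the start of a line, followed by a title.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def find_candidates(lines):
    """Step 1: list (line_index, number_tuple, title) for every matching line."""
    candidates = []
    for index, line in enumerate(lines):
        match = HEADING_RE.match(line.strip())
        if match:
            number = tuple(int(part) for part in match.group(1).split("."))
            candidates.append((index, number, match.group(2)))
    return candidates


def follows(prev, nxt):
    """Assumed criterion: nxt is the first child of prev, the next sibling
    of prev, or the next sibling of one of prev's ancestors."""
    if nxt == prev + (1,):
        return True
    depth = len(nxt)
    if depth <= len(prev):
        return nxt == prev[:depth - 1] + (prev[depth - 1] + 1,)
    return False


def build_fragments(candidates):
    """Step 2: group consecutive candidates into chain fragments."""
    fragments = []
    for candidate in candidates:
        if fragments and follows(fragments[-1][-1][1], candidate[1]):
            fragments[-1].append(candidate)
        else:
            fragments.append([candidate])
    return fragments


def merge_fragments(fragments):
    """Step 3: starting from the first fragment, append each later fragment
    that continues the chain and skip those that do not."""
    if not fragments:
        return []
    chain = list(fragments[0])
    for fragment in fragments[1:]:
        if follows(chain[-1][1], fragment[0][1]):
            chain.extend(fragment)
    return chain


sample = [
    "1. Introduction",
    "Some body text.",
    "1.1 Background",
    "3 apples were counted in 2019.",  # matches the pattern but is not a heading
    "1.2 Scope",
    "2. Method",
]
headings = merge_fragments(build_fragments(find_candidates(sample)))
print([title for _, _, title in headings])
```

In this sample the false positive "3 apples ..." forms a fragment of its own that does not continue the chain, so the merge step drops it. Starting the chain from the first fragment is a simplification; a fuller treatment would score competing chains.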
-
Publication No.: US11468346B2
Publication date: 2022-10-11
Application No.: US16370724
Filing date: 2019-03-29
Inventor: Darrell Bellert
IPC: G06N5/04, G06F40/258
Abstract: A method for processing an electronic document (ED) to infer a sequence of section headings in the ED. The method includes generating, by a computer processor, based on regular expression matching of a predetermined section heading pattern and a plurality of characters in the ED, a list of candidate headings in the ED; generating, by the computer processor and based on the list of candidate headings, a list of chain fragments for inferring a portion of the sequence of section headings; and generating, by the computer processor and based on predetermined criteria, the sequence of section headings by merging at least two chain fragments in the list of chain fragments.
-
Publication No.: US20200320170A1
Publication date: 2020-10-08
Application No.: US16675456
Filing date: 2019-11-06
Inventor: Darrell Bellert
Abstract: A method, non-transitory computer readable medium, and system for inferring certain texts as stylized section headings in an electronic document (ED). Stylized section headings are section headings that have unique styling distinct from the body of text below each stylized heading. In particular, the stylized section headings are identified based on styling information in the ED. Identifying stylized section headings includes grouping candidate headings based on identification of dominant styling, locating high level fragments, and repeatedly locating nested fragments from within higher level fragments. The ED may or may not include explicitly identified headings in the document.
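A rough sketch of style-based heading inference is given below. It is one possible reading of the abstract, not the patented procedure: it assumes that the most frequent style in the document is the body style, that any line with a different style is a candidate heading, and that a larger font size (then bold) marks a higher heading level. The `Line` and `Section` types and the function `infer_outline` are hypothetical names introduced here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Line:
    text: str
    font_size: float
    bold: bool = False


@dataclass
class Section:
    title: str
    children: list = field(default_factory=list)


def style_of(line):
    """Styling key used to group lines (assumption: font size and weight only)."""
    return (line.font_size, line.bold)


def infer_outline(lines):
    """Return a tree of Section nodes inferred from styling alone."""
    root = Section("<root>")
    if not lines:
        return root
    # Assumption: the most frequent style belongs to the body text.
    body_style = Counter(style_of(line) for line in lines).most_common(1)[0][0]
    candidates = [line for line in lines if style_of(line) != body_style]
    # Rank the distinct candidate styles: larger font first, bold before regular.
    ranked = sorted({style_of(line) for line in candidates}, reverse=True)
    level_of = {style: level for level, style in enumerate(ranked)}
    # Nest each candidate under the nearest preceding heading of a higher level.
    stack = [(-1, root)]
    for line in candidates:
        level = level_of[style_of(line)]
        while stack[-1][0] >= level:
            stack.pop()
        node = Section(line.text)
        stack[-1][1].children.append(node)
        stack.append((level, node))
    return root
```

For example, a document with two 18 pt bold lines and two 14 pt bold lines among 11 pt body text would yield two top-level sections, with the 14 pt headings nested under whichever 18 pt heading precedes them.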
-
Publication No.: US11494555B2
Publication date: 2022-11-08
Application No.: US16675456
Filing date: 2019-11-06
Inventor: Darrell Bellert
IPC: G06F40/205, G06V30/414, G06V30/416, G06F40/258, G06V30/10, G06N5/04
Abstract: A method, non-transitory computer readable medium, and system for inferring certain texts as stylized section headings in an electronic document (ED). Stylized section headings are section headings that have unique styling distinct from the body of text below each stylized heading. In particular, the stylized section headings are identified based on styling information in the ED. Identifying stylized section headings includes grouping candidate headings based on identification of dominant styling, locating high level fragments, and repeatedly locating nested fragments from within higher level fragments. The ED may or may not include explicitly identified headings in the document.
-