Invention Grant
- Patent Title: Identification of reading order text segments with a probabilistic language model
- Application No.: US15462684
- Application Date: 2017-03-17
- Publication No.: US10372821B2
- Publication Date: 2019-08-06
- Inventor: Walter Chang, Trung Bui, Pranjal Daga, Michael Kraley, Hung Bui
- Applicant: Adobe Inc.
- Applicant Address: US CA San Jose
- Assignee: Adobe Inc.
- Current Assignee: Adobe Inc.
- Current Assignee Address: US CA San Jose
- Agency: Kilpatrick Townsend & Stockton LLP
- Main IPC: G06F17/22
- IPC: G06F17/22; G06F17/27; G06K9/00

Abstract:
Certain embodiments identify a correct structured reading-order sequence of text segments extracted from a file. A probabilistic language model is generated from a large text corpus to comprise observed word sequence patterns for a given language. The language model measures whether splicing together a first text segment with another continuation text segment results in a phrase that is more likely than a phrase resulting from splicing together the first text segment with other continuation text segments. Sets of text segments, which include a first set with a first text segment and a first continuation text segment as well as a second set with the first text segment and a second continuation text segment, are provided to the probabilistic model. A score indicative of a likelihood of the set providing a correct structured reading-order sequence is obtained for each set of text segments.
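The scoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which kind of probabilistic language model is used, so the bigram model with add-one smoothing below is an assumption chosen for brevity, not the patented method. The function names (`train_bigram_model`, `splice_score`, `best_continuation`) and the three-sentence corpus are hypothetical and exist only for this example.

```python
import math
from collections import Counter


def train_bigram_model(corpus_sentences):
    """Count unigrams and bigrams in a small whitespace-tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def splice_score(model, first_segment, continuation):
    """Average log-probability of the bigrams in the spliced phrase.

    Assumption: add-one (Laplace) smoothing so unseen bigrams get a
    small non-zero probability instead of zero.
    """
    unigrams, bigrams = model
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    tokens = (first_segment + " " + continuation).lower().split()
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return float("-inf")
    log_prob = sum(
        math.log((bigrams[pair] + 1) / (unigrams[pair[0]] + vocab_size))
        for pair in pairs
    )
    return log_prob / len(pairs)


def best_continuation(model, first_segment, candidates):
    """Score each (first segment, continuation) set and return the top one."""
    scores = {c: splice_score(model, first_segment, c) for c in candidates}
    return max(scores, key=scores.get), scores


# Toy corpus standing in for the "large text corpus" of the abstract.
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the reading order of text segments matters",
    "text segments are extracted from a file",
]
model = train_bigram_model(corpus)

first = "the reading order of text"
candidates = ["segments are extracted", "fox jumps over"]
winner, scores = best_continuation(model, first, candidates)
print(winner, scores)
```

Each candidate set receives its own score, and the set with the highest score is taken as the more likely reading-order continuation. Averaging the log-probability over the number of bigrams is one possible way to keep candidates of different lengths comparable; other normalizations would work as well.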
Public/Granted literature
- US20180267956A1 IDENTIFICATION OF READING ORDER TEXT SEGMENTS WITH A PROBABILISTIC LANGUAGE MODEL Public/Granted day: 2018-09-20