Identification of reading order text segments with a probabilistic language model

Invention Grant

US10372821B2 Identification of reading order text segments with a probabilistic language model (Active)

Patent Title: Identification of reading order text segments with a probabilistic language model
Application No.: US15462684

Application Date: 2017-03-17
Publication No.: US10372821B2

Publication Date: 2019-08-06
Inventor: Walter Chang, Trung Bui, Pranjal Daga, Michael Kraley, Hung Bui
Applicant: Adobe Inc.
Applicant Address: US CA San Jose
Assignee: Adobe Inc.
Current Assignee: Adobe Inc.
Current Assignee Address: US CA San Jose
Agency: Kilpatrick Townsend & Stockton LLP
Main IPC: G06F17/22
IPC: G06F17/22; G06F17/27; G06K9/00

Abstract:

Certain embodiments identify a correct structured reading-order sequence of text segments extracted from a file. A probabilistic language model is generated from a large text corpus to comprise observed word sequence patterns for a given language. The language model measures whether splicing together a first text segment with another continuation text segment results in a phrase that is more likely than a phrase resulting from splicing together the first text segment with other continuation text segments. Sets of text segments, which include a first set with a first text segment and a first continuation text segment as well as a second set with the first text segment and a second continuation text segment, are provided to the probabilistic model. A score indicative of a likelihood of the set providing a correct structured reading-order sequence is obtained for each set of text segments.
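The scoring idea in the abstract can be illustrated with a small sketch. This is not the patented implementation: it assumes a toy bigram model with add-one smoothing trained on a few sentences, where the abstract only says a probabilistic language model is generated from a large text corpus. The function names (`train_bigram_model`, `splice_score`, `rank_continuations`) are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def train_bigram_model(corpus_sentences):
    """Count unigrams and bigrams in a (toy) corpus of sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def splice_score(model, first_segment, continuation):
    """Mean log-probability per word transition of the spliced phrase.

    Uses add-one (Laplace) smoothing, an assumption made for this sketch,
    so unseen word pairs get a small non-zero probability.
    """
    unigrams, bigrams = model
    vocab_size = len(unigrams)
    tokens = (first_segment + " " + continuation).lower().split()
    transitions = list(zip(tokens, tokens[1:]))
    if not transitions or vocab_size == 0:
        return float("-inf")
    total = 0.0
    for prev, cur in transitions:
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(prob)
    return total / len(transitions)


def rank_continuations(model, first_segment, candidates):
    """Return (score, continuation) pairs, most likely splice first."""
    scored = [(splice_score(model, first_segment, c), c) for c in candidates]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "the lazy dog sleeps in the sun",
    ]
    model = train_bigram_model(corpus)
    for score, candidate in rank_continuations(
        model, "the quick brown", ["fox jumps over", "sleeps in the"]
    ):
        print(f"{score:.3f}  {candidate}")
```

Each candidate set (the first segment plus one continuation) receives its own score, and the highest-scoring set is taken as the more likely reading order. Averaging over transitions is a design choice in this sketch to reduce the bias toward shorter continuations; a production system would more likely rely on a model trained on a large corpus.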

Public/Granted literature

US20180267956A1 IDENTIFICATION OF READING ORDER TEXT SEGMENTS WITH A PROBABILISTIC LANGUAGE MODEL; Public/Granted day: 2018-09-20
