-
1.
Publication No.: US20190325211A1
Publication Date: 2019-10-24
Application No.: US15956542
Application Date: 2018-04-18
Applicant: GOOGLE LLC
Inventor: Ivan Ordonez, Swaminathan Krishnamurthy, David Paul, Tushar Udeshi, Aiyou Chen
Abstract: Systems and methods for assigning word fragments to lines of text in optical character recognition (OCR) extracted data can include at least one processor obtaining a plurality of word fragments from OCR generated data associated with an image. The at least one processor can determine vertical coordinates of each of the word fragments in the image. The at least one processor can cluster the plurality of word fragments into one or more clusters of word fragments based on the vertical coordinates of the plurality of word fragments. The at least one processor can assign each word fragment of a respective cluster to a corresponding text line based on the clustering.
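Illustrative sketch (not taken from the patent): the abstract describes clustering word fragments by vertical coordinate and assigning each cluster to a text line, but it does not name a clustering method. The Python below assumes a simple one-dimensional gap-based clustering on the vertical centre of each fragment. The `Fragment` type, the `assign_lines` function, and the `max_gap` threshold are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical OCR word fragment with a bounding box in image pixels."""
    text: str
    x: float         # left edge
    y_top: float     # top edge
    y_bottom: float  # bottom edge

    @property
    def y_center(self) -> float:
        return (self.y_top + self.y_bottom) / 2


def assign_lines(fragments, max_gap=5.0):
    """Cluster fragments into text lines by vertical centre.

    Assumption: a new line starts whenever the vertical centre jumps by
    more than `max_gap` pixels from the previous fragment in sorted order.
    """
    lines = []
    prev_y = None
    for frag in sorted(fragments, key=lambda f: f.y_center):
        if prev_y is None or frag.y_center - prev_y > max_gap:
            lines.append([])  # start a new cluster, i.e. a new text line
        lines[-1].append(frag)
        prev_y = frag.y_center
    # Within each line, order fragments left to right.
    return [sorted(line, key=lambda f: f.x) for line in lines]
```

With fragments "Total" and "12.50" at nearly the same height and "Tax" about 30 pixels lower, this sketch places the first two on one line and "Tax" on a second. A fixed pixel threshold is the simplest possible choice; a real system would likely scale it by text height.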
-
2.
Publication No.: US20190325247A1
Publication Date: 2019-10-24
Application No.: US15956547
Application Date: 2018-04-18
Applicant: Google, LLC
Inventor: Ivan Ordonez, Swaminathan Krishnamurthy, David Paul, Tushar Udeshi
IPC: G06K9/34
Abstract: Systems and methods for merging adjacent word fragments in outputs of optical character recognition (OCR) systems can include a processor obtaining word fragments associated with OCR data generated from an image. Each word fragment can be associated with a respective text line of a plurality of text lines. The at least one processor can determine, for each pair of adjacent word fragments in a text line, a respective normalized horizontal distance between the pair of adjacent word fragments. The processor can identify one or more pairs of adjacent word fragments that are candidates for merging based on the determined normalized horizontal distances. The processor can determine that a pair of adjacent word fragments, among the one or more pairs of adjacent word fragments that are candidates for merging, matches a predefined expression of a plurality of predefined expressions, and merge that pair of adjacent word fragments into a single word.
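Illustrative sketch (not taken from the patent): the abstract describes two checks before merging adjacent fragments, a normalized horizontal distance and a match against predefined expressions. The Python below assumes the gap is normalized by fragment height and that the predefined expressions are regular expressions for prices and dates. The `Fragment` type, `PATTERNS`, `normalized_gap`, `merge_line`, and the `max_norm_gap` threshold are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical OCR word fragment on a single text line."""
    text: str
    x_left: float
    x_right: float
    height: float  # assumed positive


# Assumed "predefined expressions": a price and a simple date.
PATTERNS = [
    re.compile(r"\$?\d+\.\d{2}"),
    re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}"),
]


def normalized_gap(a: Fragment, b: Fragment) -> float:
    """Horizontal gap between a and b, divided by the taller height (an assumption)."""
    return (b.x_left - a.x_right) / max(a.height, b.height)


def merge_line(fragments, max_norm_gap=0.5):
    """Merge adjacent fragments that are close together AND whose joined
    text fully matches one of the predefined expressions."""
    merged = []
    for frag in sorted(fragments, key=lambda f: f.x_left):
        if merged:
            prev = merged[-1]
            joined = prev.text + frag.text
            close = normalized_gap(prev, frag) <= max_norm_gap
            if close and any(p.fullmatch(joined) for p in PATTERNS):
                merged[-1] = Fragment(joined, prev.x_left, frag.x_right,
                                      max(prev.height, frag.height))
                continue
        merged.append(frag)
    return merged
```

Under these assumptions, the fragments "12" and ".50" separated by a two-pixel gap merge into "12.50", while a distant "Total" on the same line is left alone. Requiring both conditions keeps closely spaced but unrelated words from being joined.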
-
3.
Publication No.: US10679087B2
Publication Date: 2020-06-09
Application No.: US15956547
Application Date: 2018-04-18
Applicant: Google, LLC
Inventor: Ivan Ordonez, Swaminathan Krishnamurthy, David Paul, Tushar Udeshi
IPC: G06K9/34
Abstract: Systems and methods for merging adjacent word fragments in outputs of optical character recognition (OCR) systems can include a processor obtaining word fragments associated with OCR data generated from an image. Each word fragment can be associated with a respective text line of a plurality of text lines. The at least one processor can determine, for each pair of adjacent word fragments in a text line, a respective normalized horizontal distance between the pair of adjacent word fragments. The processor can identify one or more pairs of adjacent word fragments that are candidates for merging based on the determined normalized horizontal distances. The processor can determine that a pair of adjacent word fragments, among the one or more pairs of adjacent word fragments that are candidates for merging, matches a predefined expression of a plurality of predefined expressions, and merge that pair of adjacent word fragments into a single word.
-
4.
Publication No.: US10740602B2
Publication Date: 2020-08-11
Application No.: US15956542
Application Date: 2018-04-18
Applicant: GOOGLE LLC
Inventor: Ivan Ordonez, Swaminathan Krishnamurthy, David Paul, Tushar Udeshi, Aiyou Chen
IPC: G06K9/00, G06T7/70, G06F16/28, G06F16/583
Abstract: Systems and methods for assigning word fragments to lines of text in optical character recognition (OCR) extracted data can include at least one processor obtaining a plurality of word fragments from OCR generated data associated with an image. The at least one processor can determine vertical coordinates of each of the word fragments in the image. The at least one processor can cluster the plurality of word fragments into one or more clusters of word fragments based on the vertical coordinates of the plurality of word fragments. The at least one processor can assign each word fragment of a respective cluster to a corresponding text line based on the clustering.
-
5.
Publication No.: US10218599B2
Publication Date: 2019-02-26
Application No.: US15263961
Application Date: 2016-09-13
Applicant: Google LLC
Inventor: Phillip Oertel, Swaminathan Krishnamurthy, Zaid Ateeq Mian, Christopher J. Park, Mattias Bo Erland Granlund, Amin Ahmad
IPC: G06F17/30, G06F15/173, H04L12/751, G06F17/21, H04L1/00, H04L29/08
Abstract: A system for pagination of data based on recorded URL requests includes a data store comprising a computer readable medium storing a program of instructions for performing the pagination of data based on recorded URL requests; a processor that executes the program of instructions; a data segmentation module to receive a log of the URL requests, and to segment the log for a specific source; a referral tree construction module to construct a referral tree for the specific source based on the segmented log and HTTP referrer fields associated with the log; a tree enhancement module to enhance the referral tree based on site-specific rules; a signal computation module to perform signal computation on a plurality of nodes associated with the enhanced referral tree; a classification module to identify each of the plurality of nodes after the signal computation is performed on the enhanced referral tree; and a page construction module to construct a web page based on the enhanced referral tree subsequent to the classification module identifying the plurality of nodes.
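Illustrative sketch (not taken from the patent): the first two modules in the abstract, log segmentation by source and referral tree construction from HTTP referrer fields, can be pictured as below. The log entry layout (dicts with `source`, `url`, and `referrer` keys), the function names, and the assumption that the log is in chronological order are all hypothetical. The later stages (tree enhancement, signal computation, classification, page construction) are not sketched because the abstract gives no detail on them.

```python
from collections import defaultdict


def segment_log(log, source):
    """Keep only the entries recorded for one source (assumed 'source' key)."""
    return [entry for entry in log if entry["source"] == source]


def build_referral_tree(entries):
    """Build a parent -> children mapping from referrer fields.

    Assumptions: entries are in chronological order, and a URL becomes a
    child of its referrer only if that referrer was itself requested
    earlier in the same segment; otherwise the URL is treated as a root.
    Repeated requests for the same URL are ignored after the first.
    """
    children = defaultdict(list)
    roots = []
    seen = set()
    for entry in entries:
        url = entry["url"]
        referrer = entry.get("referrer")
        if url in seen:
            continue
        seen.add(url)
        if referrer and referrer in seen:
            children[referrer].append(url)
        else:
            roots.append(url)
    return roots, dict(children)
```

For a segment in which "/" refers to "/list?page=1", which in turn refers to "/list?page=2", this yields a single root "/" with a two-level chain beneath it, which is the kind of structure a pagination classifier could then examine.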