-
1.
Publication No.: US10740602B2
Publication Date: 2020-08-11
Application No.: US15956542
Filing Date: 2018-04-18
Applicant: GOOGLE LLC
Inventor: Ivan Ordonez , Swaminathan Krishnamurthy , David Paul , Tushar Udeshi , Aiyou Chen
IPC: G06K9/00 , G06T7/70 , G06F16/28 , G06F16/583
Abstract: Systems and methods for assigning word fragments to lines of text in optical character recognition (OCR) extracted data can include at least one processor obtaining a plurality of word fragments from OCR generated data associated with an image. The at least one processor can determine vertical coordinates of each of the word fragments in the image. The at least one processor can cluster the plurality of word fragments into one or more clusters of word fragments based on the vertical coordinates of the plurality of word fragments. The at least one processor can assign each word fragment of a respective cluster to a corresponding text line based on the clustering.
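Illustrative sketch (not taken from the patent): the abstract states that word fragments are clustered by vertical coordinate and each cluster becomes a text line, but it does not name a clustering method. The Python below assumes a simple one-dimensional gap rule; the `Fragment` type, the `assign_lines` function, and the `max_gap` threshold are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    x: float  # left edge of the fragment, in image pixels (assumed)
    y: float  # vertical center of the fragment, in image pixels (assumed)


def assign_lines(fragments, max_gap=10.0):
    """Group fragments into text lines with a 1-D gap rule on y.

    Assumption: a new line starts whenever the vertical distance to the
    previous fragment (in y order) exceeds max_gap. The patent abstract
    only says "cluster ... based on the vertical coordinates".
    """
    ordered = sorted(fragments, key=lambda f: f.y)
    lines = []
    for frag in ordered:
        if lines and frag.y - lines[-1][-1].y <= max_gap:
            lines[-1].append(frag)  # close enough: same text line
        else:
            lines.append([frag])    # large vertical jump: new text line
    # Within each line, order fragments left to right.
    return [sorted(line, key=lambda f: f.x) for line in lines]
```

Called on fragments from two rows of a receipt, the function would return two lists, one per row, each ordered left to right. A fixed pixel threshold is the weakest part of this sketch; a real system would likely scale it by text height.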
-
2.
Publication No.: US20190325211A1
Publication Date: 2019-10-24
Application No.: US15956542
Filing Date: 2018-04-18
Applicant: GOOGLE LLC
Inventor: Ivan Ordonez , Swaminathan Krishnamurthy , David Paul , Tushar Udeshi , Aiyou Chen
Abstract: Systems and methods for assigning word fragments to lines of text in optical character recognition (OCR) extracted data can include at least one processor obtaining a plurality of word fragments from OCR generated data associated with an image. The at least one processor can determine vertical coordinates of each of the word fragments in the image. The at least one processor can cluster the plurality of word fragments into one or more clusters of word fragments based on the vertical coordinates of the plurality of word fragments. The at least one processor can assign each word fragment of a respective cluster to a corresponding text line based on the clustering.
-
3.
Publication No.: US10679087B2
Publication Date: 2020-06-09
Application No.: US15956547
Filing Date: 2018-04-18
Applicant: Google, LLC
Inventor: Ivan Ordonez , Swaminathan Krishnamurthy , David Paul , Tushar Udeshi
IPC: G06K9/34
Abstract: Systems and methods for merging adjacent word fragments in outputs of optical character recognition (OCR) systems can include a processor obtaining word fragments associated with OCR data generated from an image. Each word fragment can be associated with a respective text line of a plurality of text lines. The at least one processor can determine, for each pair of adjacent word fragments in a text line, a respective normalized horizontal distance between the pair of adjacent word fragments. The processor can identify one or more pairs of adjacent word fragments that are candidates for merging based on the determined normalized horizontal distances. The processor can determine that a pair of adjacent word fragments, among the one or more pairs of adjacent word fragments that are candidates for merging, matches a predefined expression of a plurality of predefined expressions, and merge that pair of adjacent word fragments into a single word.
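Illustrative sketch (not taken from the patent): the abstract describes three steps, namely computing a normalized horizontal distance for each adjacent pair in a text line, selecting merge candidates from those distances, and merging a candidate pair when it matches a predefined expression. The Python below assumes the distance is normalized by the mean fragment height and that "predefined expressions" are regular expressions for prices and dates; the abstract specifies neither. `Word`, `PATTERNS`, `normalized_gap`, `merge_line`, and `max_gap` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left: float    # x of the left edge
    right: float   # x of the right edge
    height: float  # fragment height; assumed to be positive


# Assumed "predefined expressions": a dollar amount and a numeric date.
PATTERNS = [
    re.compile(r"\$\d+\.\d{2}"),
    re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),
]


def normalized_gap(a, b):
    """Horizontal gap between a and b, divided by their mean height (assumed normalizer)."""
    return (b.left - a.right) / ((a.height + b.height) / 2)


def merge_line(words, max_gap=0.5, patterns=PATTERNS):
    """Merge adjacent fragments of one text line, given in left-to-right order."""
    if not words:
        return []
    merged = [words[0]]
    for nxt in words[1:]:
        prev = merged[-1]
        joined = prev.text + nxt.text
        is_candidate = normalized_gap(prev, nxt) <= max_gap
        if is_candidate and any(p.fullmatch(joined) for p in patterns):
            # Replace the previous fragment with the merged word.
            merged[-1] = Word(joined, prev.left, nxt.right,
                              max(prev.height, nxt.height))
        else:
            merged.append(nxt)
    return merged
```

With this rule, the fragments `$12` and `.99` separated by a small gap would merge into `$12.99`, while a widely spaced `Total` to their left would stay separate. The pass is greedy and pairwise, so a word split into three fragments merges only if an intermediate join also matches a pattern.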
-
4.
Publication No.: US20190325247A1
Publication Date: 2019-10-24
Application No.: US15956547
Filing Date: 2018-04-18
Applicant: Google, LLC
Inventor: Ivan Ordonez , Swaminathan Krishnamurthy , David Paul , Tushar Udeshi
IPC: G06K9/34
Abstract: Systems and methods for merging adjacent word fragments in outputs of optical character recognition (OCR) systems can include a processor obtaining word fragments associated with OCR data generated from an image. Each word fragment can be associated with a respective text line of a plurality of text lines. The at least one processor can determine, for each pair of adjacent word fragments in a text line, a respective normalized horizontal distance between the pair of adjacent word fragments. The processor can identify one or more pairs of adjacent word fragments that are candidates for merging based on the determined normalized horizontal distances. The processor can determine that a pair of adjacent word fragments, among the one or more pairs of adjacent word fragments that are candidates for merging, matches a predefined expression of a plurality of predefined expressions, and merge that pair of adjacent word fragments into a single word.
-