Systems and methods for merging word fragments in optical character recognition-extracted data

    Publication No.: US10679087B2

    Publication date: 2020-06-09

    Application No.: US15956547

    Filing date: 2018-04-18

    Applicant: Google, LLC

    Abstract: Systems and methods for merging adjacent word fragments in the output of optical character recognition (OCR) systems can include a processor obtaining word fragments associated with OCR data generated from an image. Each word fragment can be associated with a respective text line of a plurality of text lines. The processor can determine, for each pair of adjacent word fragments in a text line, a respective normalized horizontal distance between the pair. The processor can identify, based on the determined normalized horizontal distances, one or more pairs of adjacent word fragments that are candidates for merging. The processor can determine that a pair of adjacent word fragments among those candidates matches a predefined expression of a plurality of predefined expressions, and merge that pair into a single word.
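The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Fragment` fields, the choice to normalize the gap by fragment height, the `MAX_NORMALIZED_GAP` threshold, and the two regular expressions in `PATTERNS` are all hypothetical, since the abstract does not specify them. The sketch also merges greedily from left to right rather than collecting all candidate pairs first.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    """One OCR word fragment (hypothetical structure)."""
    text: str
    left: float    # x-coordinate of the left edge
    right: float   # x-coordinate of the right edge
    height: float  # bounding-box height
    line: int      # index of the text line the fragment belongs to


# Hypothetical "predefined expressions"; the abstract does not list them.
PATTERNS = [
    re.compile(r"\$\d+\.\d{2}"),           # currency amount such as $12.50
    re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),  # date such as 4/18/2018
]

# Assumed threshold on the normalized gap for a pair to be a merge candidate.
MAX_NORMALIZED_GAP = 0.5


def normalized_gap(a: Fragment, b: Fragment) -> float:
    """Horizontal gap between a and b, divided by the taller fragment's height.

    Normalizing by height is an assumption made for this sketch.
    """
    return (b.left - a.right) / max(a.height, b.height)


def merge_fragments(fragments: list[Fragment]) -> list[Fragment]:
    """Merge adjacent fragments that are close together and match a pattern."""
    # Group fragments by text line.
    lines: dict[int, list[Fragment]] = {}
    for frag in fragments:
        lines.setdefault(frag.line, []).append(frag)

    merged: list[Fragment] = []
    for line_id in sorted(lines):
        row = sorted(lines[line_id], key=lambda f: f.left)
        out = [row[0]]
        for nxt in row[1:]:
            prev = out[-1]
            joined = prev.text + nxt.text
            is_candidate = normalized_gap(prev, nxt) <= MAX_NORMALIZED_GAP
            if is_candidate and any(p.fullmatch(joined) for p in PATTERNS):
                # Replace the previous fragment with the merged word.
                out[-1] = Fragment(joined, prev.left, nxt.right,
                                   max(prev.height, nxt.height), line_id)
            else:
                out.append(nxt)
        merged.extend(out)
    return merged


if __name__ == "__main__":
    sample = [
        Fragment("Total", 0, 50, 10, 0),
        Fragment("$12", 60, 80, 10, 0),
        Fragment(".50", 82, 100, 10, 0),
    ]
    print([f.text for f in merge_fragments(sample)])  # ['Total', '$12.50']
```

Requiring both a small normalized gap and a pattern match keeps ordinary neighboring words apart: "Total" and "$12" are not merged because their gap is too wide, and two close fragments whose concatenation matches no pattern are left unchanged.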

    SYSTEMS AND METHODS FOR MERGING WORD FRAGMENTS IN OPTICAL CHARACTER RECOGNITION-EXTRACTED DATA

    Publication No.: US20190325247A1

    Publication date: 2019-10-24

    Application No.: US15956547

    Filing date: 2018-04-18

    Applicant: Google, LLC

    Abstract: Systems and methods for merging adjacent word fragments in the output of optical character recognition (OCR) systems can include a processor obtaining word fragments associated with OCR data generated from an image. Each word fragment can be associated with a respective text line of a plurality of text lines. The processor can determine, for each pair of adjacent word fragments in a text line, a respective normalized horizontal distance between the pair. The processor can identify, based on the determined normalized horizontal distances, one or more pairs of adjacent word fragments that are candidates for merging. The processor can determine that a pair of adjacent word fragments among those candidates matches a predefined expression of a plurality of predefined expressions, and merge that pair into a single word.
