Systems and methods for merging word fragments in optical character recognition-extracted data

    Publication number: US20190325247A1

    Publication date: 2019-10-24

    Application number: US15956547

    Application date: 2018-04-18

    Applicant: Google LLC

    Abstract: Systems and methods for merging adjacent word fragments in the output of optical character recognition (OCR) systems can include a processor that obtains word fragments associated with OCR data generated from an image. Each word fragment can be associated with a respective text line of a plurality of text lines. The processor can determine, for each pair of adjacent word fragments in a text line, a respective normalized horizontal distance between the two fragments. Based on the determined normalized horizontal distances, the processor can identify one or more pairs of adjacent word fragments that are candidates for merging. The processor can then determine that one of these candidate pairs matches a predefined expression of a plurality of predefined expressions, and merge that pair of adjacent word fragments into a single word.
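
    The abstract describes a three-stage flow: measure the normalized gap between adjacent fragments on a text line, keep close pairs as merge candidates, and merge a candidate pair only when the joined text matches a predefined expression. The Python sketch below illustrates that flow under stated assumptions. The abstract does not say how distances are normalized, what threshold is used, or which expressions are predefined, so the `WordFragment` fields, normalization by median fragment height, `MAX_NORMALIZED_GAP`, and the regular expressions in `PREDEFINED_EXPRESSIONS` are all hypothetical choices, not the patented method. The sketch also lets an already-merged word absorb the next fragment when the combined text still matches, which goes beyond the pairwise wording of the abstract.

```python
import re
from dataclasses import dataclass
from statistics import median


@dataclass
class WordFragment:
    """Hypothetical OCR word fragment: its text and horizontal extent in pixels."""
    text: str
    left: float
    right: float
    height: float  # bounding-box height, used here as the normalization scale


# Hypothetical predefined expressions that a merged word must fully match.
PREDEFINED_EXPRESSIONS = [
    re.compile(r"\$\d+(?:\.\d{2})?"),       # currency amount, e.g. "$12.50"
    re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}"),  # date, e.g. "04/18/2018"
    re.compile(r"\d{3}-\d{4}"),              # phone-number tail, e.g. "555-1234"
]

# Hypothetical threshold on the normalized gap (pixel gap / median fragment height).
MAX_NORMALIZED_GAP = 0.3


def normalized_gaps(line):
    """Return the gap between each adjacent pair, divided by the median fragment height."""
    scale = median([f.height for f in line])
    return [(b.left - a.right) / scale for a, b in zip(line, line[1:])]


def merge_line(line):
    """Merge adjacent fragments of one text line that are close and form a known expression."""
    line = sorted(line, key=lambda f: f.left)  # left-to-right reading order
    if len(line) < 2:
        return [f.text for f in line]
    words = [line[0].text]
    for gap, frag in zip(normalized_gaps(line), line[1:]):
        is_candidate = gap <= MAX_NORMALIZED_GAP  # close enough to be a merge candidate
        joined = words[-1] + frag.text  # words[-1] may already hold a merged word
        if is_candidate and any(p.fullmatch(joined) for p in PREDEFINED_EXPRESSIONS):
            words[-1] = joined
        else:
            words.append(frag.text)
    return words
```

    For example, fragments `Total:`, `$12`, `.50`, and `due` of height 20, with a 2-pixel gap between `$12` and `.50` and gaps of 10 to 15 pixels elsewhere, come out as `Total:`, `$12.50`, `due`. Requiring a pattern match in addition to proximity keeps closely spaced but unrelated fragments apart.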

    Systems and methods for merging word fragments in optical character recognition-extracted data

    Publication number: US10679087B2

    Publication date: 2020-06-09

    Application number: US15956547

    Application date: 2018-04-18

    Applicant: Google LLC

    Identifying referral pages based on recorded URL requests

    Publication number: US10218599B2

    Publication date: 2019-02-26

    Application number: US15263961

    Application date: 2016-09-13

    Applicant: Google LLC

    Abstract: A system for pagination of data based on recorded URL requests includes: a data store comprising a computer-readable medium that stores a program of instructions for performing the pagination of data based on recorded URL requests; a processor that executes the program of instructions; a data segmentation module that receives a log of the URL requests and segments the log for a specific source; a referral tree construction module that constructs a referral tree for the specific source based on the segmented log and the HTTP referrer fields associated with the log; a tree enhancement module that enhances the referral tree based on site-specific rules; a signal computation module that performs signal computation on a plurality of nodes of the enhanced referral tree; a classification module that identifies each of the plurality of nodes after the signal computation is performed on the enhanced referral tree; and a page construction module that constructs a web page based on the enhanced referral tree after the classification module identifies the plurality of nodes.
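
    The abstract names the modules but not their internals. As a hedged illustration of the first two, data segmentation and referral tree construction, the Python sketch below assumes the log is a list of `(url, referrer)` pairs and that a "specific source" means a single host name; both are assumptions, as are the names `segment_log`, `build_referral_tree`, `children`, and the synthetic `ROOT` node. On the wire, the HTTP referrer field is the `Referer` request header. Each page keeps the first on-site referrer seen for it, and links that would form a cycle are redirected to `ROOT`, so the result stays a tree. Tree enhancement, signal computation, classification, and page construction are not shown.

```python
from urllib.parse import urlparse

ROOT = "<root>"  # hypothetical synthetic root for entry pages


def segment_log(log, host):
    """Keep only (url, referrer) records whose requested URL is on `host`."""
    return [(url, ref) for url, ref in log if urlparse(url).netloc == host]


def build_referral_tree(records, host):
    """Return {page: parent_page}, using the first on-site referrer seen for each page.

    Pages requested with no referrer, an off-site referrer, or a referrer
    that would create a cycle are attached to ROOT.
    """
    parent = {}
    for url, ref in records:
        if url in parent or url == ref:
            continue  # keep the first referrer seen; skip self-referrals
        if not ref or urlparse(ref).netloc != host:
            parent[url] = ROOT  # entry page for this source
            continue
        # Walk up from the referrer; reaching `url` means the link would form a cycle.
        node = ref
        while node in parent and node != url:
            node = parent[node]
        parent[url] = ROOT if node == url else ref
    return parent


def children(parent):
    """Invert {page: parent} into {parent: sorted list of child pages}."""
    tree = {}
    for page, par in parent.items():
        tree.setdefault(par, []).append(page)
    return {k: sorted(v) for k, v in tree.items()}
```

    Under these assumptions, a log in which `/list?page=1` is referred from `/`, `/list?page=2` from `/list?page=1`, and `/item/42` from `/list?page=2` yields a single chain from `ROOT` down to `/item/42`, the kind of structure a later classification step could label as paginated listing pages. A referrer that never appears as a request itself shows up in the output of `children` as an extra top-level key.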
