Hierarchical Dictionary with Statistical Filtering Based on Word Frequency

    Publication number: US20170220679A1

    Publication date: 2017-08-03

    Application number: US15395778

    Application date: 2016-12-30

    CPC classification number: G06F16/36 G06F16/313 G06F16/335

    Abstract: A hierarchical dictionary stores words based on their frequency in one or more documents. The method includes the steps of identifying a hash value corresponding to an inputted word; storing the word, based on the identified hash value, in a first hash map and in a second hash map having a substantially larger word storage capacity than the first hash map; clearing the first hash map at every predetermined period or triggering event; determining whether the frequency of the word as stored in the second hash map exceeds a predetermined value; and if so, promoting the word from the second hash map to a third hash map having a substantially larger word storage capacity than the second hash map for long-term storage and later retrieval.
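The steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `HierarchicalDictionary`, the parameters `promote_threshold` and `clear_every`, and the use of a word count as the "predetermined period" are all assumptions made for illustration. Python's built-in `dict` (and `Counter`) computes the hash value of each word internally, and the relative capacities of the three maps are not modeled.

```python
from collections import Counter


class HierarchicalDictionary:
    """Illustrative three-tier word store (hypothetical names throughout)."""

    def __init__(self, promote_threshold=3, clear_every=100):
        self.short_term = Counter()  # first hash map, cleared periodically
        self.mid_term = Counter()    # second hash map, tracks frequency
        self.long_term = {}          # third hash map, long-term storage
        self.promote_threshold = promote_threshold  # assumed "predetermined value"
        self.clear_every = clear_every              # assumed period, in words seen
        self._seen = 0

    def add(self, word):
        self._seen += 1
        # Store the word in the first map (dict hashing stands in for the
        # explicit "identify a hash value" step).
        self.short_term[word] += 1
        if word in self.long_term:
            # Already promoted: keep counting in long-term storage.
            self.long_term[word] += 1
        else:
            self.mid_term[word] += 1
            # Promote once the frequency exceeds the threshold.
            if self.mid_term[word] > self.promote_threshold:
                self.long_term[word] = self.mid_term.pop(word)
        # Clear the first map at every predetermined period.
        if self._seen % self.clear_every == 0:
            self.short_term.clear()


d = HierarchicalDictionary(promote_threshold=2, clear_every=4)
for w in "a b a c a d".split():
    d.add(w)
print(d.long_term)
```

In this sketch a rare word never leaves the second map, so the long-term map only holds words that passed the frequency filter.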

    Content delineation in document images

    Publication number: US09798924B2

    Publication date: 2017-10-24

    Application number: US14827725

    Application date: 2015-08-17

    CPC classification number: G06K9/00449 G06K9/4647 G06K9/527 G06K9/6857

    Abstract: Methods and apparatus delineate grouped together content in documents. Void and unvoid pixels in document images get clustered together. Execution of a histogram and autocorrelation function, including peak detection, against the unvoid clusters reveals the content. Techniques for clustering include iteratively transforming an original image into secondary images with a Haar wavelet transformation, for example. Clustering begins on a lowest image plane and advances to a next highest plane until all void and unvoid pixels in the images are grouped. Void clusters at lower levels remain void clusters at higher levels, thus only unvoid clusters of pixels require processing at higher levels thereby optimizing processing. Imaging devices with scanners define suitable hardware for transformation of the document into images and processors with executable code cluster together pixels to delineate content. Further processing includes executing OCR or other routines post void/unvoid analysis.
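As a rough illustration of the ideas in this abstract, the pure-Python sketch below reduces a binary image with one level of a Haar-style approximation, marks void cells on the coarser plane, and then applies a row histogram with autocorrelation and peak detection. It is a sketch under stated assumptions, not the patented method: all function names are hypothetical, pixels are assumed to be 1 for ink (unvoid) and 0 for background (void), and the 2x2 block mean is used in place of the orthonormal Haar approximation coefficient, to which it is proportional.

```python
def haar_approx(image):
    """One level of a Haar-style approximation: mean of each 2x2 block."""
    rows, cols = len(image), len(image[0])
    return [
        [(image[r][c] + image[r][c + 1]
          + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def void_mask(plane):
    """True where a cell is void. With non-negative pixels, a mean of 0
    implies all four underlying pixels are void, so that block needs no
    further processing on the finer plane."""
    return [[value == 0 for value in row] for row in plane]


def row_histogram(image):
    """Count of unvoid pixels in each row."""
    return [sum(row) for row in image]


def autocorrelation(signal):
    """Unnormalised autocorrelation of the mean-centered signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    return [sum(centered[i] * centered[i + lag] for i in range(n - lag))
            for lag in range(n)]


def first_peak(ac):
    """Smallest lag >= 1 that is a local maximum, or None if there is none."""
    for lag in range(1, len(ac) - 1):
        if ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1]:
            return lag
    return None


# Toy "document": two-row bands of ink separated by two-row gaps.
ink, blank = [1, 1, 1, 1], [0, 0, 0, 0]
image = [ink, ink, blank, blank, ink, ink, blank, blank]

coarse = haar_approx(image)
mask = void_mask(coarse)
period = first_peak(autocorrelation(row_histogram(image)))
print(mask, period)
```

Here the first autocorrelation peak gives the vertical repeat distance of the ink bands, which is one way a regular layout such as evenly spaced text lines could reveal itself.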
