Abstract:
A method includes receiving or capturing an image comprising an identity document (ID) using a mobile device; classifying the ID; building an extraction model based on the ID classification; extracting data from the ID based on the extraction model; building an ID profile based on the extracted data; storing the ID profile to a memory of the mobile device; detecting a predetermined stimulus in a workflow; identifying workflow-relevant data in the stored ID profile at least partially in response to detecting the predetermined stimulus; providing the workflow-relevant data from the stored ID profile to the workflow; and driving at least a portion of the workflow using the workflow-relevant data. Related systems and computer program products are also disclosed.
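The step of identifying workflow-relevant data in a stored ID profile can be illustrated with a minimal sketch. The `IDProfile` class, the `on_stimulus` function, and the field names are hypothetical and chosen only for illustration; the abstract does not specify how the profile is represented or how relevance is decided. Here the stimulus is assumed to be a workflow step announcing which fields it needs.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class IDProfile:
    """Hypothetical in-memory profile built from data extracted from an ID."""
    id_class: str
    fields: Dict[str, str] = field(default_factory=dict)


def on_stimulus(profile: IDProfile, requested: Iterable[str]) -> Dict[str, str]:
    """Return only the stored fields that the workflow step asks for.

    Assumption: the detected stimulus carries a list of requested field names.
    Fields missing from the profile are skipped rather than invented.
    """
    return {name: profile.fields[name] for name in requested if name in profile.fields}


profile = IDProfile(
    id_class="drivers_license",
    fields={"name": "Jane Doe", "dob": "1990-01-01", "address": "1 Main St"},
)
relevant = on_stimulus(profile, ["name", "dob", "ssn"])
```

In this sketch the workflow receives only the name and date of birth; the unrequested address stays in the profile, and the requested but absent `ssn` is simply omitted.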
Abstract:
In one embodiment, a method includes performing optical character recognition (OCR) on an image of a financial document and at least one of: (a) correcting OCR errors in the financial document using at least one of textual information from a complementary document and predefined business rules; (b) normalizing data from the complementary document using at least one of textual information from the financial document and the predefined business rules; and (c) normalizing data from the financial document using at least one of textual information from the complementary document and the predefined business rules. Exemplary systems and computer program products are also disclosed.
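One way to picture correcting OCR errors with a complementary document is the sketch below. It is an illustration under stated assumptions, not the disclosed method: the function name `correct_field`, the use of a regular expression as the "business rule", and the use of fuzzy string matching from Python's `difflib` are all choices made here for the example.

```python
import difflib
import re
from typing import Iterable, Optional


def correct_field(
    ocr_value: str,
    candidates: Iterable[str],
    rule: Optional[str] = None,
    cutoff: float = 0.6,
) -> str:
    """Replace an OCR'd value with the closest value from a complementary document.

    candidates: values taken from the complementary document (assumed input).
    rule: optional regex standing in for a predefined business rule; only
          candidates that fully match it are considered.
    If nothing is similar enough, the OCR value is returned unchanged.
    """
    allowed = [c for c in candidates if rule is None or re.fullmatch(rule, c)]
    if ocr_value in allowed:
        return ocr_value
    matches = difflib.get_close_matches(ocr_value, allowed, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_value


# The OCR engine confused the digit 0 with the letter O in two places.
fixed = correct_field(
    "INV-2O19-0O42",
    ["INV-2019-0042", "PO-77120"],
    rule=r"INV-\d{4}-\d{4}",
)
```

The rule filters out the purchase order number before matching, so the invoice identifier can only be corrected to another value that has the expected invoice format.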
Abstract:
Systems, computer program products, and techniques for detecting and/or reconstructing objects depicted in digital image data within a three-dimensional space are disclosed. The concepts utilize internal features for detection and reconstruction, avoiding reliance on information derived from location of edges. The inventive concepts provide an improvement over conventional techniques since objects may be detected and/or reconstructed even when edges are obscured or not depicted in the digital image data. In one aspect, detecting a document depicted in a digital image includes: detecting a plurality of identifying features of the document, wherein the plurality of identifying features are located internally with respect to the document; projecting a location of one or more edges of the document based at least in part on the plurality of identifying features; and outputting the projected location of the one or more edges of the document to a display of a computer, and/or a memory.
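The idea of projecting document edges from internal features can be sketched as follows. This is a deliberate simplification: it assumes a known reference template of the document and an affine mapping fitted from exactly three matched internal features, whereas a real capture with perspective distortion would typically need a full projective transform. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("feature points are collinear; cannot fit a transform")
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def project_corners(
    template_pts: Sequence[Point],
    image_pts: Sequence[Point],
    width: float,
    height: float,
) -> List[Point]:
    """Fit template->image affine map from 3 internal features, then map the corners."""
    m = [[float(x), float(y), 1.0] for x, y in template_pts]
    a, b, c = solve3(m, [u for u, _ in image_pts])
    d, e, f = solve3(m, [v for _, v in image_pts])
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in corners]


# Three features inside a 100x60 template, seen in the image at 2x scale, shifted by (5, 7).
corners = project_corners(
    [(10, 10), (90, 10), (10, 50)],
    [(25, 27), (185, 27), (25, 107)],
    width=100,
    height=60,
)
```

The projected corners may fall outside the image bounds, which is consistent with the case the abstract describes, where the edges themselves are obscured or not depicted.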
Abstract:
Techniques for binarization and extraction of information from image data are disclosed. The inventive concepts include independently binarizing portions of the image data on the basis of individual features, e.g., per connected component, and using multiple different binarization thresholds to obtain the best possible binarization result for each portion of the image data. Determining the quality of each binarization result may be based on attempted recognition and/or extraction of information therefrom. Independently binarized portions may be assembled into a contiguous result. In one embodiment, a method includes: identifying a region of interest within a digital image; generating a plurality of binarized images based on the region of interest using different binarization thresholds; subjecting the region of interest within the digital image to a plurality of thresholding and extraction iterations; and extracting data from some or all of the plurality of binarized images. Corresponding systems and computer program products are disclosed.
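The multi-threshold approach can be sketched in a few lines. The example below is an assumption-laden illustration: the region is a small list of grayscale rows, and the quality function `ink_ratio_score` is a stand-in invented here. In the abstract, quality is judged by attempted recognition or extraction, which would replace that scorer in practice.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def binarize(region: Image, threshold: int) -> Image:
    """Pixels at or below the threshold become 1 (ink); all others become 0."""
    return [[1 if px <= threshold else 0 for px in row] for row in region]


def ink_ratio_score(target: float) -> Callable[[Image], float]:
    """Hypothetical quality measure: closeness of the ink fraction to a target.

    A real system would score by attempted recognition/extraction instead.
    """
    def score(img: Image) -> float:
        total = sum(len(row) for row in img)
        ink = sum(sum(row) for row in img)
        return -abs(ink / total - target)
    return score


def best_binarization(
    region: Image,
    thresholds: Sequence[int],
    score: Callable[[Image], float],
) -> Tuple[int, Image]:
    """Binarize the region once per threshold and keep the highest-scoring result."""
    results = [(t, binarize(region, t)) for t in thresholds]
    return max(results, key=lambda result: score(result[1]))


region = [
    [10, 200, 220],
    [15, 210, 90],
    [12, 205, 215],
]
best_t, best_img = best_binarization(region, [5, 50, 100, 250], ink_ratio_score(3 / 9))
```

Running the selection per region, or per connected component, and then stitching the winners together would correspond to the assembly of independently binarized portions mentioned above.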
Abstract:
According to one embodiment, a computer-implemented method for cleaning up a data set having a possibly incorrect label includes: selecting a plurality of training documents; estimating a quality of an organization of a plurality of categories; and determining whether the quality of the organization is greater than a predetermined quality threshold. Corresponding system and computer program product embodiments are also presented. Other aspects and advantages of the present invention will become apparent from the following detailed description, which, when taken in conjunction with the drawings, illustrates by way of example the principles of the invention.
Abstract:
According to one embodiment, a computer-implemented method is configured for building a classification and/or data extraction knowledge base using an electronic form. The method includes: receiving an electronic form having associated therewith a plurality of metadata labels, each metadata label corresponding to at least one element of interest represented within the electronic form; parsing the plurality of metadata labels to determine characteristic features of the element(s) of interest; building a representation of the electronic form based on the plurality of metadata labels; generating a plurality of permutations of the representation of the electronic form by applying a predetermined set of variations to the representation; and training either a classification model, an extraction model, or both using: the representation of the electronic form, and the plurality of permutations of the representation of the electronic form. Corresponding systems and computer program products are also disclosed.
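Generating permutations of a form representation by applying a predetermined set of variations can be sketched as below. The representation used here (a mapping from metadata label to a bounding box) and the variations (translations and uniform scalings) are assumptions made for illustration; the abstract does not specify either.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height
Representation = Dict[str, Box]


def generate_variants(
    representation: Representation,
    shifts: Sequence[Tuple[float, float]],
    scales: Sequence[float],
) -> List[Representation]:
    """Apply every (shift, scale) combination to every labeled box.

    Each variant is a new representation; the input is not modified.
    """
    variants = []
    for (dx, dy), s in product(shifts, scales):
        variants.append(
            {
                label: (x * s + dx, y * s + dy, w * s, h * s)
                for label, (x, y, w, h) in representation.items()
            }
        )
    return variants


form = {"name": (10, 20, 100, 12)}
variants = generate_variants(form, shifts=[(0, 0), (5, -5)], scales=[1.0, 2.0])

# Training data would then combine the original with its variants.
training_set = [form] + variants
```

With two shifts and two scales this yields four variants, the first of which is identical to the original form, so a practical version might drop the identity combination.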
Abstract:
A method includes: receiving or capturing an image comprising an identity document (ID) using a mobile device; classifying the ID; analyzing the ID based at least in part on the ID classification; determining at least some identifying information from the ID; at least one of building an ID profile and updating the ID profile, based at least in part on the analysis; providing at least one of the ID and the ID classification to a loan application workflow and/or a new financial account workflow; and driving at least a portion of the workflow based at least in part on the ID and the ID classification. Corresponding systems and computer program products are also disclosed.
Abstract:
According to one embodiment, a method includes: capturing an image of a financial document using a camera of a mobile device; performing optical character recognition (OCR) on the image of the financial document; extracting an identifier of the financial document from the image based at least in part on the OCR; associating the image of the financial document with metadata descriptive of one or more of the financial document and financial information relating to the financial document; and storing the image of the financial document and the associated metadata to a memory of the mobile device. Exemplary systems and computer program products are also disclosed.
Abstract:
Systems and methods for mobile image data capture and processing are disclosed. The techniques encompass receipt or capture of digital image data, detecting an object such as a document depicted in a digital image corresponding to the digital image data, processing the digital image to improve image quality, classifying the object from the processed image data, and extracting useful information from the object. Processing may improve image quality by correcting artifacts such as distortion, skew, blur, shadows, etc. common to digital images captured using mobile devices. Classification is based on identifying unique features (and/or combinations thereof) within the image data and determining whether the identified features indicate the object belongs to a class of known objects having similar characteristics, or is unique to all known classes. Extraction is based in whole or in part on object classification. All operations may be performed using mobile technology exclusively.
Abstract:
A method according to one embodiment includes performing optical character recognition (OCR) on an image of a first document; and at least one of: correcting OCR errors in the first document using at least one of textual information from a complementary document and predefined business rules; normalizing data from the complementary document using at least one of textual information from the first document and the predefined business rules; and normalizing data from the first document using at least one of textual information from the complementary document and the predefined business rules. Additional systems, methods and computer program products are also presented.