Abstract:
A method in a document analysis system automatically extracts image and text features from each received electronic document and compares the extracted features with feature sets associated with each category of document to determine whether the document is recognizable as belonging to a document category. If an electronic document is recognized as belonging to one of the document categories, the method classifies the electronic document as belonging to that document category. If, however, an electronic document is unrecognized, the method submits the unrecognized document to a learning phase, in which the unrecognized document is presented to a human trainer for manual classification of the unrecognized electronic document into a document category, and automatically modifies at least one of the features and the weights of the feature set of the document category corresponding to the manually-classified electronic document using the automatically extracted features of the manually-classified document.
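The recognize-or-defer decision described above can be sketched as a weighted feature match. This is only an illustrative sketch, not the claimed method: the feature names, the scoring rule (matched weight divided by total weight), and the 0.6 threshold are all assumptions made for the example.

```python
from typing import Dict, Optional, Set

# Hypothetical category model: feature name -> weight.
FeatureSet = Dict[str, float]


def score(extracted: Set[str], feature_set: FeatureSet) -> float:
    """Return the fraction of the category's total weight matched by the document."""
    total = sum(feature_set.values())
    if total == 0:
        return 0.0
    matched = sum(w for name, w in feature_set.items() if name in extracted)
    return matched / total


def classify(extracted: Set[str],
             categories: Dict[str, FeatureSet],
             threshold: float = 0.6) -> Optional[str]:
    """Return the best-scoring category, or None if the document is unrecognized.

    A None result stands for "submit to the learning phase" in this sketch.
    """
    best_name, best_score = None, 0.0
    for name, feature_set in categories.items():
        s = score(extracted, feature_set)
        if s > best_score:
            best_name, best_score = name, s
    return best_name if best_score >= threshold else None


# Illustrative categories with assumed image and text features.
CATEGORIES = {
    "invoice": {"logo:acme": 1.0, "text:invoice": 2.0, "text:total": 1.0},
    "w2": {"text:wages": 2.0, "box:grid": 1.0},
}
```

A document whose extracted features are {"text:invoice", "text:total"} matches 3.0 of the 4.0 invoice weight and is classified as an invoice; a document with no matching features returns None and would go to the human trainer.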
Abstract:
The present invention includes a method of secure data entry that enables complex data entry work to be performed by unskilled workers and results in data entry with higher productivity, higher quality and higher security than data entry performed by highly skilled workers. The invention identifies data fields on an electronic image of an identified input page, sequences the identified data field images, and individually displays the data field images for manual data entry. The invention also provides for extracting data from a data field image and displaying the extracted data along with the corresponding data field image for approval or correction. Sequenced data field images are optionally reordered or randomized for display and manual entry.
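The sequencing and optional randomizing of data field images might look like the following sketch. The FieldImage record, its crop-box layout, and the function names are hypothetical and chosen only for illustration; the abstract does not specify a data model.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FieldImage:
    """Hypothetical record for one cropped data field on an input page."""
    page_id: str
    field_name: str
    sequence: int
    box: Tuple[int, int, int, int]  # assumed (left, top, right, bottom) crop box


def sequence_fields(page_id: str,
                    boxes: Dict[str, Tuple[int, int, int, int]]) -> List[FieldImage]:
    """Assign sequence numbers to fields in reading order (top, then left)."""
    ordered = sorted(boxes.items(), key=lambda item: (item[1][1], item[1][0]))
    return [FieldImage(page_id, name, i, box)
            for i, (name, box) in enumerate(ordered)]


def display_order(fields: List[FieldImage],
                  randomize: bool = False,
                  seed: Optional[int] = None) -> List[FieldImage]:
    """Return fields in sequence order, or shuffled when randomize is set."""
    ordered = sorted(fields, key=lambda f: f.sequence)
    if randomize:
        # A private Random instance keeps the shuffle reproducible per seed.
        random.Random(seed).shuffle(ordered)
    return ordered
```

Showing each worker one field image at a time, in shuffled order, means no single worker sees the page as a whole, which is one plausible reading of how the security benefit arises.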
Abstract:
A method of extracting data by narrowing a scope of data search using contour matching of select elements in a document is provided. The method includes: analyzing each document to automatically extract image and text features, wherein said analyzing compares the extracted features with a first search space of candidate features to try to recognize the extracted features; automatically processing each unrecognized feature using a contour recognition engine to generate a contour of the unrecognized feature; automatically selecting a second search space of candidate features through contour matching using the contour of the unrecognized feature, wherein the second search space of candidate features is narrower than the first search space of candidate features; and comparing the unrecognized feature with said second search space to identify the previously unrecognized feature.
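The two-stage narrowing can be illustrated with tiny bitmaps. This sketch is an assumption-heavy stand-in: the "contour" here is just the leftmost and rightmost filled column of each row, and the final comparison is a pixel mismatch count; a real contour recognition engine would be far more robust.

```python
from typing import Dict, Optional, Tuple

Bitmap = Tuple[str, ...]  # rows of '#' (filled) and '.' (empty)


def contour(bitmap: Bitmap) -> Tuple[Tuple[int, int], ...]:
    """Crude outline: leftmost and rightmost filled column per row (-1 if empty)."""
    return tuple((row.find("#"), row.rfind("#")) for row in bitmap)


def narrow(unknown: Bitmap, candidates: Dict[str, Bitmap]) -> Dict[str, Bitmap]:
    """Select the second search space: candidates whose contour matches."""
    target = contour(unknown)
    return {name: bmp for name, bmp in candidates.items() if contour(bmp) == target}


def mismatch(a: Bitmap, b: Bitmap) -> int:
    """Count differing pixels between two equally sized bitmaps."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def identify(unknown: Bitmap,
             candidates: Dict[str, Bitmap],
             max_mismatch: int = 1) -> Optional[str]:
    """Compare the unknown feature only against the narrowed search space."""
    second_space = narrow(unknown, candidates)
    if not second_space:
        return None
    best = min(second_space, key=lambda name: mismatch(unknown, second_space[name]))
    return best if mismatch(unknown, second_space[best]) <= max_mismatch else None


# Illustrative first search space of candidate features.
CANDIDATES = {
    "ring": ("###", "#.#", "###"),
    "block": ("###", "###", "###"),
    "bar": (".#.", ".#.", ".#."),
}
```

Here the ring and the block share an outline, so contour matching reduces three candidates to two before the more detailed pixel comparison runs.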
Abstract:
In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically extracting data from each received electronic document at least in part using data external to the electronic document but associated with the job containing the document is provided. The method includes: analyzing each electronic document in a job to automatically extract image and text features; and, if any of the image and text features extracted from the electronic document is not recognized, using data external to said document but associated with said job to identify the unrecognized feature, wherein the source of the external data may be one of at least one other document in the job and a database having known values associated with the job.
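One way to picture the use of job-level data is a fuzzy lookup of an unrecognized value against values already known for the same job. The sketch below uses Python's standard difflib for the matching; the function name, the 0.7 cutoff, and the idea that the known values are plain strings are assumptions for illustration only.

```python
import difflib
from typing import Iterable, Optional


def resolve_with_job_context(raw_value: str,
                             job_known_values: Iterable[str],
                             cutoff: float = 0.7) -> Optional[str]:
    """Map an unrecognized value to the closest value known for the same job.

    job_known_values stands for values taken from other documents in the job
    or from a database of values associated with the job.
    """
    matches = difflib.get_close_matches(
        raw_value, list(job_known_values), n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

For example, a poorly scanned name such as "J0hn Sm1th" resolves to "John Smith" when that name already appears elsewhere in the job, while a value with no close counterpart stays unresolved.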
Abstract:
A method of training a document analysis system that automatically extracts image and text features from each received electronic document and compares the extracted features with feature sets associated with each document category is provided. If an electronic document is recognized as belonging to one of the document categories with predetermined confidence, the method classifies the electronic document as being of that one document category. If an electronic document is not recognized as belonging to one of the document categories with predetermined confidence, however, the method submits the unrecognized document to a training phase in which the document is recognized as belonging to a document category and automatically modifies at least one of the features and the weights of the features of the feature set for the document category for the now-recognized document.
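The modification of features and weights after training could take many forms. The following is a minimal sketch under assumed rules: features already in the category's feature set are reinforced by a fixed step, and features seen for the first time are added with an initial weight. The step size, initial weight, and function name are all hypothetical.

```python
from typing import Dict, Set


def update_feature_set(feature_set: Dict[str, float],
                       extracted: Set[str],
                       step: float = 0.1,
                       initial_weight: float = 0.5) -> Dict[str, float]:
    """Return a new feature set adjusted with the now-recognized document.

    Known features gain weight; previously unseen features are added.
    The input mapping is left unmodified.
    """
    updated = dict(feature_set)
    for name in extracted:
        if name in updated:
            updated[name] += step
        else:
            updated[name] = initial_weight
    return updated
```

Returning a new mapping rather than mutating in place makes it easy to compare the feature set before and after a training step.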
Abstract:
A method of parallel processing jobs received from a plurality of users by a document analysis system, which automatically classifies documents to organize each job, is provided. The method automatically separates each job into its constituent electronic documents and automatically separates each document into subsets of electronic pages. For each page of each subset, the method automatically extracts image features that are indicative of how the document is laid out or textually organized. For each subset, the method automatically compares the extracted features with feature sets associated with each document category to determine a comparison score for the subset. The method then classifies the electronic document as being one of the categories of documents using the comparison score for each of the subsets and organizes the job according to the categories of documents the job contains.
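The per-subset scoring lends itself to a small parallel sketch. Everything here is assumed for illustration: pages are represented as sets of layout feature names, the comparison score is the fraction of a category's features found in the subset, subset scores are combined by summing, and a thread pool stands in for whatever parallel infrastructure a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Set

Page = Set[str]  # assumed: layout features extracted from one page


def split_into_subsets(pages: List[Page], size: int) -> List[List[Page]]:
    """Split a document's pages into consecutive subsets of at most `size` pages."""
    return [pages[i:i + size] for i in range(0, len(pages), size)]


def score_subset(subset: List[Page],
                 categories: Dict[str, Set[str]]) -> Dict[str, float]:
    """Comparison score per category: share of its features seen in the subset."""
    features = set().union(*subset)
    return {name: len(features & fs) / len(fs)
            for name, fs in categories.items() if fs}


def classify_document(pages: List[Page],
                      categories: Dict[str, Set[str]],
                      subset_size: int = 2) -> Optional[str]:
    """Score subsets in parallel, then pick the category with the highest total."""
    subsets = split_into_subsets(pages, subset_size)
    if not subsets or not categories:
        return None
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda s: score_subset(s, categories), subsets))
    totals = {name: sum(s.get(name, 0.0) for s in scores) for name in categories}
    return max(totals, key=totals.get)
```

Because each subset is scored independently, the work for one document, or for many documents across many jobs, can be spread over workers and the scores combined afterwards.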
Abstract:
A document analysis system that automatically classifies documents by recognizing distinctive features in each document comprises a document acquisition system, a document recognition training system, a document classification system, a document recognition system, and a job organization system. The document acquisition system receives jobs, wherein each job contains at least one electronic document. The document recognition system automatically extracts image and text features from each received document. The document classification system automatically classifies recognized electronic documents by finding the best match between the extracted features of each document and the feature sets associated with each category of document. The document recognition training system automatically trains the feature set for each corresponding category of documents, wherein the training system uses extracted features of unrecognized documents to automatically modify the feature set for a document category. The job organization system automatically organizes each job according to the document categories it contains.
Abstract:
In a web service system with one or more web servers, a system and method for distributing content directly from each web server to a single computer transfers files generated on web servers to a central location for access by a system operator. If files generated by multiple web servers are aggregated on a single computer, processing and analysis can be performed on all of the files. Generally, in one aspect, the invention relates to a system and method for transmitting content from one computer to another in a web service system. The web service system includes web servers that provide web pages in response to web page requests. First and second web server agents provide an interface between the web service system and first and second computers, respectively. The first web server agent runs on the first computer and identifies at least a portion of a file for transmission to the second web server agent running on the second computer in the web service system. At least a portion of the file from the first web server agent is transmitted to the second web server agent and then stored by the second web server agent.