Abstract:
A method in a document analysis system automatically extracts image and text features from each received electronic document and compares the extracted features with feature sets associated with each category of document to determine whether the document is recognizable as belonging to a document category. If an electronic document is recognized as belonging to one of the document categories, the method classifies the electronic document as belonging to that document category. If, however, an electronic document is unrecognized, the method submits the unrecognized document to a learning phase, in which the unrecognized document is presented to a human trainer for manual classification of the unrecognized electronic document into a document category, and automatically modifies at least one of the features and the weights of the feature set of the document category corresponding to the manually-classified electronic document using the automatically extracted features of the manually-classified document.
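The recognize-or-learn loop described above can be sketched as follows. This is a minimal illustration, not the claimed method: the scoring rule (share of a category's total weight covered by the document), the `threshold` value, the `step` increment, and all names (`Category`, `score`, `classify`, `learn`) are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A document category with a weighted feature set (hypothetical structure)."""
    name: str
    weights: dict = field(default_factory=dict)  # feature -> weight


def score(features, category):
    """Share of the category's total weight covered by the document's features."""
    total = sum(category.weights.values())
    if total == 0:
        return 0.0
    matched = sum(w for f, w in category.weights.items() if f in features)
    return matched / total


def classify(features, categories, threshold=0.6):
    """Return the best-matching category name, or None if unrecognized."""
    best = max(categories, key=lambda c: score(features, c), default=None)
    if best is not None and score(features, best) >= threshold:
        return best.name
    return None  # caller would route the document to a human trainer


def learn(features, category, step=0.5):
    """After manual classification, reinforce seen features and add new ones."""
    for f in features:
        category.weights[f] = category.weights.get(f, 0.0) + step


invoice = Category("invoice", {"logo:acme": 1.0, "word:invoice": 1.0})
doc = {"word:invoice", "word:total", "table:grid"}

first_pass = classify(doc, [invoice])   # unrecognized under the assumed threshold
learn(doc, invoice)                     # trainer labeled the document as an invoice
second_pass = classify(doc, [invoice])  # recognized after the weight update
```

In this sketch the manual label only changes weights for the one category the trainer chose, which mirrors the abstract's statement that the feature set of the corresponding category is modified.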
Abstract:
A method is provided for enhancing electronic documents received from a plurality of users by a document analysis system, to improve automatic recognition and classification of the received documents. For each page of a received electronic document, the method filters the page to infer binarized-background artifacts that result from binarizing the original grayscale or color source image and that reside in the vicinity of binarized text and binarized image features in the page, so that the binarized text and binarized images can be distinguished from the artifacts and extracted from the document. The method then uses the features extracted from the filtered document to automatically recognize and classify the document into a document category.
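The abstract does not say how the filter infers artifacts. As a stand-in, the sketch below uses a plain connected-component size filter, a common despeckling step, to separate small isolated blobs from larger text-like shapes on a binarized page. The function name, the `min_size` parameter, and the list-of-lists page format are assumptions for illustration only.

```python
from collections import deque


def remove_small_components(page, min_size=3):
    """Clear 4-connected foreground blobs smaller than min_size.

    page is a list of rows holding 0 (background) or 1 (foreground).
    Returns a new page; the input is left unchanged.
    """
    rows = len(page)
    cols = len(page[0]) if rows else 0
    out = [row[:] for row in page]
    seen = [[False] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if page[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and page[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Treat tiny blobs as binarization speckle and erase them.
            if len(component) < min_size:
                for y, x in component:
                    out[y][x] = 0
    return out


page = [
    [1, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
cleaned = remove_small_components(page)
```

Here the T-shaped blob of four pixels survives, while the two single-pixel specks are cleared.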
Abstract:
Methods and apparatuses, including computer program products, are described for establishing secure communications sessions between computing devices located behind network security devices. The method includes receiving, from a first client computing device, a request for a secure connection with a second client computing device, the request including a first transport protocol role and a first security protocol role associated with the first device. The method includes transmitting the request to the second device. The method includes receiving, from the second device, a response to the request including a second transport protocol role and a second security protocol role associated with the second device, transmitting the response to the first device, and establishing the secure connection between the first device and the second device, where the first and second security protocol roles are determined independently from the first and second transport protocol roles.
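The key point above, that security protocol roles are chosen independently of transport protocol roles, can be illustrated with a small sketch. The role names (`active`/`passive`, `client`/`server`), the `Offer` structure, the `establish` function, and the requirement that the two devices pick complementary roles are all assumptions made for this example; the abstract does not specify them.

```python
from dataclasses import dataclass

# Assumed role vocabularies, for illustration only.
TRANSPORT_ROLES = {"active", "passive"}  # who opens the transport connection
SECURITY_ROLES = {"client", "server"}    # who starts the security handshake


@dataclass(frozen=True)
class Offer:
    """Roles one device proposes in its request or response (hypothetical)."""
    device_id: str
    transport_role: str
    security_role: str


def establish(request, response):
    """Pair a relayed request and response into a session description."""
    if {request.transport_role, response.transport_role} != TRANSPORT_ROLES:
        raise ValueError("transport roles must be one active and one passive")
    if {request.security_role, response.security_role} != SECURITY_ROLES:
        raise ValueError("security roles must be one client and one server")
    by_transport = {o.transport_role: o.device_id for o in (request, response)}
    by_security = {o.security_role: o.device_id for o in (request, response)}
    return {
        "connects": by_transport["active"],
        "listens": by_transport["passive"],
        "handshake_client": by_security["client"],
        "handshake_server": by_security["server"],
    }


# Device A listens at the transport layer yet acts as the security client.
session = establish(
    Offer("A", transport_role="passive", security_role="client"),
    Offer("B", transport_role="active", security_role="server"),
)
```

Because the two role pairs are validated separately, the device that opens the transport connection need not be the one that initiates the security handshake.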
Abstract:
The present invention includes a method of secure data entry that enables unskilled workers to perform complex data entry work with higher productivity, higher quality, and higher security than data entry performed by highly skilled workers. The invention identifies data fields on an electronic image of an identified input page, sequences the identified data field images, and displays the data field images individually for manual data entry. The invention also provides for extracting data from a data field image and displaying the extracted data along with the corresponding data field image for approval or correction. Sequenced data field images are optionally reordered or randomized for display and manual entry.
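A rough sketch of the sequencing and randomization step follows. It assumes field locations are already known as bounding boxes per page; the `FieldImage` record, the function names, and the seeded shuffle are illustrative choices, not details from the abstract.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldImage:
    """One cropped data field, tracked by its source page (hypothetical)."""
    page_id: str
    field_name: str
    box: tuple  # (left, top, right, bottom) in pixels


def sequence_fields(pages):
    """Flatten {page_id: {field_name: box}} into an ordered list of fields."""
    return [
        FieldImage(page_id, name, box)
        for page_id, fields in pages.items()
        for name, box in fields.items()
    ]


def display_order(fields, seed=None):
    """Return a shuffled copy so a worker never sees one page's fields together."""
    shuffled = list(fields)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def reassemble(entries):
    """Group (FieldImage, typed value) pairs back into per-page records."""
    records = {}
    for field_image, value in entries:
        records.setdefault(field_image.page_id, {})[field_image.field_name] = value
    return records


pages = {
    "p1": {"name": (0, 0, 100, 20), "ssn": (0, 30, 100, 50)},
    "p2": {"name": (0, 0, 100, 20)},
}
fields = sequence_fields(pages)
shown = display_order(fields, seed=7)
# Stand-in for manual entry: each worker types a value for one field image.
entries = [(f, f"{f.page_id}:{f.field_name}") for f in shown]
records = reassemble(entries)
```

Each worker sees only an isolated field image, while the page and field identifiers carried with it let the system rebuild complete records afterwards.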
Abstract:
A method is provided for training a document analysis system that automatically extracts image and text features from each received electronic document and compares the extracted features with feature sets associated with each document category. If an electronic document is recognized as belonging to one of the document categories with a predetermined confidence, the method classifies the electronic document as being of that document category. If, however, an electronic document is not recognized as belonging to one of the document categories with the predetermined confidence, the method submits the unrecognized document to a training phase in which the document is recognized as belonging to a document category, and automatically modifies at least one of the features and the weights of the features of the feature set for the document category of the now-recognized document.
Abstract:
A method is provided for parallel processing of jobs received from a plurality of users by a document analysis system that automatically classifies documents to organize each job. The method automatically separates each job into its constituent electronic documents and automatically separates each document into subsets of electronic pages. For each page of each subset, the method automatically extracts image features that are indicative of how the document is laid out or textually organized. For each subset, the method automatically compares the extracted features with feature sets associated with each document category to determine a comparison score for the subset. The method then classifies the electronic document as being one of the categories of documents using the comparison score for each of the subsets, and organizes the job according to the categories of documents the job contains.
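The split, score, and combine flow above might look like the sketch below. It assumes each page has already been reduced to a set of layout feature labels; the overlap-based score, the summing of subset scores, the subset size, and the use of a thread pool are assumptions chosen to keep the example small.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subsets(pages, size=2):
    """Cut a document's page list into consecutive subsets of at most size pages."""
    return [pages[i:i + size] for i in range(0, len(pages), size)]


def subset_scores(subset, categories):
    """Score one subset against every category by feature overlap."""
    found = set().union(*subset)
    return {
        name: (len(found & wanted) / len(wanted) if wanted else 0.0)
        for name, wanted in categories.items()
    }


def classify_document(pages, categories, size=2):
    """Score subsets in parallel, sum the scores, and pick the top category."""
    subsets = split_into_subsets(pages, size)
    with ThreadPoolExecutor() as pool:
        per_subset = list(pool.map(lambda s: subset_scores(s, categories), subsets))
    totals = {name: sum(s[name] for s in per_subset) for name in categories}
    return max(totals, key=totals.get)


def organize_job(job, categories):
    """Group a job's documents by the category each one was assigned."""
    organized = {}
    for doc_name, pages in job.items():
        organized.setdefault(classify_document(pages, categories), []).append(doc_name)
    return organized


categories = {
    "invoice": {"table", "total_box"},
    "letter": {"salutation", "signature"},
}
job = {
    "a.pdf": [{"table"}, {"total_box", "signature"}, {"table"}],
    "b.pdf": [{"salutation"}, {"signature"}],
}
organized = organize_job(job, categories)
```

Threads are used here only for brevity; CPU-bound feature extraction on real page images would more likely call for separate processes or machines.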
Abstract:
A document analysis system that automatically classifies documents by recognizing distinctive features in each document comprises a document acquisition system, a document recognition training system, a document classification system, a document recognition system, and a job organization system. The document acquisition system receives jobs, wherein each job contains at least one electronic document. The document recognition system automatically extracts image and text features from each received document. The document classification system automatically classifies recognized electronic documents by finding the best match between the extracted features of each document and the feature sets associated with each category of document. The document recognition training system automatically trains the feature set for each corresponding category of documents, using the extracted features of unrecognized documents to automatically modify the feature set for a document category. The job organization system automatically organizes each job according to the document categories it contains.