Abstract:
A method in a document analysis system automatically extracts image and text features from each received electronic document, in which the image features are indicative of how the document is laid out or textually organized and therefore of a corresponding document category. The method next compares the extracted image and text features with the feature sets associated with each document category, and then classifies each document into the document category whose feature set best matches the extracted features of the document.
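The compare-and-classify step can be illustrated with a minimal sketch. The abstract does not say how "best matches" is measured, so the Jaccard similarity used here is an assumption, and the feature names and the `classify` helper are hypothetical.

```python
from typing import Dict, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two feature sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(extracted: Set[str], category_features: Dict[str, Set[str]]) -> str:
    """Return the category whose feature set best matches the extracted features."""
    return max(category_features,
               key=lambda name: jaccard(extracted, category_features[name]))

# Hypothetical feature sets: layout (image) features mixed with text features.
categories = {
    "invoice": {"has_table", "word:total", "word:invoice", "logo_top_left"},
    "letter": {"single_column", "word:dear", "word:sincerely"},
}
doc_features = {"has_table", "word:total", "word:invoice"}
label = classify(doc_features, categories)
```

Here `label` is `"invoice"`, because the document shares three of the four invoice features and none of the letter features.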
Abstract:
In a web service system with one or more web servers, a system and method for distributing content directly from each web server to a single computer transfers files generated on web servers to a central location for access by a system operator. If files generated by multiple web servers are aggregated on a single computer, processing and analysis can be performed on all of the files. Generally, in one aspect, the invention relates to a system and method for transmitting content from one computer to another in a web service system. The web service system includes web servers that provide web pages in response to web page requests. First and second web server agents provide an interface between the web service system and first and second computers, respectively. The first web server agent runs on the first computer and identifies at least a portion of a file for transmission to the second web server agent running on the second computer in the web service system. At least a portion of the file from the first web server agent is transmitted to the second web server agent and then stored by the second web server agent.
Abstract:
A double hull marine vessel is provided which includes a syntactic foam-macrosphere composition between the inner and outer hulls which dissipates force applied to the outer hull.
Abstract:
A clamp for a cylindrical object is provided which is formed of fibers in the form of woven fabric molded in a polymeric matrix. The clamp has two free ends through which one or two rods extend. The compressive force exerted by the clamp can be adjusted by adjusting the distance between the free ends.
Abstract:
In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically pre-processing each received electronic document using a plurality of image transformation algorithms to improve subsequent data extraction from said document is provided. The method includes: electronically partitioning each received electronic document page into pieces; automatically processing each piece of the received electronic document page using each of a plurality of image pre-processing algorithms to produce a plurality of image variations of each piece; and analyzing the outputs of subsequent processing and data extraction on each of the image variations of the pieces to determine which output, from the plurality of outputs for each piece, is best.
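The partition, transform and select steps can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the page is a small grayscale grid, pieces are horizontal strips, the three pre-processors are hypothetical, and `toy_extract` is a stand-in for real data extraction that scores an output by how many pixels are pure black or white.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255

def partition(page: Image, rows_per_piece: int) -> List[Image]:
    """Split a page into horizontal strips (one simple way to form pieces)."""
    return [page[i:i + rows_per_piece] for i in range(0, len(page), rows_per_piece)]

def identity(img: Image) -> Image:
    return [row[:] for row in img]

def binarize(img: Image, threshold: int = 128) -> Image:
    return [[255 if p >= threshold else 0 for p in row] for row in img]

def invert(img: Image) -> Image:
    return [[255 - p for p in row] for row in img]

def toy_extract(img: Image) -> Tuple[str, float]:
    """Hypothetical stand-in for extraction: returns (output, quality score).
    The score is the fraction of pixels that are pure black or pure white."""
    pixels = [p for row in img for p in row]
    crisp = sum(1 for p in pixels if p in (0, 255))
    dark = sum(1 for p in pixels if p == 0)
    return f"{dark} dark pixels", crisp / len(pixels)

def best_variation_per_piece(
    page: Image,
    preprocessors: Dict[str, Callable[[Image], Image]],
    extract: Callable[[Image], Tuple[str, float]],
    rows_per_piece: int = 2,
) -> List[Tuple[str, str]]:
    """For each piece, run every pre-processor and keep the best-scoring output."""
    results = []
    for piece in partition(page, rows_per_piece):
        outputs = {name: extract(fn(piece)) for name, fn in preprocessors.items()}
        best = max(outputs, key=lambda name: outputs[name][1])  # ties keep the first
        results.append((best, outputs[best][0]))
    return results

PREPROCESSORS = {"identity": identity, "binarize": binarize, "invert": invert}
page = [[10, 200], [120, 130], [0, 255], [255, 0]]
results = best_variation_per_piece(page, PREPROCESSORS, toy_extract)
```

With this toy page, the first piece is best served by `binarize`, while the second piece is already clean, so `identity` wins the tie. A real system would score the extraction output itself, for example by recognition confidence.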
Abstract:
In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of extracting data from a received electronic document page that includes multiple copies of a form is provided. The method comprises: automatically processing a received electronic document page that includes multiple copies of a form to group the multiple copies into a corresponding number of records; automatically extracting data from each of the multiple copies of the form and saving the extracted data into the corresponding record; automatically comparing the extracted data in the records to determine which copy of the extracted data to select; if all extracted data instances are identical, assigning a high confidence score to the extracted data; and, if all extracted data instances are not identical, flagging the extracted data for further processing.
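The comparison step can be sketched as below, assuming each copy of the form has already been extracted into one record (a dictionary of field values). The field names, the `reconcile_copies` helper and the `"high"`/`"low"` labels are hypothetical.

```python
from typing import Dict, List

def reconcile_copies(records: List[Dict[str, str]]) -> Dict[str, dict]:
    """Compare the same field across all copies of a form.

    Identical values get a high confidence label; any disagreement
    flags the field for further processing.
    """
    result = {}
    fields = sorted({name for record in records for name in record})
    for name in fields:
        values = [record.get(name) for record in records]
        if all(v == values[0] for v in values):
            result[name] = {"value": values[0], "confidence": "high", "flagged": False}
        else:
            result[name] = {"value": None, "candidates": values,
                            "confidence": "low", "flagged": True}
    return result

# Two copies of the same form found on one page (hypothetical values).
copies = [
    {"name": "J. Smith", "amount": "120.00"},
    {"name": "J. Smith", "amount": "12O.00"},  # letter O misread in the second copy
]
reconciled = reconcile_copies(copies)
```

In this example `name` agrees across both copies and is accepted with high confidence, while `amount` is flagged and both candidates are kept for later review.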
Abstract:
In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically correcting the extracted data using known constraints amongst semantics of extracted data elements is provided. The method includes: analyzing each electronic document in a job to automatically extract data; automatically analyzing the extracted data to identify incorrectly extracted data elements using rules defining constraints amongst semantics of extracted data elements; and automatically attempting to correct the incorrectly extracted data elements using the rules.
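As an illustration of correction through semantic constraints, the sketch below assumes a single hypothetical rule: `subtotal + tax == total`, with all amounts in integer cents. A field that cannot be parsed as a number is treated as incorrectly extracted and is recomputed from the other two; anything the rule cannot resolve is left for review. The field names and status strings are assumptions.

```python
from typing import Dict, Optional, Tuple

FIELDS = ("subtotal", "tax", "total")

def parse_amount(text: Optional[str]) -> Optional[int]:
    """Return the amount in cents, or None if the text is not a valid integer."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None

def correct_totals(data: Dict[str, str]) -> Tuple[Dict[str, str], str]:
    """Apply the assumed rule subtotal + tax == total to find and fix one bad field."""
    values = {name: parse_amount(data.get(name)) for name in FIELDS}
    bad = [name for name, value in values.items() if value is None]
    corrected = dict(data)
    if len(bad) == 1:
        # Exactly one field failed to parse: derive it from the other two.
        name = bad[0]
        if name == "total":
            fixed = values["subtotal"] + values["tax"]
        elif name == "subtotal":
            fixed = values["total"] - values["tax"]
        else:
            fixed = values["total"] - values["subtotal"]
        corrected[name] = str(fixed)
        return corrected, "corrected"
    if not bad and values["subtotal"] + values["tax"] == values["total"]:
        return corrected, "ok"
    # Several bad fields, or a violated rule with no obvious culprit.
    return corrected, "needs_review"

fixed, status = correct_totals({"subtotal": "1000", "tax": "80", "total": "1O80"})
```

Here the total contains a letter `O`, so it is rebuilt as `"1080"` and the status is `"corrected"`. When all three fields parse but the rule fails, the sketch does not guess which one is wrong.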
Abstract:
A method of automatically extracting data from an electronic document containing a plurality of layout features through progressive refinement is provided. The method includes: analyzing each document to automatically extract image and text features, wherein each document includes at least two features that are related to each other, and wherein said analyzing compares extracted features with a first search space of candidate features to try to recognize the extracted features; if one of the at least two related features is not recognized and at least one feature is recognized, selecting a second search space of candidate features in response thereto and in response to predefined rules about the relationship between the two features; and comparing the unrecognized feature with said selected second search space.
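Progressive refinement can be sketched with two related text features, a state and a city. Everything specific here is an assumption made for illustration: the candidate lists, the rule that a city must belong to the recognized state, the two similarity cutoffs, and the use of `difflib.get_close_matches` from the Python standard library as the matcher.

```python
import difflib
from typing import List, Optional, Tuple

ALL_STATES = ["Texas", "Georgia", "Ohio"]
ALL_CITIES = ["Dallas", "Dalton", "Austin", "Atlanta", "Columbus", "Houston"]
# Predefined relationship rule (assumed): a city must lie in the recognized state.
CITIES_BY_STATE = {
    "Texas": ["Dallas", "Austin", "Houston"],
    "Georgia": ["Dalton", "Atlanta"],
    "Ohio": ["Columbus"],
}

def recognize(raw: str, candidates: List[str], cutoff: float) -> Optional[str]:
    """Return the closest candidate at or above the cutoff, or None."""
    matches = difflib.get_close_matches(raw, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

def extract_state_city(raw_state: str, raw_city: str) -> Tuple[Optional[str], Optional[str]]:
    # First pass: broad search space, strict cutoff to avoid false positives.
    state = recognize(raw_state, ALL_STATES, cutoff=0.9)
    city = recognize(raw_city, ALL_CITIES, cutoff=0.9)
    if city is None and state is not None:
        # Second pass: the recognized state selects a smaller search space,
        # which makes a looser cutoff acceptable.
        city = recognize(raw_city, CITIES_BY_STATE[state], cutoff=0.6)
    return state, city

result = extract_state_city("Texas", "Da11as")
```

The misread city `"Da11as"` fails the strict first pass, but once `"Texas"` is recognized the search narrows to Texas cities and `"Dallas"` is found. If neither feature is recognized, no refinement is attempted.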
Abstract:
In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically extracting data from each received electronic document using a plurality of character recognition engines is provided. The method includes: automatically processing each received electronic document page using each of a plurality of recognition engines to extract data; comparing the quality of the data extracted by each of the recognition engines to assign a confidence score to the extracted data; and selecting the extracted data having the highest confidence score as the correct extracted data.
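One simple way to turn the outputs of several engines into a confidence score is agreement voting, sketched below. The abstract does not specify how quality is compared, so scoring by the fraction of engines that agree is an assumption, and the engine names and outputs are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

def select_by_agreement(outputs: Dict[str, str]) -> Tuple[str, float]:
    """Pick the text most engines agree on.

    The confidence score is the fraction of engines that produced that text.
    """
    counts = Counter(outputs.values())
    text, votes = counts.most_common(1)[0]
    return text, votes / len(outputs)

# Hypothetical outputs of three recognition engines for the same field.
engine_outputs = {
    "engine_a": "INV-1042",
    "engine_b": "INV-1042",
    "engine_c": "1NV-1042",
}
text, confidence = select_by_agreement(engine_outputs)
```

Two of the three engines agree on `"INV-1042"`, so it is selected with a confidence of about 0.67. Engines that report their own per-character confidences could be weighted instead of counted equally.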
Abstract:
A method of automatically extracting data from an electronic document including tables is provided. The method includes: automatically identifying rows of the table using gaps in horizontal projections of the plurality of image sections, wherein at least some of the identified rows in close proximity are collected to form table formations; and automatically identifying columns of the table using at least some of the plurality of image sections that are vertically aligned, wherein the identified columns are grown in each of the table formations using gaps in vertical projections of the plurality of image sections until an obstruction is reached. The method further includes automatically identifying labels in the plurality of corresponding image sections to associate the identified labels with at least one of the identified columns and the identified rows; and automatically extracting data from cells of the table formed by the identified rows and columns.
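The use of projection gaps to find rows and columns can be sketched on a tiny binary image. This is a simplification under stated assumptions: ink pixels are 1, a gap is any projection value of 0, and the steps of grouping rows into table formations, growing columns until an obstruction, and label association are omitted. The function names are hypothetical.

```python
from typing import List, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background

def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans where the projection is non-zero."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def find_rows(img: Binary) -> List[Tuple[int, int]]:
    """Rows are separated by gaps in the horizontal projection (ink per pixel row)."""
    return runs_of_ink([sum(row) for row in img])

def find_columns(img: Binary) -> List[Tuple[int, int]]:
    """Columns are separated by gaps in the vertical projection (ink per pixel column)."""
    return runs_of_ink([sum(col) for col in zip(*img)])

img = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
]
rows = find_rows(img)
columns = find_columns(img)
# Each (row span, column span) pair bounds one cell to extract data from.
cells = [(r, c) for r in rows for c in columns]
```

For this image the empty third pixel row and the empty third pixel column act as gaps, giving two rows, two columns and four cells.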