Abstract:
The invention is a process, system, workflow system for data retrieval processes, software, Web site, service and SaaS (Software as a Service) created to support data retrieval from various document types into custom or preset retrieval data structures. The program supports manual, automatic and semiautomatic data retrieval using its internal features or external add-ons. It links data points in the structure to the corresponding data points in the document, stores documents, structures and the links between them, and outputs results in various formats. Links between a document and a retrieval data structure are established either automatically or manually by the user. After all required links are set, results can be retrieved from the program as an XML (Extensible Markup Language) structure with the required data, or as PDF (Portable Document Format), HTML (HyperText Markup Language), MS Office formats and others containing the retrieval data structure, the original document, or both, with links between corresponding data points.
The system incorporates a Text Mining engine, which provides automatic information retrieval capabilities. The engine implements Text Mining technology that is based on Evolutionary Bayesian Ontology Classification. This technology uses Bayesian Ontology for modeling the problem's domain and applies Evolutionary Search for the most plausible classification decision.
The ability to learn from data is a key feature of Bayesian Ontology, and of our embodiment. The semantic and format dependencies between elements in a natural language text are too complex and too numerous for analytical descriptions. In addition, we intend to save users the trouble of building their own data retrieval models. Instead, we rely on an algorithm that automatically links the user's data selections to the closest categories in pre-built ontologies and generates selection-specific classifiers.
Every individual ontology keeps learning from user corrections during its life cycle. The system is specifically built with the ability to accumulate data models learned from various types of documents. The more documents the system has processed, the better it generalizes to the automatic processing of new, unseen documents.
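The linking and XML output described above can be illustrated with a minimal sketch. It assumes a link is just a field name plus character offsets into the document; the names `Link`, `link_text` and `export_xml`, the XML element names and the sample invoice text are all hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Link:
    """One link between a retrieval-structure field and a document span (hypothetical)."""
    field_name: str  # data point in the retrieval structure
    start: int       # character offset where the linked text begins
    end: int         # character offset just past the linked text


def link_text(document: str, field_name: str, text: str) -> Link:
    """Create a manual link by locating the first occurrence of `text`."""
    start = document.index(text)  # raises ValueError if the text is absent
    return Link(field_name, start, start + len(text))


def export_xml(document: str, links: list[Link]) -> str:
    """Serialize linked values together with their source offsets."""
    root = ET.Element("retrieval")
    for link in links:
        item = ET.SubElement(
            root, "field",
            name=link.field_name, start=str(link.start), end=str(link.end),
        )
        item.text = document[link.start:link.end]
    return ET.tostring(root, encoding="unicode")


document = "Invoice 1042 issued to Acme Corp for 250.00 USD."
links = [
    link_text(document, "invoice_number", "1042"),
    link_text(document, "customer", "Acme Corp"),
]
xml_output = export_xml(document, links)
```

Keeping the offsets in the output is one way to let a reader trace each value back to its position in the original document.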
Abstract:
A method and apparatus for storing and retrieving images of documents, e.g. checks. The method comprises placing a plurality of documents in a document imaging machine and forming an electronic image of each document, storing each electronic image in an electronic storage device, providing at least one user interface device in communication on a communication link with the electronic storage device, placing a request for at least one document image on the user interface device, transmitting the request by the communication link to the electronic storage device, searching the electronic storage device for the requested electronic image of the document, retrieving the at least one electronic image or providing an indication that the image was not found, storing the electronic image, if found, in an electronic file, for transmission to the user interface device at user option, providing the electronic image to the user interface device at command of a user at the user interface device for storage at the user interface device and displaying the requested electronic image on a display of the user interface device. Preferably, the electronic images are stored with embedded identifying information in a TIFF file format and the check images can be displayed on a display device which permits the user to view both sides of the checks simultaneously and perform functions such as zooming and rotation of the images.
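The store, request and retrieve-or-report-not-found flow can be sketched as follows. This is an in-memory stand-in only: the abstract describes an imaging machine, a storage device and a communication link, none of which is modeled here. The names `CheckImage` and `ImageStore` and the byte strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckImage:
    """Both sides of one check, with embedded identifying information (hypothetical)."""
    check_id: str
    front: bytes
    back: bytes


class ImageStore:
    """In-memory stand-in for the electronic storage device."""

    def __init__(self) -> None:
        self._images = {}  # maps check_id -> CheckImage

    def store(self, image: CheckImage) -> None:
        self._images[image.check_id] = image

    def request(self, check_id: str) -> Optional[CheckImage]:
        """Return the stored image, or None as the 'not found' indication."""
        return self._images.get(check_id)


store = ImageStore()
store.store(CheckImage("check-001", front=b"front-pixels", back=b"back-pixels"))

found = store.request("check-001")
missing = store.request("check-999")
```

Returning `None` for a missing image is one simple way to signal "not found"; a real system would more likely return a status message over the link.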
Abstract:
The invention is a process, system, and workflow for extracting and warehousing data from semi-structured documents in any language. This includes, but is not limited to, one or more methods for: the automatic building of text mining term models; the optimization or evolution of such text mining term models; the implementation of document-specific (or company-specific) memory; and the tying or linking of the extracted data, or metadata, once placed in a target electronic document, to the machine-readable underlying source document, thus providing verification and provenance. The process preferably incorporates a wizard-based method for producing pattern recognition text mining term models to extract data from text. The invention also includes a system, method and workflow for handling a subsequent document of similar design and structure, specifically the automatic extraction of target elements and the addition of the same to a database.
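As a rough illustration of a pattern-based term model with provenance, the sketch below treats each term model as a regular expression. That is an assumption made for brevity, not the wizard-built models the abstract describes. The field names, patterns and sample documents are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Extraction:
    """An extracted value plus the source offsets that give it provenance."""
    field_name: str
    value: str
    start: int
    end: int


# Hypothetical term models: one regular expression per target field.
TERM_MODELS = {
    "report_date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*(\d+\.\d{2})"),
}


def extract(text: str, models: dict) -> list:
    """Apply every term model and record where each value was found."""
    results = []
    for field_name, pattern in models.items():
        match = pattern.search(text)
        if match:
            results.append(
                Extraction(field_name, match.group(1), match.start(1), match.end(1))
            )
    return results


first_doc = "Date: 2020-01-15\nTotal: 99.50\n"
later_doc = "Date: 2020-02-15\nTotal: 120.00\n"  # similar design and structure

first_rows = extract(first_doc, TERM_MODELS)
later_rows = extract(later_doc, TERM_MODELS)  # same models reused unchanged
```

Because each extraction keeps its offsets, a value placed in a target document can be checked against the exact span of the source it came from.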
Abstract:
A method and apparatus for storing and retrieving images of documents, e.g. checks. The method comprises placing a plurality of documents in a document imaging machine and forming an electronic image of each document, storing each electronic image in an electronic storage device, providing at least one user interface device in communication on a communication link with the electronic storage device, placing a request for at least one document image on the user interface device, transmitting the request by the communication link to the electronic storage device, searching the electronic storage device for the requested electronic image of the document, retrieving the at least one electronic image or providing an indication that the image was not found, storing the electronic image, if found, in an electronic file, for transmission to the user interface device at user option, providing the electronic image to the user interface device at command of a user at the user interface device for storage at the user interface device and displaying the requested electronic image on a display of the user interface device. Preferably, the electronic images are stored with embedded identifying information in a TIFF® (trademark of Aldus Corp.) file format and the check images can be displayed on a display device which permits the user to view both sides of the checks simultaneously and perform functions such as zooming and rotation of the images.
Abstract:
A method and computer program for automatic mapping of eXtensible Business Reporting Language (XBRL) data to the corresponding locations in an initial business document. The program takes an XBRL filing, together with the text of the initial report, and starts a data mapping engine based on Evolutionary Optimization. The engine searches for the most plausible location in the document for every data item. After the data locations have been identified, the program tags them in the document and creates visualization forms so that a user can easily see and verify the correspondence between the two formats of the same data: as saved in the XBRL filing and as presented in the document.
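The idea of choosing the most plausible location for each data item can be sketched with a much simpler stand-in than the engine described above. The sketch scores every occurrence of a value by its distance from the nearest preceding mention of the item's label and takes the best one per item. It does not use Evolutionary Optimization, it does not parse a real XBRL filing (facts are given as a plain dictionary), and the function names and sample report are hypothetical.

```python
import re


def candidate_positions(text: str, value: str) -> list:
    """All character offsets where the value string occurs in the text."""
    return [m.start() for m in re.finditer(re.escape(value), text)]


def score(text: str, position: int, label: str) -> float:
    """Higher is better: closeness of the nearest label mention before the value."""
    label_at = text.rfind(label, 0, position)
    if label_at == -1:
        return float("-inf")  # label never appears before this candidate
    return -(position - label_at)


def map_facts(text: str, facts: dict) -> dict:
    """Pick, per fact, the candidate location with the best score (greedy)."""
    mapping = {}
    for label, value in facts.items():
        candidates = candidate_positions(text, value)
        if candidates:
            mapping[label] = max(candidates, key=lambda p: score(text, p, label))
    return mapping


report = "Revenue was 500 in 2020. Net income was 120. Total assets were 500."
facts = {"Revenue": "500", "Total assets": "500"}  # simplified stand-in for a filing
mapping = map_facts(report, facts)
```

Each item is placed independently here. A search over whole assignments, as an evolutionary method would perform, could additionally penalize two items claiming the same location.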