Abstract:
Method, apparatus and computer program product for correlating performance events in a data processing system. A first event is received at one of a first device and a second device of the data processing system, and a second event is received at one of the first device and the second device. A type of a connection between the first device and the second device is identified to form an identified type of connection, and a relationship between the first event and the second event is determined based on the identified type of connection between the first device and the second device.
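As a rough illustration of the correlation step, the sketch below looks up the connection type between the two devices and maps it to a relationship label. The `Event` class, the connection-type names, the relationship labels, and the handling of events on the same device are all assumptions made for this example; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device: str   # device at which the event was received
    metric: str   # hypothetical name of the performance metric


# Hypothetical mapping from connection type to relationship label.
RELATIONSHIP_BY_CONNECTION = {
    "fibre_channel": "likely_causal",
    "ethernet": "possibly_related",
}


def correlate(first, second, connections):
    """Return a relationship label for two events.

    `connections` maps a frozenset of two device names to a connection type.
    """
    if first.device == second.device:
        return "same_device"  # assumption: no connection lookup is needed
    link = connections.get(frozenset((first.device, second.device)))
    if link is None:
        return "unrelated"
    return RELATIONSHIP_BY_CONNECTION.get(link, "unknown")
```

For example, with `connections = {frozenset(("host1", "switch1")): "fibre_channel"}`, an event on `host1` and an event on `switch1` would be labeled `likely_causal` under these assumed labels.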
Abstract:
A method of training a document analysis system to extract data from documents is provided. The method includes: automatically analyzing images and text features extracted from a document to associate the document with a corresponding document category; comparing the extracted text features with a set of text features associated with the corresponding category of the document, in which the set of text features includes a set of characters, words, and phrases; if the extracted text features are found to consist of the characters, words, and phrases belonging to the set of text features associated with the corresponding document category, storing the extracted text features as the data contained in the corresponding document; and, if the extracted text features are found to include at least one text feature that does not belong to the set of text features associated with the corresponding document category, submitting the unrecognized text features to a training phase.
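The store-or-train decision described above can be sketched as a membership test against the category's feature set. The function name `route_features` and the returned dictionary shape are hypothetical, chosen only for illustration.

```python
def route_features(extracted, category_vocabulary):
    """Decide whether extracted text features are stored or sent to training.

    `extracted` is a list of text features (characters, words, or phrases);
    `category_vocabulary` is the set of features known for the document's
    category.
    """
    unrecognized = [f for f in extracted if f not in category_vocabulary]
    if unrecognized:
        # At least one feature is unknown: hand those to the training phase.
        return {"action": "train", "features": unrecognized}
    # Every feature is known: keep the extracted features as document data.
    return {"action": "store", "features": list(extracted)}
```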
Abstract:
A method of grouping electronic document pages of a job that belong together is provided. The method includes: automatically analyzing images and text features extracted from each received electronic document page to associate the electronic document page with a corresponding document category; automatically identifying features extracted from the electronic document page that potentially indicate to which document group the electronic document page belongs; comparing the identified features with a set of group identifying features associated with the corresponding document group, in which the set of group identifying features includes at least a set of page numbers and account numbers; and, if the identified features are found to include a set of a page number and an account number belonging to the set of group identifying features associated with the corresponding document group, grouping the electronic document page into the corresponding document group.
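A simplified sketch of the grouping step follows. It assumes each page has already been reduced to a dictionary with optional `account_number` and `page_number` keys, and that pages sharing an account number belong to one group; both are assumptions for the example rather than details taken from the abstract.

```python
from collections import defaultdict


def group_pages(pages):
    """Group pages by account number and order each group by page number.

    Pages missing either identifying feature are returned separately.
    """
    groups = defaultdict(list)
    ungrouped = []
    for page in pages:
        account = page.get("account_number")
        number = page.get("page_number")
        if account is None or number is None:
            ungrouped.append(page)
        else:
            groups[account].append(page)
    for members in groups.values():
        members.sort(key=lambda p: p["page_number"])
    return dict(groups), ungrouped
```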
Abstract:
In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically extracting data from each received electronic document at least in part using data external to the electronic document but associated with the job containing the document is provided. The method includes: analyzing each electronic document in a job to automatically extract images and text features; and, if any of the images and text features extracted from the electronic document is not recognized, using data external to said document but associated with said job to identify the unrecognized feature, wherein the source of the external data may be one of at least one other document in the job and a database having known values associated with the job.
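The fallback to job-level data can be pictured as an ordered lookup: the document itself, then other documents in the job, then a database of known values. The lookup order, the function name `resolve_field`, and the dictionary-based data shapes are assumptions for this sketch.

```python
def resolve_field(name, extracted, sibling_documents, job_database):
    """Return (value, source) for a field, falling back to job-level data.

    `extracted` holds fields recognized in the current document, with None
    for unrecognized ones; `sibling_documents` is a list of such dicts for
    the other documents in the job; `job_database` maps field names to
    known values for the job.
    """
    value = extracted.get(name)
    if value is not None:
        return value, "document"
    for sibling in sibling_documents:
        if sibling.get(name) is not None:
            return sibling[name], "sibling"
    if name in job_database:
        return job_database[name], "database"
    return None, "unresolved"
```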
Abstract:
Network management data is managed by determining that a first version and a second version of a set of network management data have been created. The set of network management data is associated with a plurality of managed entities in a network. First and second network graphs are created based on the first version and second version of the set of network management data, respectively. The first and second network graphs include a first and second set of entities in the plurality of managed entities, respectively. A similarity metric is assigned between one or more entities in the first and second set of entities. At least a first entity in the first set of entities and at least a second entity in the second set of entities are determined to be identical entities based on the similarity metric being one of equal to and above a first given threshold.
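The abstract does not say which similarity metric is used. As one possible stand-in, the sketch below uses Jaccard similarity over each entity's attribute set and treats two entities as identical when the score is equal to or above a threshold. The attribute encoding and the default threshold of 0.8 are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_entities(first, second, threshold=0.8):
    """Pair entities from two graph versions whose similarity meets the threshold.

    `first` and `second` map entity ids to sets of attribute strings.
    """
    matches = []
    for id1, attrs1 in first.items():
        for id2, attrs2 in second.items():
            if jaccard(attrs1, attrs2) >= threshold:
                matches.append((id1, id2))
    return matches
```

This pairwise comparison is quadratic in the number of entities, which is acceptable for a sketch but would likely need indexing or blocking for large networks.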
Abstract:
In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of extracting data from a received electronic document page that includes multiple copies of a form is provided. The method comprises: automatically processing a received electronic document page that includes multiple copies of a form to group the multiple copies into a corresponding number of records; automatically extracting data from each of the multiple copies of the form and saving the extracted data into the corresponding record; automatically comparing the extracted data in the records to determine which copy of the extracted data to select; if all extracted data instances are identical, assigning a high confidence score to the extracted data; and, if the extracted data instances are not all identical, flagging the extracted data for further processing.
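The comparison across copies can be sketched for a single field as follows. The result shape and the `"high"`/`"low"` labels are hypothetical; the abstract only states that identical instances receive a high confidence score and that differing instances are flagged.

```python
def reconcile_copies(values):
    """Compare one field's value across all copies of a form on a page."""
    if not values:
        raise ValueError("at least one extracted value is required")
    if all(v == values[0] for v in values):
        # Every copy agrees: accept the value with high confidence.
        return {"value": values[0], "confidence": "high", "flagged": False}
    # Copies disagree: keep all candidates and flag for further processing.
    return {"value": None, "candidates": list(values),
            "confidence": "low", "flagged": True}
```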
Abstract:
In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically correcting the extracted data using known constraints amongst semantics of extracted data elements is provided. The method includes: analyzing each electronic document in a job to automatically extract data; automatically analyzing the extracted data to identify incorrectly extracted data elements using rules defining constraints amongst semantics of extracted data elements; and automatically attempting to correct the incorrectly extracted data elements using the rules.
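To make the idea of a semantic constraint concrete, the sketch below uses one hypothetical rule, `subtotal + tax == total`, with amounts in integer cents. When the rule is violated, it recomputes the field with the lowest extraction confidence. Both the rule and the use of per-field confidence scores are assumptions for illustration; the abstract does not describe how a correction is chosen.

```python
def correct_totals(fields, confidence):
    """Check subtotal + tax == total and repair the least trusted field.

    `fields` and `confidence` are dicts keyed by "subtotal", "tax", "total".
    Returns (fields, changed).
    """
    subtotal, tax, total = fields["subtotal"], fields["tax"], fields["total"]
    if subtotal + tax == total:
        return dict(fields), False  # constraint holds, nothing to correct
    weakest = min(("subtotal", "tax", "total"), key=lambda n: confidence[n])
    corrected = dict(fields)
    if weakest == "total":
        corrected["total"] = subtotal + tax
    elif weakest == "tax":
        corrected["tax"] = total - subtotal
    else:
        corrected["subtotal"] = total - tax
    return corrected, True
```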
Abstract:
A method of automatically extracting data from an electronic document containing a plurality of layout features through progressive refinement is provided. The method includes: analyzing each document to automatically extract images and text features, wherein each document includes at least two features that are related to each other, and wherein said analyzing compares extracted features with a first search space of candidate features to attempt to recognize the extracted features; if one of the at least two related features is not recognized and at least one feature is recognized, selecting a second search space of candidate features in response thereto and in response to predefined rules about the relationship between the two features; and comparing the unrecognized feature with said selected second search space.
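One way to picture progressive refinement is with a related pair such as a state and a city. In this sketch the first pass requires an exact match against a broad candidate list; if only the state is recognized, a rule selects the cities of that state as a second, narrower search space, where a fuzzy match is tried. The state/city example, the `CITIES_BY_STATE` table, and the use of `difflib` fuzzy matching are all assumptions for illustration.

```python
import difflib

# Hypothetical rule table relating a recognized state to candidate cities.
CITIES_BY_STATE = {
    "IL": ["Chicago", "Springfield", "Peoria"],
    "TX": ["Austin", "Dallas", "Houston"],
}
ALL_CITIES = [city for cities in CITIES_BY_STATE.values() for city in cities]


def recognize_city(raw_city, raw_state):
    """Recognize a city, narrowing the search space with the state if needed."""
    # First search space: exact match against every known city.
    if raw_city in ALL_CITIES:
        return raw_city
    # The related feature (state) was recognized: select a second search space.
    if raw_state in CITIES_BY_STATE:
        narrowed = CITIES_BY_STATE[raw_state]
        close = difflib.get_close_matches(raw_city, narrowed, n=1, cutoff=0.6)
        if close:
            return close[0]
    return None
```

The narrower space is what makes the looser fuzzy match acceptable: a noisy string is much less likely to be confused with the wrong candidate when only a few candidates remain.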
Abstract:
A method of controlling a scanner to improve automatic recognition and classification of scanned physical documents for a document analysis system, which receives and processes jobs containing at least one electronic document from a plurality of users to automatically recognize and classify the job documents into document categories, is disclosed. The method comprises, using a scan control system, obtaining the capability of, and existing scanner settings for, the scanner upon receiving a command to initiate scanning of physical documents; saving the existing scanner settings of the scanner; automatically commanding the scanner to use new scanner settings, wherein the new scanner settings are selected in accordance with the capability of the recognition system; commanding the scanner to begin the scanning operation with the new scanner settings; and automatically resetting the scanner settings of the scanner back to the saved existing scanner settings upon completion of the scanning operation.
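The save, override, and restore sequence maps naturally onto a context manager. The `FakeScanner` class below is a hypothetical stand-in for a real scanner driver, and the setting names are invented for the example.

```python
from contextlib import contextmanager


class FakeScanner:
    """Hypothetical in-memory stand-in for a scanner driver."""

    def __init__(self, settings):
        self._settings = dict(settings)

    def get_settings(self):
        return dict(self._settings)

    def apply_settings(self, settings):
        self._settings = dict(settings)


@contextmanager
def temporary_settings(scanner, new_settings):
    """Apply new settings for one scan, then restore the saved ones."""
    saved = scanner.get_settings()                      # save existing settings
    scanner.apply_settings({**saved, **new_settings})   # switch to new settings
    try:
        yield scanner                                   # caller scans here
    finally:
        scanner.apply_settings(saved)                   # always reset afterwards
```

Using `try`/`finally` means the original settings are restored even if the scan raises an error, which goes slightly beyond the abstract's "upon completion" wording.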