Abstract:
A method according to one embodiment includes performing optical character recognition (OCR) on an image of a first document; generating a list of hypotheses mapping the first document to a complementary document using: textual information from the first document, textual information from the complementary document, and predefined business rules; at least one of: correcting OCR errors in the first document, and normalizing data from the complementary document, using at least one of the textual information from the complementary document and the predefined business rules; determining a validity of the first document based on the hypotheses; and outputting an indication of the determined validity. Additional systems, methods and computer program products are also presented.
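As an illustration of the hypothesis step described above, the following is a minimal sketch, not the patented method. It assumes the first document is an OCR'd invoice and the complementary documents are purchase orders. The field names (`vendor`, `total`), the 1% tolerance rule, and the 0.8 score threshold are hypothetical choices made for this example only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_hypotheses(ocr_fields, candidates, tolerance=0.01):
    """Score each candidate complementary document against the OCR'd fields.

    Hypothetical business rule: the totals must agree within a relative
    `tolerance`. Both the rule and the field names are assumptions.
    """
    hypotheses = []
    for cand in candidates:
        text_score = similarity(ocr_fields["vendor"], cand["vendor"])
        try:
            total = float(ocr_fields["total"])
        except ValueError:
            total = None  # OCR produced something that is not a number
        rule_ok = (
            total is not None
            and abs(total - cand["total"]) <= tolerance * cand["total"]
        )
        hypotheses.append(
            {"id": cand["id"], "score": text_score, "rule_ok": rule_ok}
        )
    # Best textual match first.
    return sorted(hypotheses, key=lambda h: h["score"], reverse=True)


def is_valid(hypotheses, min_score=0.8):
    """The document is treated as valid if any hypothesis passes both checks."""
    return any(h["rule_ok"] and h["score"] >= min_score for h in hypotheses)
```

In this sketch the validity decision is simply "at least one hypothesis satisfies the rule and scores above the threshold"; a real system would likely weigh several fields and rules.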
Abstract:
According to one embodiment, a computer-implemented method for confirming/rejecting a most relevant example includes: generating a binary decision model by training a binary classifier using a plurality of training documents; classifying one or more test documents into one of a plurality of categories using the binary decision model, wherein the one or more test documents lack a user-defined category label; selecting a most relevant example of the classified test documents from among the classified test documents; displaying, using a display of the computer, the most relevant example of the classified test documents to a user; receiving, via the computer and from the user, a confirmation or a negation of a classification label of the most relevant example of the classified test documents; and storing the confirmation or the negation of the classification label of the most relevant example of the classified test documents to a memory of the computer.
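The selection and confirmation steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the binary decision model is taken as already trained and is represented only by its output (a label and a confidence per document), and "most relevant" is assumed here to mean "highest confidence", which the abstract does not specify. The names `Classified`, `most_relevant`, and `review` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Classified:
    doc_id: str
    label: str         # category assigned by the (assumed) binary decision model
    confidence: float  # assumed to lie in [0, 1]


def most_relevant(docs: List[Classified]) -> Classified:
    """Pick the classified document with the highest confidence."""
    return max(docs, key=lambda d: d.confidence)


def review(
    docs: List[Classified],
    ask_user: Callable[[Classified], bool],
    store: Dict[str, Tuple[str, bool]],
) -> Classified:
    """Show the most relevant example to the user and store the answer.

    `ask_user` stands in for the display and input steps; it returns True
    for a confirmation and False for a negation of the label.
    """
    example = most_relevant(docs)
    confirmed = ask_user(example)
    store[example.doc_id] = (example.label, confirmed)
    return example
```

Passing the user interaction in as a callable keeps the sketch testable without a display; the dictionary stands in for the memory in which the response is stored.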
Abstract:
According to one embodiment, a computer-implemented method for cleaning up a data set having a possibly incorrect label includes: selecting a plurality of training documents; estimating a quality of an organization of a plurality of categories; and determining whether the quality of the organization is greater than a predetermined quality threshold. Corresponding system and computer program product embodiments are also presented. Other aspects and advantages of the present invention will become apparent from the following detailed description, which, when taken in conjunction with the drawings, illustrates by way of example the principles of the invention.
Abstract:
A method includes storing raw or normalized video data in a computer accessible storage medium; analyzing portions of the video data with a first analytic engine to determine whether the raw video data is within a first set of parameters and to generate a first set of processor settings; processing the raw or normalized video data with the first set of processor settings; analyzing portions of the processed data with a second analytic engine to determine whether the processed data is within a second set of parameters; generating with the second analytic engine a second set of processor settings to reprocess the raw or normalized video data; sending the second set of processor settings to the first analytic engine; and reprocessing the raw or normalized video data with the first analytic engine using the second set of processor settings.
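The two-engine feedback loop described above can be sketched roughly as follows. This is a simplified illustration, not the claimed implementation: the video data is reduced to a list of brightness values in the 0-255 range, the only processor setting is a hypothetical `gain`, and the parameter windows (64 to 192 for the raw average, at most 10% clipped values after processing) are assumptions chosen for the example. Reprocessing is done by a shared `process` function rather than by handing the settings back to the first engine.

```python
from statistics import mean


def process(raw, settings):
    """Apply the gain setting and clip each value to the 0-255 range."""
    return [min(255.0, max(0.0, v * settings["gain"])) for v in raw]


def first_engine(raw, target=128.0):
    """Check the raw data against a brightness window and propose settings."""
    avg = mean(raw)
    within = 64.0 <= avg <= 192.0
    return within, {"gain": target / avg if avg else 1.0}


def second_engine(processed, raw, max_clipped=0.1):
    """Check the processed data for clipping; propose new settings if needed."""
    clipped = sum(v >= 255.0 for v in processed) / len(processed)
    if clipped <= max_clipped:
        return True, None
    # Largest gain that keeps the brightest raw value at or below 255.
    return False, {"gain": 255.0 / max(raw)}


def pipeline(raw):
    """Run both engines once, reprocessing the stored raw data if required."""
    _, settings = first_engine(raw)
    processed = process(raw, settings)
    ok, new_settings = second_engine(processed, raw)
    if not ok:
        # Reprocess from the stored raw data, not from the processed output.
        processed = process(raw, new_settings)
        settings = new_settings
    return processed, settings
```

The point of the sketch is that the second pass starts again from the stored raw data, so information lost in the first pass (here, clipped highlights) is not carried forward.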
Abstract:
A method according to one embodiment includes performing optical character recognition (OCR) on an image of a first document; and at least one of: correcting OCR errors in the first document using at least one of textual information from a complementary document and predefined business rules; normalizing data from the complementary document using at least one of textual information from the first document and the predefined business rules; and normalizing data from the first document using at least one of textual information from the complementary document and the predefined business rules. Additional systems, methods and computer program products are also presented.
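A minimal sketch of the correction and normalization steps above, under stated assumptions: OCR errors are corrected by replacing each unknown token with its closest match among the tokens of the complementary document, and normalization is shown for monetary amounts only. The 0.7 similarity cutoff and the amount format (comma as thousands separator, period as decimal point) are hypothetical choices, and the function names are illustrative.

```python
import re
from difflib import get_close_matches


def correct_ocr(ocr_text: str, reference_text: str, cutoff: float = 0.7) -> str:
    """Replace OCR tokens with the closest token of the complementary document.

    Tokens already present in the reference, and tokens with no sufficiently
    close match, are left unchanged.
    """
    vocab = sorted(set(reference_text.split()))
    corrected = []
    for token in ocr_text.split():
        if token in vocab:
            corrected.append(token)
            continue
        match = get_close_matches(token, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)


def normalize_amount(text: str) -> str:
    """Normalize an amount such as '$1,234.5' to a plain two-decimal string.

    Assumes a period is the decimal point; currency symbols, thousands
    separators, and spaces are dropped.
    """
    digits = re.sub(r"[^\d.]", "", text)
    return f"{float(digits):.2f}"
```

Matching is case-sensitive and token-based in this sketch, so it only helps with substitution errors inside a word; merged or split words would need a different approach.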
Abstract:
A method is provided for organizing data sets. In use, an automatic decision system is created or updated for determining whether data elements fit a predefined organization, where the decision system is based on a set of preorganized data elements. A plurality of data elements is organized using the decision system. At least one organized data element is selected for output to a user based on a score or confidence from the decision system for the at least one organized data element. Additionally, at least a portion of the at least one organized data element is output to the user. A response is received from the user comprising at least one of a confirmation, a modification, and a negation of the organization of the at least one organized data element. The automatic decision system is recreated or updated based on the user response. Other embodiments are also presented.
Abstract:
An efficient method and system to enhance digital acquisition devices for analog data are presented. The enhancements offered by the method and system are available to the user in local as well as in remote deployments, yielding efficiency gains for a large variety of business processes. The quality enhancements of the acquired digital data are achieved efficiently by employing virtual reacquisition. Virtual reacquisition renders physical reacquisition of the analog data unnecessary when the digital data obtained by the acquisition device are of insufficient quality. The method and system allow multiple users to access the same acquisition device for analog data. In some embodiments, one or more users can virtually reacquire data provided by multiple analog or digital sources. The acquired raw data can be processed by each user according to that user's personal preferences and/or requirements. The preferred processing settings and attributes are determined interactively, in real time or in non-real time, automatically, or by a combination thereof.
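The idea of virtual reacquisition can be illustrated with a short sketch. It assumes that the raw output of a single physical acquisition is stored and that every later "reacquisition" is a reprocessing of that stored data with user-specific settings. The class name `VirtualScanner` and the two settings (`brightness`, `threshold`) are hypothetical and stand in for whatever processing a real device offers.

```python
from typing import Dict, List, Optional


class VirtualScanner:
    """Keeps raw acquisitions so users can reprocess them without rescanning."""

    def __init__(self) -> None:
        self._raw: Dict[str, List[int]] = {}
        self.physical_scans = 0

    def acquire(self, doc_id: str, pixels: List[int]) -> None:
        """Store the raw data from one physical acquisition."""
        self._raw[doc_id] = list(pixels)
        self.physical_scans += 1

    def reacquire(
        self, doc_id: str, brightness: int = 0, threshold: Optional[int] = None
    ) -> List[int]:
        """Reprocess the stored raw data with one user's settings."""
        raw = self._raw[doc_id]
        out = [min(255, max(0, p + brightness)) for p in raw]
        if threshold is not None:
            # Optional binarization step applied after the brightness shift.
            out = [255 if p >= threshold else 0 for p in out]
        return out
```

Two users can call `reacquire` on the same document with different settings; the stored raw data is never modified, and `physical_scans` stays at one.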
Abstract:
An efficient method and system to enhance digital acquisition devices for analog data are presented. The enhancements are available to the user in local as well as in remote deployments, yielding efficiency gains for a large variety of business processes. The quality enhancements of the acquired digital data are achieved efficiently by employing virtual reacquisition, which renders physical reacquisition of the analog data unnecessary when the digital data obtained by the acquisition device are of insufficient quality. The method and system allow multiple users to access the same acquisition device for analog data. One or more users can virtually reacquire data provided by multiple analog or digital sources. The acquired raw data can be processed by each user according to that user's personal preferences and/or requirements.