摘要:
A method and system for delineating document and/or subdocument boundaries and identifying document and/or subdocument types, the method comprising: automatically generating at least one identifier for identifying which of a plurality of document and/or subdocument images belongs to which of a plurality of categories. The method and/or system optionally may include automatically categorizing a plurality of document and/or subdocument images into a plurality of predetermined categories in accordance with classification rules for said categories.
摘要:
A method and system for delineating document boundaries and identifying document types by analyzing digital images of one or more documents, automatically categorizing one or more pages or subdocuments within the one or more documents and automatically generating delineation identifiers, such as computer-generated images of separation pages inserted between digital images belonging to different categories, a description of the categorization sequence of the digital images, or a computer-generated electronic label affixed or associated with said digital images.
摘要:
Techniques and systems to enhance digital acquisition devices are presented. The enhancements are available to the user in local as well as in remote deployments yielding efficiency gains for a large variety of business processes. The quality enhancements are achieved efficiently by employing virtual reacquisition. The method of virtual reacquisition renders unnecessary the physical reacquisition of analog data in case the digital data obtained by the acquisition device are of insufficient quality. The method and system allows multiple users to access the same acquisition device for analog data. In some One or more users can virtually reacquire data provided by multiple analog or digital sources. The acquired raw data can be processed by each user according to his personal preferences and/or requirements. The preferred processing settings and attributes are determined interactively in real time as well as non real time, automatically and a combination thereof.
摘要:
A method according to one embodiment includes extracting an identifier from an electronic first document, and identifying a complementary document associated with the first document using the identifier. A validity of the first document is determined by simultaneously considering: textual information from the first document; textual information from the complementary document; and predefined business rules. An indication of the determined validity is output. Systems and computer program products for providing, performing, and/or enabling the methodology presented above are also presented.
摘要:
An efficient method and system to enhance digital acquisition devices for analog data is presented. The enhancements offered by the method and system are available to the user in local as well as in remote deployments yielding efficiency gains for a large variety of business processes. The quality enhancements of the acquired digital data are achieved efficiently by employing virtual reacquisition. The method of virtual reacquisition renders unnecessary the physical reacquisition of the analog data in case the digital data obtained by the acquisition device are of insufficient quality. The method and system allows multiple users to access the same acquisition device for analog data. In some embodiments, one or more users can virtually reacquire data provided by multiple analog or digital sources. The acquired raw data can be processed by each user according to his personal preferences and/or requirements. The preferred processing settings and attributes are determined interactively in real time as well as non real time, automatically and a combination thereof.
摘要:
An efficient method and system to enhance digital acquisition devices for analog data is presented. The enhancements offered by the method and system are available to the user in local as well as in remote deployments yielding efficiency gains for a large variety of business processes. The quality enhancements of the acquired digital data are achieved efficiently by employing virtual reacquisition. The method of virtual reacquisition renders unnecessary the physical reacquisition of the analog data in case the digital data obtained by the acquisition device are of insufficient quality. The method and system allows multiple users to access the same acquisition device for analog data. In some embodiments, one or more users can virtually reacquire data provided by multiple analog or digital sources. The acquired raw data can be processed by each user according to his personal preferences and/or requirements. The preferred processing settings and attributes are determined interactively in real time as well as non real time, automatically and a combination thereof.
摘要:
An improved method of classifying examples into multiple categories using a binary support vector machine (SVM) algorithm. In one preferred embodiment, the method includes the following steps: storing a plurality of user-defined categories in a memory of a computer, analyzing a plurality of training examples for each category so as to identify one or more features associated with each category; calculating at least one feature vector for each of the examples; transforming each of the at least one feature vectors so as reflect information about all of the training examples; and building a SVM classifier for each one of the plurality of categories, wherein the process of building a SVM classifier further includes: assigning each of the examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if anyone of the examples belongs to another category as well as the first category, such examples are assigned to the first class only, optimizing at least one tunable parameter of a SVM classifier for the first category, wherein the SVM classifier is trained using the first and second classes; and optimizing a function that converts the output of the binary SVM classifier into a probability of category membership.
摘要:
A system, method, data processing apparatus, and article of manufacture are provided for classifying data. Labeled data points are received, each of the labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category; receiving unlabeled data points; receiving at least one predetermined cost factor of the labeled data points and unlabeled data points; training a transductive classifier using MED through iterative calculation using the at least one cost factor and the labeled data points and the unlabeled data points as training examples; applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points; and outputting a classification of the classified data points, or derivative thereof.
摘要:
A system, method, data processing apparatus, and article of manufacture are provided for classifying data. Labeled data points are received, each of the labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category; receiving unlabeled data points; receiving at least one predetermined cost factor of the labeled data points and unlabeled data points; training a transductive classifier using MED through iterative calculation using the at least one cost factor and the labeled data points and the unlabeled data points as training examples; applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points; and outputting a classification of the classified data points, or derivative thereof.
摘要:
An improved method of classifying examples into multiple categories using a binary support vector machine (SVM) algorithm. In one preferred embodiment, the method includes the following steps: storing a plurality of user-defined categories in a memory of a computer; analyzing a plurality of training examples for each category so as to identify one or more features associated with each category; calculating at least one feature vector for each of the examples; transforming each of the at least one feature vectors so as reflect information about all of the training examples; and building a SVM classifier for each one of the plurality of categories, wherein the process of building a SVM classifier further includes: assigning each of the examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of the examples belongs to another category as well as the first category, such examples are assigned to the first class only; optimizing at least one tunable parameter of a SVM classifier for the first category, wherein the SVM classifier is trained using the first and second classes; and optimizing a function that converts the output of the binary SVM classifier into a probability of category membership.