Abstract:
A multi-page document is represented as a graph in which extracted page objects of the document, such as text blocks, are represented by nodes that are connected by intra-page edges and/or cross-page edges. The nodes and edges of the graph are associated with respective sets of features, the edge features distinguishing between intra-page and cross-page edges. A trained first model jointly predicts class labels for page objects, based on node and edge features. Page labels for the pages may be predicted, based on the page object predictions, optionally enforcing a constraint, such as a maximum of one class label for a given class, per page. The pages can be assigned a respective category, based on the predicted classes of the page objects and respective features. Information based on the predictions is output, such as one or more of the page object class labels, the page labels, and information based thereon.
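The graph representation and the optional per-page constraint described above can be sketched as follows. This is a minimal illustration only: the `PageObject` type, the feature names (`n_chars`, `is_cross_page`), the `"other"` fallback label, and the hand-written scores are all assumptions made for the example, and the trained model itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class PageObject:
    obj_id: str
    page: int
    text: str


def build_document_graph(objects, links):
    """Build node and edge feature dicts; each edge records whether it crosses pages."""
    nodes = {o.obj_id: {"page": o.page, "n_chars": len(o.text)} for o in objects}
    edges = []
    for src, dst in links:
        cross = nodes[src]["page"] != nodes[dst]["page"]
        edges.append({"src": src, "dst": dst, "is_cross_page": cross})
    return nodes, edges


def enforce_one_per_page(predictions, fallback="other"):
    """Keep, for each (page, label), only the top-scoring object.

    predictions: list of (obj_id, page, label, score) tuples.
    Objects that lose the comparison receive the hypothetical fallback label.
    """
    best = {}
    for obj_id, page, label, score in predictions:
        key = (page, label)
        if key not in best or score > best[key][1]:
            best[key] = (obj_id, score)
    return {
        obj_id: (label if best[(page, label)][0] == obj_id else fallback)
        for obj_id, page, label, score in predictions
    }


objects = [
    PageObject("a", 0, "Invoice 123"),
    PageObject("b", 0, "Total: 40.00"),
    PageObject("c", 1, "Invoice 123 (continued)"),
]
nodes, edges = build_document_graph(objects, [("a", "b"), ("a", "c")])

# Scores here are made up; in the abstract they would come from the trained model.
labels = enforce_one_per_page(
    [("a", 0, "total", 0.9), ("b", 0, "total", 0.6), ("c", 1, "total", 0.7)]
)
```

Here the edge from `a` to `b` is intra-page and the edge from `a` to `c` is cross-page, and only one object per page keeps the `total` label.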
Abstract:
A system and method predict an optimal machine translation system for a first of a set of users. The method includes, for each of the users, providing a respective user profile which includes rankings for at least some machine translation systems from a set of machine translation systems. The user profile of the first user is updated, based on the user profiles of at least a subset of the other users. The updating includes generating at least one missing ranking. An optimal translation system for the first user from the set of machine translation systems is predicted, based on the updated user profile computed for the first user.
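One simple way to generate a missing ranking from other users' profiles is sketched below. It assumes an unweighted mean over the users who ranked the system, which the abstract does not specify; the profile layout (a dict from system name to rank, with `None` for a missing ranking, lower rank being better) and the system names are likewise hypothetical.

```python
def complete_profile(target, others):
    """Fill each missing ranking in `target` with the mean rank given by other users."""
    updated = dict(target)
    for system, rank in target.items():
        if rank is None:
            known = [p[system] for p in others if p.get(system) is not None]
            if known:
                updated[system] = sum(known) / len(known)
    return updated


def predict_best_system(profile):
    """Return the system with the lowest (best) rank among those that have one."""
    ranked = {s: r for s, r in profile.items() if r is not None}
    return min(ranked, key=ranked.get)


first_user = {"mt_a": 2, "mt_b": None, "mt_c": 3}
other_users = [
    {"mt_a": 2, "mt_b": 1, "mt_c": 3},
    {"mt_a": 3, "mt_b": 1, "mt_c": 2},
]

updated = complete_profile(first_user, other_users)
best = predict_best_system(updated)
```

In this toy data the first user never ranked `mt_b`, but both other users ranked it first, so the completed profile predicts `mt_b`.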
Abstract:
A method and system for document processing allow a service provider to process a document without having access to the textual content of the document. The system includes memory which receives an encoded source document from an associated client system. The encoded source document includes structural information and encoded content information. The encoded content information includes a plurality of encoded tokens generated by individually encoding each of a plurality of text tokens of the source document. The structural information includes location information for each of the plurality of text tokens. A processing module processes the encoded document to generate a modified document, without decoding the encoded tokens. A transmission module transmits the modified document to an associated client system whereby the client system is able to generate a transformed document based on the modified document and the plurality of text tokens.
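The division of work between client and service provider might look like the sketch below. It assumes a keyed hash (HMAC-SHA256, truncated to 16 hex characters) as the per-token encoding and a reading-order sort as the provider's processing step; neither choice is stated in the abstract, and all function names are hypothetical.

```python
import hashlib
import hmac


def encode_document(tokens, key):
    """Client side: encode each token individually, keeping its location in the clear.

    tokens: list of (text, (line, column)) pairs.
    Returns the encoded document and a private codebook for later decoding.
    """
    codebook = {}
    encoded = []
    for text, loc in tokens:
        digest = hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()
        code = digest[:16]  # truncation is an assumption; it raises collision risk
        codebook[code] = text
        encoded.append({"code": code, "loc": loc})
    return encoded, codebook


def provider_sort_reading_order(encoded):
    """Provider side: reorder tokens using only location data, never the text."""
    return sorted(encoded, key=lambda t: t["loc"])


def decode_document(modified, codebook):
    """Client side: rebuild the transformed document from the private codebook."""
    return [codebook[t["code"]] for t in modified]


tokens = [("world", (0, 1)), ("hello", (0, 0)), ("again", (1, 0))]
encoded, codebook = encode_document(tokens, key=b"client-secret")
modified = provider_sort_reading_order(encoded)
decoded = decode_document(modified, codebook)
```

The provider only ever sees opaque codes and locations; the codebook stays with the client.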
Abstract:
A system and method generate an ontology of linked resources. The method includes providing a policy comprising at least one logical rule which is to hold across an ontology of linked resources and initializing a set of resources with an initial subset of the set of resources, each resource in the initial subset being identified by a respective link. Each of the resources in the subset is processed, which includes populating the ontology with a corresponding member of a resource class and, for a resource that is valid against a schema, asserting the member's class as a class specific to the schema of the validated resource in the ontology and providing a dependency specification for extracting links within the resource, each extracted link identifying one of the set of resources. A link property is asserted in the ontology for a link between the resource of the subset containing an extracted link and the resource identified by the extracted link, and the ontology is populated with a member of the resource class for each newly identified resource. A verification that the at least one logical rule holds across the set of resources in the ontology is performed.
Abstract:
A method of detection of numbered captions in a document includes receiving a document including a sequence of document pages and identifying illustrations on pages of the document. For each identified illustration, associated text is identified. An imitation page is generated for each of the identified illustrations, each imitation page comprising a single illustration and its associated text. For a sequence of the imitation pages, a sequence of terms is identified. Each term is derived from a text fragment of the associated text of a respective imitation page. The terms of a sequence comply with at least one predefined numbering scheme, which defines a form and an incremental state of the terms in a sequence. The terms of the identified sequence of terms are construed as being at least a part of a numbered caption for a respective illustration in the document.
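The search for a sequence of terms that follows a numbering scheme can be illustrated with a small sketch. It assumes a single hypothetical scheme in which the form is a fixed prefix (`Fig.`, `Figure`, or `Table`) and the incremental state is an integer that increases by one; the regular expression and function names are illustrative and not taken from the source.

```python
import re

# Hypothetical numbering scheme: a fixed prefix followed by an integer.
TERM = re.compile(r"\b(Fig\.|Figure|Table)\s*(\d+)")


def extract_term(text):
    """Return (form, number) for the first matching fragment, or None."""
    m = TERM.search(text)
    return (m.group(1), int(m.group(2))) if m else None


def find_numbered_captions(imitation_texts):
    """Return indices of imitation pages whose terms form the longest chain.

    A chain groups terms that share a form and whose numbers increase by one.
    Imitation pages without a matching term are skipped.
    """
    chains = {}  # form -> list of (page_index, number)
    for i, text in enumerate(imitation_texts):
        term = extract_term(text)
        if term is None:
            continue
        form, number = term
        chain = chains.setdefault(form, [])
        if not chain or number == chain[-1][1] + 1:
            chain.append((i, number))
    longest = max(chains.values(), key=len, default=[])
    return [i for i, _ in longest]


texts = [
    "Fig. 1 Overview of the system",
    "Photo courtesy of the author",
    "Fig. 2 Detail view",
    "Fig. 3 Results",
]
caption_pages = find_numbered_captions(texts)
```

The second imitation page carries no term that fits the scheme, so it is left out of the detected caption sequence.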
Abstract:
A computer-implemented system and method are disclosed for updating an electronic calendar. The method includes receiving an electronic message in a natural language in which a change in role is expressed and, with a natural language processor implemented by a computer processor, automatically detecting the change in role within the electronic message, optionally storing the change in role in a contacts database, and proposing updates for entries in an electronic calendar based on the detected change in role.
Abstract:
A system and method predict the translation quality of a translated input document. The method includes receiving an input document pair composed of a plurality of sentence pairs, each sentence pair including a source sentence in a source language and a machine translation of the source language sentence to a target language sentence. For each of the sentence pairs, a representation of the sentence pair is generated, based on a set of features extracted for the sentence pair. Using a generative model, a representation of the input document pair is generated, based on the sentence pair representations. A translation quality of the translated input document is computed, based on the representation of the input document pair.
Abstract:
A system and method for preserving privacy of evidence are provided. In the method, an encrypted first image is generated by encrypting a first image acquired at a first location with a symmetric cryptographic key that is based on first information such as a license plate number extracted from the first image and first metadata associated with the first image, such as a time at which the first image was acquired. When a link is established between a second image and the first image, for example, through visual signature matching, the symmetric cryptographic key can be reconstructed, without having access to the first image, but based instead on the first metadata and information extracted from the second image. The reconstructed symmetric cryptographic key can then be used for decryption of the encrypted image to establish evidence that the license plate number was indeed extracted from the first image.
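The key reconstruction idea can be sketched as below. It assumes the key is a SHA-256 digest of the plate number joined with the acquisition time, which is one possible derivation and not necessarily the one used; the function name and sample values are hypothetical. The encryption step itself is omitted, since it would require a symmetric cipher from a cryptography library.

```python
import hashlib


def derive_key(plate_number, acquired_at):
    """Derive a 32-byte symmetric key from extracted information plus metadata."""
    material = f"{plate_number}|{acquired_at}".encode("utf-8")
    return hashlib.sha256(material).digest()


# At the first location: the key is derived and used to encrypt the first image.
first_metadata = "2015-03-01T08:15:00Z"  # made-up acquisition time
key = derive_key("AB123CD", first_metadata)

# Later: a second image is linked to the first, and its plate number is read.
# The key is rebuilt from the stored metadata, without access to the first image.
plate_from_second_image = "AB123CD"
reconstructed = derive_key(plate_from_second_image, first_metadata)

# A different plate number yields a different key, so decryption would fail.
wrong = derive_key("ZZ999ZZ", first_metadata)
```

Because both inputs are needed, neither the stored metadata alone nor the plate number alone is enough to rebuild the key.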