Abstract:
A method for recognizing text in graphical drawings includes creating a binarized representation of a drawing to form an electronic image of pixels. Text regions are discriminated from lines in the image by grouping pixels into blocks and comparing the blocks with a predetermined format to identify the text regions. The lines that remain in the text regions are removed to create text-only regions. The text is then recognized in the text-only regions.
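The binarization and block-grouping steps can be illustrated with a minimal sketch. The abstract does not say what the "predetermined format" is, so this example assumes it is a band of ink density per block; the function names, the threshold of 128, and the 0.2 to 0.7 band are all hypothetical choices made for illustration, not the patented method.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixels, 0 (black) to 255 (white)


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map each grayscale pixel to 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def block_densities(binary: Image, size: int) -> List[List[float]]:
    """Group pixels into size x size blocks and return each block's ink fraction."""
    rows, cols = len(binary), len(binary[0])
    result = []
    for top in range(0, rows, size):
        result_row = []
        for left in range(0, cols, size):
            block = [
                binary[r][c]
                for r in range(top, min(top + size, rows))
                for c in range(left, min(left + size, cols))
            ]
            result_row.append(sum(block) / len(block))
        result.append(result_row)
    return result


def text_blocks(densities: List[List[float]],
                low: float = 0.2, high: float = 0.7) -> List[List[bool]]:
    """Flag blocks whose ink fraction lies in the assumed text-like band."""
    return [[low <= d <= high for d in row] for row in densities]


# Tiny 4x4 example: a sparse top-left block and a solid bottom-left block.
gray = [
    [0, 255, 255, 255],
    [255, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
binary = binarize(gray)
densities = block_densities(binary, size=2)
flags = text_blocks(densities)
```

In this toy input, only the sparse top-left block is flagged as text-like; the fully inked block is treated as line-like fill and the empty blocks as background.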
Abstract:
A Product Document Constraint Specification Language (PDCSL) is provided for a document author to represent various types of documentation guidelines that must be enforced within documents or across different documents. A Document Constraint Analyzer (DCA) takes as input a set of document files together with a document constraint specification file, extracts and examines the contents, attributes, and relationships associated with the document objects, and evaluates the logical expressions specified in the document constraints. If a document constraint is not satisfied, an action can be taken to correct the documents or provide an explanation to the document author.
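The evaluation loop of such an analyzer can be sketched as follows. The abstract does not give PDCSL syntax, so plain Python predicates stand in for the constraint language here; the `Constraint` class, the `analyze` function, and the document attributes `title` and `product` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Document = Dict[str, str]  # hypothetical model: attribute name -> value


@dataclass
class Constraint:
    name: str
    check: Callable[[List[Document]], bool]  # logical expression over the documents
    explanation: str                         # message shown to the author on failure


def analyze(docs: List[Document], constraints: List[Constraint]) -> List[str]:
    """Evaluate every constraint and return an explanation for each violation."""
    return [f"{c.name}: {c.explanation}" for c in constraints if not c.check(docs)]


constraints = [
    # A constraint enforced within each document.
    Constraint(
        name="has_title",
        check=lambda docs: all(d.get("title") for d in docs),
        explanation="Every document needs a non-empty title.",
    ),
    # A constraint enforced across documents.
    Constraint(
        name="same_product",
        check=lambda docs: len({d.get("product") for d in docs}) == 1,
        explanation="All documents must refer to the same product.",
    ),
]

docs = [
    {"title": "Install Guide", "product": "X100"},
    {"title": "", "product": "X100"},
]
violations = analyze(docs, constraints)
```

Here the second document has an empty title, so only the `has_title` constraint is reported. A corrective action, as opposed to an explanation, would need a further callback per constraint, which this sketch leaves out.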
Abstract:
A method is provided for extracting Anchorable Information Units (AIUs) from a Portable Document Format (PDF) file, which may be created either by using an editor or by scanning documents. The method includes parsing the PDF document into textual portions and non-text portions, and extracting structure from the textual portions and the non-text portions. The method further includes determining text within the textual portions and text within the non-text portions, and hyperlinking a plurality of keywords within the textual portions and non-text portions to a related document.
Abstract:
Systems and methods are provided for generating and publishing electronic spare parts catalogs that support electronic business processes for managing and selling spare parts for complex machines and systems, such as gas turbines. Automated systems and methods for generating electronic catalogs of spare parts employ an extensible, template-based framework to extract and integrate catalog content (static and/or real-time spare parts data) from various backend business information systems and data sources.
Abstract:
A GUI (Graphical User Interface) supported specification method for form field extraction and database mapping in a computer system includes: converting a form file into a fixed electronic document format by using a GUI to specify the form file and conversion parameters; extracting fields from the fixed electronic document format by using the GUI to specify the fields to be extracted; and mapping the fields onto a database schema by using the GUI to specify the mapping between the fields and the database schema.
Abstract:
Systems and methods are provided for implementing electronic business applications for managing and selling spare parts. Electronic catalogs of spare parts are used to present static and/or real-time spare parts data from disparate backend data sources in a uniform, integrated manner. Business logic programs are provided to support transaction activities using the electronic catalogs of spare parts, such as navigating the catalog content, retrieving static and/or real-time spare parts data, and initiating spare parts sales with the backend business information systems and spare parts data sources.
Abstract:
A technique for optimizing the archival and management of data stored as XML documents is capable of handling mixed data including highly structured data and unstructured data. The technique maps the structured data to a relational database while storing the unstructured data in its native XML format. The data is updated using a rules database that maps updating rules against attributes and classes of elements within the documents. A document checking/validation engine performs the updates based on rule verification. A search engine searches the documents using both a path index table and a weighted content index.
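One way to picture a path index is as a table that maps each element path to the values found at that path. The sketch below builds such a table with Python's standard `xml.etree.ElementTree` module. The abstract does not describe the layout of the path index table, so the dictionary shape and the function name `build_path_index` are assumptions made for illustration, and the weighted content index is not covered.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List


def build_path_index(xml_text: str) -> Dict[str, List[str]]:
    """Map each element path (e.g. /part/id) to the text values stored under it."""
    root = ET.fromstring(xml_text)
    index: Dict[str, List[str]] = defaultdict(list)

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            index[path].append(text)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return dict(index)


xml_text = "<part><id>42</id><id>43</id><note>fragile</note></part>"
path_index = build_path_index(xml_text)
```

A search for a path such as `/part/id` then becomes a single dictionary lookup instead of a traversal of every stored document.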