Abstract:
Various technologies and techniques detect lists in vector graphics based documents and use them in meaningful ways. The system detects at least one list in a vector graphics based document using a set of rules. Pattern detection logic identifies characters, symbols, numbers, letters, and/or images that may start a list. Additional pattern detection logic determines if a list exists. The system can identify and parse bulleted lists, numbered or lettered lists, and nested lists that are any combination of both. Once identified, the content is translated into a modified format. The content can be output to a destination application in the modified format that is more suitable for output or use by the destination application.
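The two-stage detection described above (first spot a candidate list marker, then decide whether a list actually exists) can be sketched as follows. This is a minimal illustration, not the patented rule set: the abstract does not enumerate the rules, so the marker patterns, the names `classify_line` and `detect_lists`, and the "two or more consecutive items" threshold are all assumptions, and nesting is not handled.

```python
import re

# Hypothetical marker patterns; the abstract does not list the real rules.
BULLET = re.compile(r"^\s*([\u2022\u25E6\u25AA*\-])\s+(.*)$")
NUMBERED = re.compile(r"^\s*(\d+|[A-Za-z])[.)]\s+(.*)$")


def classify_line(line):
    """Stage 1: return (kind, marker, text); kind is 'bullet', 'numbered', or None."""
    m = BULLET.match(line)
    if m:
        return ("bullet", m.group(1), m.group(2))
    m = NUMBERED.match(line)
    if m:
        return ("numbered", m.group(1), m.group(2))
    return (None, None, line.strip())


def detect_lists(lines):
    """Stage 2: a run of two or more consecutive items of the same kind is a list."""
    lists, current = [], []
    for line in lines:
        kind, marker, text = classify_line(line)
        if kind is not None and (not current or current[-1][0] == kind):
            current.append((kind, marker, text))
        else:
            # The run ended: keep it only if it is long enough to be a list.
            if len(current) >= 2:
                lists.append(current)
            current = [(kind, marker, text)] if kind is not None else []
    if len(current) >= 2:
        lists.append(current)
    return lists
```

Requiring at least two items is one simple way to avoid treating a stray hyphen or a sentence beginning with "1." as a list; a real detector would likely also use position and indentation from the vector graphics content.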
Abstract:
Various technologies and techniques detect tables in vector graphics based documents and use them in meaningful ways. The system detects at least one table in a vector graphics based document using a set of rules. The rules include analyzing a set of content representing horizontal and vertical lines to find intersections and identifying table cells based on the intersections. Once identified, the table content is translated into a modified format. The content can be output to a destination application in the modified format that is more suitable for output or use by the destination application.
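The intersection rule above can be illustrated with a small sketch. It assumes axis-aligned line segments given as tuples, a hypothetical tolerance `tol`, and a simplified cell test (all four corners must be intersections); the abstract does not specify these details, and merged cells or missing borders are not handled here.

```python
from itertools import product


def find_intersections(h_lines, v_lines, tol=0.5):
    """h_lines: (y, x_start, x_end); v_lines: (x, y_start, y_end). Returns a set of (x, y)."""
    points = set()
    for (y, x0, x1), (x, y0, y1) in product(h_lines, v_lines):
        # The segments cross if each one spans the other's fixed coordinate.
        if x0 - tol <= x <= x1 + tol and y0 - tol <= y <= y1 + tol:
            points.add((x, y))
    return points


def find_cells(points):
    """Return cells as (left, top, right, bottom) whose four corners are all intersections."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    cells = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            corners = {(left, top), (right, top), (left, bottom), (right, bottom)}
            if corners <= points:
                cells.append((left, top, right, bottom))
    return cells
```

For example, three horizontal and three vertical lines forming a grid yield nine intersections and four cells, listed row by row.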
Abstract:
Semantic objects are created that provide a structure for markup language representations of documents. The semantic objects include text runs that are produced from the markup language representation and that are placed into semantic blocks that group text runs according to how text is logically structured in the document being represented. The text runs of each semantic block are ordered to correspond to the logical order of the document being represented. The semantic blocks corresponding to each page of the document being represented are ordered to correspond to the logical order of the document being represented. The ordered semantic blocks including the ordered text runs are saved as a semantic object which can then be utilized to make use of the logical structure of the document being represented by the markup language.
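The structure described above (text runs grouped into semantic blocks, both levels ordered, then saved together) might be modeled as in the sketch below. The class names `TextRun` and `SemanticBlock`, the function `build_semantic_object`, and the ordering rule (top to bottom, then left to right, by page) are assumptions for illustration; the abstract does not say how logical order is determined.

```python
from dataclasses import dataclass, field


@dataclass
class TextRun:
    text: str
    x: float  # horizontal position on the page (assumed available from the markup)
    y: float  # vertical position on the page


@dataclass
class SemanticBlock:
    page: int
    runs: list = field(default_factory=list)

    def ordered_runs(self):
        # Assumed reading order: top to bottom, then left to right.
        return sorted(self.runs, key=lambda r: (r.y, r.x))

    def origin(self):
        # Position of the first run in reading order; assumes a non-empty block.
        first = self.ordered_runs()[0]
        return (first.y, first.x)


def build_semantic_object(blocks):
    """Order blocks by page and position, and return the ordered run texts per block."""
    ordered = sorted(blocks, key=lambda b: (b.page, b.origin()))
    return [[run.text for run in b.ordered_runs()] for b in ordered]
```

A consumer such as a screen reader or a text exporter could then walk the returned structure in order without re-deriving layout from the markup.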
Abstract:
Palette-based, multi-tint, named-color methods and systems utilize a pixel-by-pixel indexing technique in which individual index values into a palette of interest can be used in different ways for rendering associated images across different devices. For some devices, the index values are used to index into the palette of interest to ascertain a specific indexed color value that is then used to render that pixel of the associated image. For other devices, the index value is used as a means to compute a color value that these other devices then use to render that pixel of the associated image.
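The two uses of an index value described above can be contrasted in a short sketch. The first function performs a plain palette lookup. The second treats the index as a tint level of a single base color, which is only one hypothetical way to "compute a color value" from an index; the abstract does not give the actual formula, and the names `render_indexed` and `render_computed` are illustrative.

```python
def render_indexed(pixels, palette):
    """Device type 1: each pixel's index selects a color directly from the palette."""
    return [palette[i] for i in pixels]


def render_computed(pixels, base_color, levels):
    """Device type 2 (assumed formula): the index is a tint level of base_color.

    Index 0 maps to white and index levels - 1 maps to the full base color,
    with linear interpolation in between.
    """
    out = []
    for i in pixels:
        t = i / (levels - 1)
        out.append(tuple(round(255 + (c - 255) * t) for c in base_color))
    return out
```

The same per-pixel index data serves both paths, so an image does not need to be re-encoded for each class of device.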
Abstract:
A system and method for associating optical character recognition (OCR) text data with source images are provided. In one embodiment, an association module of a computing system is configured to receive text data from an OCR engine; associate the text data with a source image; and output associated optical character recognition data including the source image, the text data associated with the source image, and a plurality of referrers. Each referrer of the plurality of referrers may indicate a different image reference. The plurality of referrers are configured to cause a viewer application to output the text data associated with the source image to each instance of the source image that is rendered as part of a fixed-layout document in accordance with the multiple image references.
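The association described above can be pictured as one record that holds the source image, its OCR text, and the referrers, so that the text is stored once and reused for every rendered instance. The sketch below is an assumption-laden illustration: `AssociatedOcrData`, `associate`, and `text_for_rendered_instances` are hypothetical names, and image references are modeled as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociatedOcrData:
    source_image: str   # identifier of the source image (hypothetical: a resource name)
    text: str           # text data received from the OCR engine
    referrers: tuple    # one entry per image reference in the document


def associate(source_image, ocr_text, image_references):
    """Bundle the OCR text with its source image and every reference to that image."""
    return AssociatedOcrData(source_image, ocr_text, tuple(image_references))


def text_for_rendered_instances(data, rendered_references):
    """Viewer side: give each rendered instance that matches a referrer the same OCR text."""
    return {ref: data.text for ref in rendered_references if ref in data.referrers}
```

With this layout, an image that appears on several pages is recognized once, and each appearance resolves to the same text through its referrer.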