摘要:
The present invention provides an apparatus for logically processing a composite graph in a formatted document, the apparatus comprising: a composite graph block extraction unit, used to extract a composite graph block in the formatted document; a document parsing unit, used to parse the formatted document to obtain a text element contained therein; a cutline element extraction unit, used to extract a cutline element from the text element; a correlativity detection unit, used to detect correlativity between the composite graph block and the cutline element; a correlativity storage unit, used to store the detected correlativity. The present invention also provides a method for logically processing a composite graph in a formatted document. According to the technical scheme disclosed in the present invention, it is easily achieve layout understanding of the composite graph in a graph-text mixed layout of the formatted document, so as to avoid a logical error.
摘要:
A logic process apparatus for composite graphs in a fixed layout document is provided in this invention. The apparatus includes a composite graph block extraction unit, for extracting composite graph blocks from the fixed layout document, a document parsing unit, for parsing the fixed layout document to obtain text primitives contained therein, a legend primitive extraction unit, for extracting legend primitives from the text primitives, a correlation detection unit, for detecting correlations between the composite graph blocks and the legend primitives, and a correlation storage unit, for storing the detected correlations. A logic process method for composite graphs in a fixed layout document is also provided.
摘要:
The present invention provides an apparatus for logically processing a composite graph in a formatted document, the apparatus comprising: a composite graph block extraction unit, used to extract a composite graph block in the formatted document; a document parsing unit, used to parse the formatted document to obtain a text element contained therein; a cutline element extraction unit, used to extract a cutline element from the text element; a correlativity detection unit, used to detect correlativity between the composite graph block and the cutline element; a correlativity storage unit, used to store the detected correlativity. The present invention also provides a method for logically processing a composite graph in a formatted document. According to the technical scheme disclosed in the present invention, it is easily achieve layout understanding of the composite graph in a graph-text mixed layout of the formatted document, so as to avoid a logical error.
摘要:
The present invention provides an apparatus for logically processing a composite graph in a formatted document, the apparatus comprising: a composite graph block extraction unit, used to extract a composite graph block in the formatted document; a document parsing unit, used to parse the formatted document to obtain a text element contained therein; a cutline element extraction unit, used to extract a cutline element from the text element; a correlativity detection unit, used to detect correlativity between the composite graph block and the cutline element; a correlativity storage unit, used to store the detected correlativity. The present invention also provides a method for logically processing a composite graph in a formatted document. According to the technical scheme disclosed in the present invention, it is easily achieve layout understanding of the composite graph in a graph-text mixed layout of the formatted document, so as to avoid a logical error.
摘要:
An extraction device for the composite graph in a fixed layout document comprising: a document parsing unit, for parsing the fixed layout document, and determining the primitives of the fixed layout document and their types; a layer generation unit, for extracting text primitives so as to form a text layer, and using the rest non-text primitives to form a non-text layer; a page analysis unit, for processing the text layer and the non-text layer with page analyses respectively; a block generation unit, for generating a text block in the text layer and a graph block in the non-text layer; a correlation block determination unit, for determining text blocks correlating to every graph block and merging those correlated text blocks and graph blocks into a composite graph block; an identifier storage unit, for storing the identifiers of all the primitives contained in the composite graph block.
摘要:
Provided is a table recognizing method, comprising: parsing and analyzing metadata information in an original fixed-layout document, and extracting basic elements on a page of the document; segmenting the basic elements, extracting segmented text lines on the page, and acquiring fragments; constructing an undirected graph with respect to each of the fragments; extracting an image on the page, detecting intersection points of horizontal lines and vertical lines, detecting an external bounding box of the intersection points, and taking whether the segmented text lines fall within the external bounding box as local relationship features; training a learning model according to the local relationship features, local features of the fragments, and neighborhood relationship features among the fragments, acquiring model parameters, and establishing a table recognizing model; and invoking the table recognizing model to perform table recognizing for the document, and acquiring a recognizing result.