Apparatus and a method for logically processing a composite graph in a formatted document

    Publication No.: US09542362B2

    Publication date: 2017-01-10

    Application No.: US14095682

    Filing date: 2013-12-03

    Abstract: The present invention provides an apparatus for logically processing a composite graph in a formatted document. The apparatus comprises: a composite graph block extraction unit, which extracts a composite graph block from the formatted document; a document parsing unit, which parses the formatted document to obtain the text elements it contains; a cutline element extraction unit, which extracts a cutline element from the text elements; a correlativity detection unit, which detects correlativity between the composite graph block and the cutline element; and a correlativity storage unit, which stores the detected correlativity. The present invention also provides a method for logically processing a composite graph in a formatted document. With the disclosed technical scheme, the layout of a composite graph in a mixed graph-and-text layout of the formatted document can be understood easily, which avoids logical errors.
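
The unit pipeline in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say how cutlines are recognized or how correlativity is measured, so everything here is an assumption made for illustration: the `Box` and `TextElement` types, the `CUTLINE_RE` caption pattern ("Fig. 1", "Figure 1"), and the rule that a cutline correlates with the nearest graph block that overlaps it horizontally within `max_gap` units.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class TextElement:
    text: str
    box: Box


# Assumed caption pattern; the patent abstract does not define one.
CUTLINE_RE = re.compile(r"^(Fig\.|Figure)\s*\d+", re.IGNORECASE)


def extract_cutlines(text_elements):
    """Keep only the text elements that look like figure captions."""
    return [t for t in text_elements if CUTLINE_RE.match(t.text.strip())]


def horizontal_overlap(a, b):
    """Length of the shared x-range of two boxes (0 if disjoint)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def vertical_gap(a, b):
    """Gap between two boxes along y (0 if they overlap vertically)."""
    return max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))


def detect_correlativity(graph_blocks, cutlines, max_gap=30.0):
    """Map each graph block id to the text of its nearest cutline.

    Hypothetical rule: a cutline is a candidate if it overlaps the
    block horizontally and lies within max_gap units vertically.
    The returned dict plays the role of the storage unit.
    """
    store = {}
    for block_id, box in graph_blocks.items():
        candidates = [
            c for c in cutlines
            if horizontal_overlap(box, c.box) > 0
            and vertical_gap(box, c.box) <= max_gap
        ]
        if candidates:
            best = min(candidates, key=lambda c: vertical_gap(box, c.box))
            store[block_id] = best.text
    return store


graph_blocks = {"g1": Box(0, 0, 100, 100), "g2": Box(200, 0, 300, 100)}
text_elements = [
    TextElement("Figure 1: A", Box(0, 105, 100, 115)),
    TextElement("Body text", Box(0, 120, 100, 130)),
    TextElement("Fig. 2 B", Box(200, 110, 300, 120)),
]
cutlines = extract_cutlines(text_elements)
correlativity = detect_correlativity(graph_blocks, cutlines)
```

A geometric nearest-neighbor rule is only one plausible reading of "correlativity"; a real system might also match figure numbers referenced in the body text.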

    Apparatus And A Method For Logically Processing A Composite Graph In A Formatted Document
    3.
    Patent application
    Status: in force

    Publication No.: US20140337719A1

    Publication date: 2014-11-13

    Application No.: US14095682

    Filing date: 2013-12-03

    IPC classification: G06F3/0484 G06F17/27

    Abstract: The present invention provides an apparatus for logically processing a composite graph in a formatted document. The apparatus comprises: a composite graph block extraction unit, which extracts a composite graph block from the formatted document; a document parsing unit, which parses the formatted document to obtain the text elements it contains; a cutline element extraction unit, which extracts a cutline element from the text elements; a correlativity detection unit, which detects correlativity between the composite graph block and the cutline element; and a correlativity storage unit, which stores the detected correlativity. The present invention also provides a method for logically processing a composite graph in a formatted document. With the disclosed technical scheme, the layout of a composite graph in a mixed graph-and-text layout of the formatted document can be understood easily, which avoids logical errors.

    Apparatus and a method for logically processing a composite graph in a formatted document
    4.
    Granted patent
    Status: in force

    Publication No.: US09569407B2

    Publication date: 2017-02-14

    Application No.: US14095682

    Filing date: 2013-12-03

    Abstract: The present invention provides an apparatus for logically processing a composite graph in a formatted document. The apparatus comprises: a composite graph block extraction unit, which extracts a composite graph block from the formatted document; a document parsing unit, which parses the formatted document to obtain the text elements it contains; a cutline element extraction unit, which extracts a cutline element from the text elements; a correlativity detection unit, which detects correlativity between the composite graph block and the cutline element; and a correlativity storage unit, which stores the detected correlativity. The present invention also provides a method for logically processing a composite graph in a formatted document. With the disclosed technical scheme, the layout of a composite graph in a mixed graph-and-text layout of the formatted document can be understood easily, which avoids logical errors.

    EXTRACTION DEVICE FOR COMPOSITE GRAPH IN FIXED LAYOUT DOCUMENT AND EXTRACTION METHOD THEREOF
    5.
    Patent application
    Status: under examination (published)

    Publication No.: US20150046784A1

    Publication date: 2015-02-12

    Application No.: US14104064

    Filing date: 2013-12-12

    IPC classification: G06F17/21

    CPC classification: G06K9/00463

    Abstract: An extraction device for a composite graph in a fixed-layout document, comprising: a document parsing unit for parsing the fixed-layout document and determining its primitives and their types; a layer generation unit for extracting the text primitives to form a text layer and using the remaining non-text primitives to form a non-text layer; a page analysis unit for performing page analysis on the text layer and the non-text layer separately; a block generation unit for generating text blocks in the text layer and graph blocks in the non-text layer; a correlation block determination unit for determining the text blocks correlated with each graph block and merging those correlated text blocks and graph blocks into a composite graph block; and an identifier storage unit for storing the identifiers of all the primitives contained in the composite graph block.
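
The layer split and block merge described in this abstract might be sketched as follows. This is an illustrative outline under stated assumptions, not the patented method: the `Primitive` type, the `kind` labels, and the proximity test (`intersects` with a `margin`) used to decide which text blocks correlate with a graph block are all hypothetical, and block generation is reduced to treating the whole non-text layer as one graph block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    pid: int
    kind: str    # assumed type labels: "text", "path", "image"
    box: tuple   # (x0, y0, x1, y1)


def split_layers(primitives):
    """Layer generation: text primitives vs. all remaining primitives."""
    text = [p for p in primitives if p.kind == "text"]
    non_text = [p for p in primitives if p.kind != "text"]
    return text, non_text


def union(boxes):
    """Smallest box enclosing all given boxes."""
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


def intersects(a, b, margin=0.0):
    """True if box b touches box a after growing a by margin on each side."""
    return (a[0] - margin < b[2] and b[0] < a[2] + margin
            and a[1] - margin < b[3] and b[1] < a[3] + margin)


def build_composite(graph_block, text_blocks, margin=5.0):
    """Merge a graph block with nearby text blocks.

    Returns the sorted identifiers of every primitive in the resulting
    composite graph block (the data the identifier storage unit keeps).
    The proximity criterion is an assumption made for this sketch.
    """
    gbox = union([p.box for p in graph_block])
    ids = {p.pid for p in graph_block}
    for tb in text_blocks:
        if intersects(gbox, union([p.box for p in tb]), margin):
            ids.update(p.pid for p in tb)
    return sorted(ids)


primitives = [
    Primitive(1, "path", (0, 0, 100, 100)),
    Primitive(2, "image", (10, 10, 90, 90)),
    Primitive(3, "text", (20, 102, 80, 110)),   # label just below the graph
    Primitive(4, "text", (0, 300, 100, 320)),   # unrelated body text
]
text_layer, non_text_layer = split_layers(primitives)
composite_ids = build_composite(non_text_layer, [[p] for p in text_layer])
```

Storing identifiers rather than copies of the primitives keeps the composite block a lightweight view over the parsed page, which matches the identifier storage unit named in the abstract.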

    Table recognizing method and table recognizing system
    6.
    Granted patent
    Status: in force

    Publication No.: US09268999B2

    Publication date: 2016-02-23

    Application No.: US14096532

    Filing date: 2013-12-04

    IPC classification: G06K9/62 G06K9/00

    CPC classification: G06K9/00449 G06K9/00463

    Abstract: Provided is a table recognizing method, comprising: parsing and analyzing metadata information in an original fixed-layout document, and extracting basic elements on a page of the document; segmenting the basic elements, extracting segmented text lines on the page, and acquiring fragments; constructing an undirected graph with respect to each of the fragments; extracting an image on the page, detecting intersection points of horizontal lines and vertical lines, detecting an external bounding box of the intersection points, and taking whether the segmented text lines fall within the external bounding box as local relationship features; training a learning model according to the local relationship features, local features of the fragments, and neighborhood relationship features among the fragments, acquiring model parameters, and establishing a table recognizing model; and invoking the table recognizing model to perform table recognition on the document and acquiring a recognition result.
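
The local relationship feature in this abstract (whether a segmented text line falls within the external bounding box of the ruling-line intersections) can be sketched in a few lines of Python. The sketch assumes the horizontal and vertical lines have already been extracted from the page image as axis-aligned segments; the tuple layouts, the `tol` tolerance, and all function names are hypothetical. The graph construction and model training steps are not covered.

```python
def intersections(h_lines, v_lines, tol=1.0):
    """Intersection points of axis-aligned segments.

    h_lines: (y, x0, x1) tuples; v_lines: (x, y0, y1) tuples.
    tol lets nearly touching segments count as crossing.
    """
    pts = []
    for y, hx0, hx1 in h_lines:
        for x, vy0, vy1 in v_lines:
            if hx0 - tol <= x <= hx1 + tol and vy0 - tol <= y <= vy1 + tol:
                pts.append((x, y))
    return pts


def bounding_box(points):
    """External bounding box (x0, y0, x1, y1) of the points, or None."""
    if not points:
        return None
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def inside(box, text_box):
    """True if text_box lies entirely within box."""
    return (box is not None
            and box[0] <= text_box[0] and box[1] <= text_box[1]
            and text_box[2] <= box[2] and text_box[3] <= box[3])


def local_relationship_features(text_lines, h_lines, v_lines):
    """One binary feature per text line: 1 if it is inside the box."""
    box = bounding_box(intersections(h_lines, v_lines))
    return [1 if inside(box, t) else 0 for t in text_lines]


# A 2x2 grid of cells drawn with three horizontal and three vertical rules.
h_lines = [(0, 0, 100), (50, 0, 100), (100, 0, 100)]
v_lines = [(0, 0, 100), (50, 0, 100), (100, 0, 100)]
text_lines = [(10, 10, 40, 20), (10, 120, 40, 130)]
features = local_relationship_features(text_lines, h_lines, v_lines)
```

In the method as described, these binary values would be combined with fragment-level and neighborhood features before training; the choice of learning model is not stated in the abstract.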
