Apparatus and method for extracting information from a formatted document
    Invention application
    Status: under examination (published)

    Publication number: US20060143555A1

    Publication date: 2006-06-29

    Application number: US10768178

    Filing date: 2004-02-02

    IPC classification: G06F17/21

    CPC classification: G06F17/2745

    Abstract: The present invention discloses an apparatus for extracting information from a formatted document, comprising: an input unit (1) for inputting a formatted document; a unit (2) for analyzing the input formatted document and saving particular typographic information; a unit (3) for identifying special character strings on the basis of the analysis result, by means of typographic information such as font size, character font, and color; a unit (4) for extracting the identified special character strings; and an output unit (5) for outputting the extracted character strings. When the typographic information of a character string is determined to be special typographic information, said character string is determined to be a special character string. The apparatus is thus able to extract information automatically from different types of formatted documents.

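
The chain of units in the abstract (analysis, identification, extraction, output) can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Run` type, the function names, and the sample data are hypothetical. The sketch also assumes the input unit has already split the document into text runs with known font size, font name, and color. The abstract does not say how typographic information is judged "special", so the sketch assumes one possible rule: any style other than the dominant body style, measured by character count, is special.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical record: a character string plus its saved typographic information."""
    text: str
    font_size: float
    font_name: str
    color: str


def style_of(run):
    """Return the typographic information of a run as a hashable tuple."""
    return (run.font_size, run.font_name, run.color)


def analyze(runs):
    """Analysis unit (2): tally each style, weighted by the number of characters using it."""
    counts = Counter()
    for run in runs:
        counts[style_of(run)] += len(run.text)
    return counts


def identify_special(runs, counts):
    """Identification unit (3): assumed rule, a run is special if its style is not the dominant one."""
    if not counts:
        return []
    body_style, _ = counts.most_common(1)[0]
    return [run for run in runs if style_of(run) != body_style]


def extract(runs):
    """Extraction unit (4) and output unit (5): return the text of the special runs."""
    return [run.text for run in identify_special(runs, analyze(runs))]


# Made-up sample input standing in for a parsed formatted document.
sample = [
    Run("Quarterly Report", 18.0, "Arial", "black"),
    Run("Sales rose in every region during the quarter.", 10.0, "Times", "black"),
    Run("Warning", 10.0, "Times", "red"),
    Run("Details follow in the tables below.", 10.0, "Times", "black"),
]

print(extract(sample))  # ['Quarterly Report', 'Warning']
```

Weighting by character count makes long body paragraphs determine the baseline style, so a short heading or a colored word stands out even when several such runs are present. A real system would more likely use per-format rules or thresholds than this single heuristic.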