AUTOMATED CLASSIFICATION OF DOCUMENT PAGES
    Patent application publication
    AUTOMATED CLASSIFICATION OF DOCUMENT PAGES (status: pending, published)
    AUTOMATISIERTE KLASSIFIZIERUNG VON DOKUMENTSEITEN

    Publication number: EP2069980A1

    Publication date: 2009-06-17

    Application number: EP07814568.7

    Filing date: 2007-08-30

    IPC classification: G06F17/30

    CPC classification: G06F17/30707

    Abstract: A system and method are disclosed for automatically classifying images of the pages of a source, such as a book, into classes such as front cover, copyright page, table of contents, text, and index. In one embodiment, the classification process has three phases. In the first phase, a first classifier may determine a preliminary classification of a page image based on single-page criteria. In the second phase, a second classifier may determine a final classification for the page image based on multiple-page and/or global criteria. In an optional third phase, a verifier may check the final classification of the page image against verification criteria. If automatic classification fails, the page image may be passed to a human operator for manual classification.
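    The three-phase flow in the abstract can be illustrated with a small sketch. This is not the method claimed in the publication: it assumes each page image has already been reduced to OCR text, and the keyword rules, the "index pages sit in the second half of the book" heuristic, the verifier's check, and all names (PageImage, preliminary_classify, final_classify, verify, classify_pages) are hypothetical stand-ins chosen only to show where single-page criteria, multiple-page/global criteria, verification, and the manual-review fallback fit.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical label for pages handed off to a human operator.
MANUAL = "needs_manual_review"


@dataclass
class PageImage:
    index: int  # 0-based position of the page within the source
    text: str   # assumption: OCR text already extracted from the page image


def preliminary_classify(page: PageImage) -> str:
    """Phase 1: hypothetical single-page rules that look only at this page."""
    t = page.text.lower()
    if "copyright" in t or "all rights reserved" in t:
        return "copyright page"
    if "contents" in t:
        return "table of contents"
    if "index" in t:
        return "index"
    if not t.strip():
        return "unknown"
    return "text"


def final_classify(pages: List[PageImage], prelim: List[str]) -> List[str]:
    """Phase 2: hypothetical multiple-page/global rules refining phase-1 labels."""
    final = list(prelim)
    n = len(pages)
    # A blank first page is assumed to be the front cover.
    if n and final[0] == "unknown":
        final[0] = "front cover"
    # An "index" page in the first half is assumed to be body text that
    # merely mentions the word "index".
    for i, label in enumerate(final):
        if label == "index" and i < n / 2:
            final[i] = "text"
    return final


def verify(page: PageImage, label: str) -> bool:
    """Phase 3 (optional): hypothetical check that rejects unresolved labels."""
    return label != "unknown"


def classify_pages(
    pages: List[PageImage],
    verifier: Callable[[PageImage, str], bool] = verify,
) -> List[str]:
    prelim = [preliminary_classify(p) for p in pages]
    final = final_classify(pages, prelim)
    # Pages that fail verification fall back to manual classification.
    return [label if verifier(p, label) else MANUAL for p, label in zip(pages, final)]


sample = [
    PageImage(0, ""),
    PageImage(1, "Copyright 2007. All rights reserved."),
    PageImage(2, "Chapter 1. Terms are listed in the index."),
    PageImage(3, "Chapter 2. More body text."),
    PageImage(4, "Index\napple, 3"),
    PageImage(5, ""),
]
print(classify_pages(sample))
# ['front cover', 'copyright page', 'text', 'text', 'index', 'needs_manual_review']
```

    In this sketch, page 2 is first labeled "index" by the single-page rule and then corrected to "text" by the global position rule. The blank last page cannot be resolved, so it is routed to manual review, mirroring the fallback described in the abstract.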
