System for searching a corpus of document images by user specified
document layout components
    1.
    Granted invention patent
    System for searching a corpus of document images by user specified document layout components (expired)

    Publication number: US5999664A

    Publication date: 1999-12-07

    Application number: US971022

    Filing date: 1997-11-14

    Abstract: A document search system provides a user with a programming interface for dynamically specifying features of documents recorded in a corpus of documents. The programming interface operates at a high level that is suitable for interactive user specification of layout components and structures of documents. In operation, the document search system analyzes a bitmap image of a document to identify layout objects such as text blocks or graphics. Subsequently, the document search system computes a set of attributes for each of the identified layout objects. These attributes are used to describe the layout structure of a page image of a document in terms of the spatial relations that layout objects have to frames of reference that are defined by other layout objects. After the attributes for each layout object have been computed, a user can operate the programming interface to define unique document features. Each document feature is a routine defined by a sequence of selection operations that consume a first set of layout objects and produce a second set of layout objects. The second set of layout objects constitutes the feature in a page image of a document. With the programming interface, a user can flexibly define a genre of document from the user-specified document features.
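
The abstract's central idea, a feature defined as a sequence of selection operations that each consume one set of layout objects and produce another, can be illustrated with a minimal Python sketch. Everything named here (`LayoutObject`, `select_kind`, `select_in_top`, `select_tallest`, `feature`, `matches_genre`, and the sample page) is hypothetical and not taken from the patent. For brevity the frame of reference is the page itself rather than another layout object, and a genre is assumed to match when every required feature selects at least one object.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LayoutObject:
    """A layout object found on a page image (hypothetical minimal attribute set)."""
    kind: str       # "text" or "graphics"
    x: float        # left edge in page coordinates
    y: float        # top edge; y grows downward
    width: float
    height: float


# A selection operation consumes one set of layout objects and produces another.
Selection = Callable[[List[LayoutObject]], List[LayoutObject]]


def select_kind(kind: str) -> Selection:
    """Keep only objects of the given kind."""
    return lambda objs: [o for o in objs if o.kind == kind]


def select_in_top(fraction: float, page_height: float) -> Selection:
    """Keep objects that lie entirely within the top fraction of the page."""
    limit = fraction * page_height
    return lambda objs: [o for o in objs if o.y + o.height <= limit]


def select_tallest(objs: List[LayoutObject]) -> List[LayoutObject]:
    """Keep the object(s) with the greatest height; empty in, empty out."""
    if not objs:
        return []
    tallest = max(o.height for o in objs)
    return [o for o in objs if o.height == tallest]


def feature(*ops: Selection) -> Selection:
    """Compose selection operations, applied left to right, into one feature."""
    def run(objs: List[LayoutObject]) -> List[LayoutObject]:
        for op in ops:
            objs = op(objs)
        return objs
    return run


def matches_genre(objs: List[LayoutObject], features: List[Selection]) -> bool:
    """Assumed genre test: every required feature selects at least one object."""
    return all(f(objs) for f in features)


# Invented sample page, 800 units tall.
page = [
    LayoutObject("text", 100, 50, 400, 40),       # large heading
    LayoutObject("text", 100, 120, 400, 12),      # byline
    LayoutObject("graphics", 100, 200, 300, 200), # figure
    LayoutObject("text", 100, 450, 400, 300),     # body text
]

title = feature(select_kind("text"), select_in_top(0.3, 800.0), select_tallest)
figure = feature(select_kind("graphics"))

print(title(page))
print(matches_genre(page, [title, figure]))
```

Because each feature is itself a selection, features can be nested inside other features, which is one plausible way a user could build up a genre description interactively.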
