Processing an electronic document for information extraction
    13.
    Invention application
    Status: in force

    Publication number: US20050125402A1

    Publication date: 2005-06-09

    Application number: US10835215

    Filing date: 2004-04-29

    IPC classification: G06F17/30

    Abstract: The present invention relates generally to automatically processing electronic documents. In one aspect, features and/or properties of words are identified from a set of training documents to aid in extracting information from documents to be processed. The features and/or properties relate to text of the words, position of the words and the relationship to other words. A classifier is developed to express these features and/or properties. During information extraction, documents are processed and analyzed based on the classifier and information is extracted based on correspondence of the documents and the features/properties expressed by the classifier.
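
    The three feature families named in the abstract (text of a word, its position, and its relationship to other words) can be illustrated with a minimal sketch. This is not the patented method: the function name `word_features`, the feature names, and the sample tokens are all hypothetical, and the classifier that would consume these features is omitted.

```python
def word_features(tokens, i):
    """Build an illustrative feature dict for tokens[i].

    Feature names are assumptions made for this sketch, not taken
    from the patent.
    """
    word = tokens[i]
    last = len(tokens) - 1
    return {
        # Text of the word
        "lower": word.lower(),
        "is_capitalized": word[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        # Position of the word within the token sequence
        "index": i,
        "relative_position": i / max(last, 1),
        # Relationship to other words (immediate neighbors)
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i < last else "<END>",
    }


# Hypothetical sample input: one line of an invoice split on whitespace.
tokens = "Invoice total : 42.50 USD".split()
features = [word_features(tokens, i) for i in range(len(tokens))]
```

    Each dict in `features` could then be passed to any classifier trained on labeled documents; the choice of classifier is left open here because the abstract does not specify one.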


    SEGMENTED LAYERED IMAGE SYSTEM
    20.
    Invention application
    Status: in force

    Publication number: US20070025622A1

    Publication date: 2007-02-01

    Application number: US11465087

    Filing date: 2006-08-16

    IPC classification: G06K9/36

    CPC classification: H04N1/403 G06K9/00456

    Abstract: Systems and methods for encoding and decoding document images are disclosed. Document images are segmented into multiple layers according to a mask. The multiple layers are non-binary. The respective layers can then be processed and compressed separately in order to achieve better compression of the document image overall. A mask is generated from a document image. The mask is generated so as to reduce an estimate of compression for the combined size of the mask and multiple layers of the document image. The mask is then employed to segment the document image into the multiple layers. The mask determines or allocates pixels of the document image into respective layers. The mask and the multiple layers are processed and encoded separately so as to improve compression of the document image overall and to improve the speed of so doing. The multiple layers are non-binary images and can, for example, comprise a foreground image and a background image.
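
    The way a mask allocates pixels to a foreground and a background layer can be sketched in a few lines. The patent generates its mask to reduce an estimate of the combined compressed size; the sketch below swaps in a plain intensity threshold instead, purely as a stand-in. The function names, the threshold value, and the sample image are hypothetical, and the per-layer compression step is omitted.

```python
def threshold_mask(image, threshold=128):
    """Stand-in mask: 1 marks a dark (foreground) pixel, 0 a light one.

    A simple threshold is an assumption of this sketch; it is not the
    size-estimate-driven mask generation described in the abstract.
    """
    return [[1 if px < threshold else 0 for px in row] for row in image]


def split_layers(image, mask):
    """Allocate each pixel to exactly one layer according to the mask.

    Pixels that do not belong to a layer are stored as None ("don't care").
    """
    foreground = [[px if m else None for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    background = [[None if m else px for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    return foreground, background


def merge_layers(foreground, background, mask):
    """Rebuild the image by picking each pixel from the layer the mask selects."""
    return [[f if m else b for f, b, m in zip(frow, brow, mrow)]
            for frow, brow, mrow in zip(foreground, background, mask)]


# Hypothetical 2x3 grayscale image (0 = black, 255 = white).
image = [[250, 20, 245],
         [30, 240, 25]]
mask = threshold_mask(image)
foreground, background = split_layers(image, mask)
restored = merge_layers(foreground, background, mask)
```

    Both layers keep their original gray values, so they are non-binary, and only the mask itself is binary. In a real codec the "don't care" positions would be filled with whatever values compress best before each layer is encoded separately.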
