System and method for forms classification by line-art alignment
    1.
    Granted patent
    System and method for forms classification by line-art alignment (status: in force)

    Publication No.: US08792715B2

    Publication date: 2014-07-29

    Application No.: US13539941

    Filing date: 2012-07-02

    IPC classification: G06K9/00

    CPC classification: G06K9/00449

    Abstract: A system and method to classify forms. An image representing a form of an unknown document type is received. The image includes line-art. Further, a plurality of template models corresponding to a plurality of different document types is received. The plurality of different document types is intended to include the correct document type of the unknown document. A subset of the plurality of template models is selected as candidate template models. The candidate template models include line-art junctions best matching line-art junctions of the received image. One of the candidate template models is selected as a best candidate template model. The best candidate template model includes horizontal and vertical lines best matching horizontal and vertical lines of the received image, respectively, aligned to the best candidate template model.
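
The two-stage selection described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `FormModel` container, the use of Jaccard overlap as the junction match, the pixel tolerance `tol`, and the simplification that the image has already been aligned to each template.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FormModel:
    # Hypothetical container: junctions as (type, x, y) tuples,
    # lines as tuples of pixel coordinates.
    name: str
    junctions: frozenset
    h_lines: tuple  # y-coordinates of horizontal lines
    v_lines: tuple  # x-coordinates of vertical lines

def junction_score(image, template):
    """Jaccard overlap of junction sets (an assumed stand-in for junction matching)."""
    union = image.junctions | template.junctions
    return len(image.junctions & template.junctions) / len(union) if union else 0.0

def line_score(image_lines, template_lines, tol=3):
    """Fraction of template lines that have an image line within tol pixels."""
    if not template_lines:
        return 0.0
    hits = sum(any(abs(t - i) <= tol for i in image_lines) for t in template_lines)
    return hits / len(template_lines)

def classify(image, templates, n_candidates=2):
    # Stage 1: keep the templates whose junctions best match the image.
    candidates = sorted(templates, key=lambda t: junction_score(image, t),
                        reverse=True)[:n_candidates]
    # Stage 2: among the candidates, pick the best horizontal + vertical line match.
    # The image is assumed to be already aligned to each candidate.
    return max(candidates,
               key=lambda t: line_score(image.h_lines, t.h_lines)
                             + line_score(image.v_lines, t.v_lines))

# Toy data, invented for this example.
invoice = FormModel("invoice", frozenset({("L", 0, 0), ("T", 0, 50), ("X", 40, 50)}),
                    (0, 50, 100), (0, 40))
receipt = FormModel("receipt", frozenset({("L", 0, 0), ("T", 0, 50), ("X", 40, 80)}),
                    (0, 80), (0, 40))
memo = FormModel("memo", frozenset({("L", 5, 5)}), (5,), (5,))
unknown = FormModel("unknown", frozenset({("L", 0, 0), ("T", 0, 50), ("X", 40, 50)}),
                    (1, 49, 101), (0, 41))

best = classify(unknown, [memo, receipt, invoice])
```

In this toy data the junction stage discards `memo`, and the line stage prefers `invoice` over `receipt` because all of its lines find a nearby image line.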

    SYSTEM AND METHOD FOR FORMS CLASSIFICATION BY LINE-ART ALIGNMENT
    2.
    Patent application
    SYSTEM AND METHOD FOR FORMS CLASSIFICATION BY LINE-ART ALIGNMENT (status: in force)

    Publication No.: US20140003717A1

    Publication date: 2014-01-02

    Application No.: US13539941

    Filing date: 2012-07-02

    IPC classification: G06K9/68 G06K9/46

    CPC classification: G06K9/00449

    Abstract: A system and method to classify forms. An image representing a form of an unknown document type is received. The image includes line-art. Further, a plurality of template models corresponding to a plurality of different document types is received. The plurality of different document types is intended to include the correct document type of the unknown document. A subset of the plurality of template models is selected as candidate template models. The candidate template models include line-art junctions best matching line-art junctions of the received image. One of the candidate template models is selected as a best candidate template model. The best candidate template model includes horizontal and vertical lines best matching horizontal and vertical lines of the received image, respectively, aligned to the best candidate template model.

    System and method for forms recognition by synthesizing corrected localization of data fields
    5.
    Granted patent
    System and method for forms recognition by synthesizing corrected localization of data fields (status: in force)

    Publication No.: US09536141B2

    Publication date: 2017-01-03

    Application No.: US13537729

    Filing date: 2012-06-29

    Applicant: Eric Saund

    Inventor: Eric Saund

    IPC classification: G06F17/00 G06K9/00 G06F17/24

    Abstract: A method and system generate an idealized image of a form. An image of a form and a template model of the form are received. The form includes data fields. Word boxes of the image are identified. The word boxes are assigned to corresponding data fields of the form. An idealized image of the form is generated based on the assignments and the template model.

    Method for generating a graph lattice from a corpus of one or more data graphs

    Publication No.: US08872828B2

    Publication date: 2014-10-28

    Application No.: US12883464

    Filing date: 2010-09-16

    Applicant: Eric Saund

    Inventor: Eric Saund

    IPC classification: G06T17/20 G06T11/20

    CPC classification: G06T11/206

    Abstract: A document recognition system and method, where images are represented as a collection of primitive features whose spatial relations are represented as a graph. Useful subsets of all the possible subgraphs representing different portions of images are represented over a corpus of many images. The data structure is a lattice of subgraphs, and algorithms are provided to build and use the graph lattice efficiently and effectively.

    System and method for localizing data fields on structured and semi-structured forms
    8.
    Granted patent
    System and method for localizing data fields on structured and semi-structured forms (status: in force)

    Publication No.: US08781229B2

    Publication date: 2014-07-15

    Application No.: US13537630

    Filing date: 2012-06-29

    Applicant: Eric Saund

    Inventor: Eric Saund

    IPC classification: G06K9/34

    Abstract: A method and system to localize data fields of a form. An image of a form is received, where the form includes data fields. Word boxes of the image are identified. The word boxes are grouped into candidate zones, where each of the candidate zones includes one or more of the word boxes. Hypotheses are formed from the data fields and the candidate zones, where each hypothesis assigns one of the candidate zones to one of the data fields or a null data field. A constrained optimization search of the hypotheses is performed for an optimal set of hypotheses. The optimal set of hypotheses assigns word box groups to corresponding data fields.
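
As a rough illustration of the hypothesis search in this abstract, the sketch below exhaustively tries every assignment of candidate zones to data fields or to a null field (`None`). The abstract does not say which search or cost function is used, so the brute-force enumeration, the constraint that each data field receives at most one zone, the distance-based `score`, the fixed null cost of -20, and all zone and field names are assumptions.

```python
from itertools import product

def best_assignment(zones, fields, score):
    """Search all assignments of each zone to a field or None (the null field)."""
    best, best_total = None, float("-inf")
    for choice in product([None, *fields], repeat=len(zones)):
        used = [f for f in choice if f is not None]
        # Assumed constraint: a data field receives at most one candidate zone.
        if len(used) != len(set(used)):
            continue
        total = sum(score(z, f) for z, f in zip(zones, choice))
        if total > best_total:
            best, best_total = dict(zip(zones, choice)), total
    return best

# Toy data, invented for this example: zone centres found in the image and
# the positions where the template expects each data field.
zone_centres = {"z1": (10, 10), "z2": (10, 52), "z3": (200, 200)}
expected = {"name": (12, 11), "date": (10, 50)}

def score(zone, field):
    if field is None:
        return -20.0  # assumed fixed cost for leaving a zone unassigned
    zx, zy = zone_centres[zone]
    fx, fy = expected[field]
    return -(abs(zx - fx) + abs(zy - fy))  # closer to the expected position is better

assignment = best_assignment(list(zone_centres), list(expected), score)
```

Here the stray zone `z3` ends up on the null field because placing it on either real field costs far more than the null penalty. Exhaustive enumeration grows exponentially with the number of zones, so it only serves to make the hypothesis space concrete.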

    Graph lattice method for image clustering, classification, and repeated structure finding
    9.
    Granted patent
    Graph lattice method for image clustering, classification, and repeated structure finding (status: in force)

    Publication No.: US08724911B2

    Publication date: 2014-05-13

    Application No.: US12883503

    Filing date: 2010-09-16

    Applicant: Eric Saund

    Inventor: Eric Saund

    CPC classification: G06K9/6892 G06K9/00449

    Abstract: A document recognition system and method, where images are represented as a collection of primitive features whose spatial relations are represented as a graph. Useful subsets of all the possible subgraphs representing different portions of images are represented over a corpus of many images. The data structure is a lattice of subgraphs, and algorithms are provided to build and use the graph lattice efficiently and effectively.

    SELECTIVE LEARNING FOR GROWING A GRAPH LATTICE
    10.
    Patent application
    SELECTIVE LEARNING FOR GROWING A GRAPH LATTICE (status: in force)

    Publication No.: US20130335422A1

    Publication date: 2013-12-19

    Application No.: US13527071

    Filing date: 2012-06-19

    Applicant: Eric Saund

    Inventor: Eric Saund

    IPC classification: G06T11/20

    CPC classification: G06T11/206 G06K9/00

    Abstract: A system and method generate a graph lattice from exemplary images. At least one processor receives exemplary data graphs of the exemplary images and generates graph lattice nodes of size one from primitives. Until a termination condition is met, the at least one processor repeatedly: 1) generates candidate graph lattice nodes from accepted graph lattice nodes; 2) selects one or more candidate graph lattice nodes preferentially discriminating exemplary data graphs which are less discriminable than other exemplary data graphs using the accepted graph lattice nodes; and 3) promotes the selected graph lattice nodes to accepted status. The graph lattice is formed from the accepted graph lattice nodes and relations between the accepted graph lattice nodes.
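
The generate, select, promote loop in this abstract can be sketched on a deliberately simplified model. The assumptions are substantial and are not taken from the patent: each exemplary data graph is reduced to a chain graph written as a string, primitives are single characters, a lattice node is a substring, matching is substring containment, and "less discriminable" is measured by how many exemplars share the same match signature. The function names are hypothetical.

```python
from collections import Counter

def confusion(graphs, nodes):
    """Per exemplar, the number of exemplars sharing its match signature (1 = unique)."""
    sigs = [tuple(n in g for n in nodes) for g in graphs]
    counts = Counter(sigs)
    return [counts[s] for s in sigs]

def grow_lattice(graphs, max_rounds=10):
    primitives = sorted(set("".join(graphs)))
    accepted = list(primitives)              # lattice nodes of size one
    for _ in range(max_rounds):
        before = sum(confusion(graphs, accepted))
        if before == len(graphs):            # termination: every exemplar is unique
            break
        # 1) Generate candidates by extending an accepted node with one primitive.
        grown = {s for n in accepted for p in primitives for s in (n + p, p + n)}
        candidates = sorted(grown - set(accepted))
        # 2) Select the candidate that most reduces confusion. Summing group sizes
        #    per exemplar weights large groups of confusable exemplars most heavily.
        def gain(c):
            return before - sum(confusion(graphs, accepted + [c]))
        best = max(candidates, key=gain)
        if gain(best) == 0:                  # no candidate helps: stop early
            break
        # 3) Promote the selected candidate to accepted status.
        accepted.append(best)
    # Relations: links from each accepted node to accepted nodes one primitive larger.
    relations = [(s, t) for s in accepted for t in accepted
                 if len(t) == len(s) + 1 and s in t]
    return accepted, relations

nodes, relations = grow_lattice(["abc", "acb", "abd"])
```

In this toy corpus the size-one nodes cannot tell "abc" from "acb", so the loop promotes a single size-two node that separates them and then stops. A real graph lattice would need genuine subgraph matching in place of the substring test.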
