System and method for identifying and labeling fields of text associated with scanned business documents
    1.
    Granted invention patent
    Status: in force

    Publication No.: US07860312B2

    Publication date: 2010-12-28

    Application No.: US12710568

    Filing date: 2010-02-23

    IPC classification: G06K9/34

    CPC classification: G06K9/00469

    Abstract: A system for electronically distilling information from a business document uses a network scanner to electronically scan a platen area, having a business document thereon, to create a bitmap. A network server carries out a segmentation process to segment the scan-generated bitmap into a bitmap object, the bitmap object corresponding to the scanned business document; a bitmap-to-text conversion process to convert the bitmap object into a block of text; a semantic recognition process to generate a structured representation of semantic entities corresponding to the scanned business document; and a document generation process to convert the structured representation into a structured text file. The semantic recognition process includes the processes of generating, for each line of text having a keyword therein, a terminal symbol corresponding to the keyword therein; generating, for each line of text having neither a keyword nor a numeric character therein, an alphabetic terminal symbol; generating, for each line of text not having a keyword therein and having a numeric character therein, an alphanumeric terminal symbol; generating a string of terminal symbols from the generated terminal symbols; determining a probable parsing of the generated string of terminal symbols; labeling each text line, according to a determined function, with non-terminal symbols; and parsing the business document information text into fields of business document information text based upon the non-terminal symbol of each text line and the determined probable parsing of the generated string of terminal symbols.
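
The first steps of the semantic recognition process in the abstract (one terminal symbol per text line, then a string of terminal symbols) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the keyword table, the symbol names, and the function names `terminal_symbol` and `terminal_string` are assumptions made for the example, and the later steps (determining a probable parsing and labeling lines with non-terminal symbols) are not covered.

```python
import re

# Hypothetical keyword table; the abstract does not list the actual keywords.
KEYWORDS = {
    "phone": "PHONE_KW",
    "fax": "FAX_KW",
    "email": "EMAIL_KW",
}

# Hypothetical names for the two keyword-free terminal symbols.
ALPHABETIC = "ALPHA_LINE"    # no keyword and no numeric character
ALPHANUMERIC = "ALNUM_LINE"  # no keyword, at least one numeric character


def terminal_symbol(line: str) -> str:
    """Map one line of recognized text to a terminal symbol."""
    # A keyword anywhere in the line takes priority over the other rules.
    for word in re.findall(r"[a-z]+", line.lower()):
        if word in KEYWORDS:
            return KEYWORDS[word]
    # Otherwise the symbol depends only on whether a digit is present.
    if any(ch.isdigit() for ch in line):
        return ALPHANUMERIC
    return ALPHABETIC


def terminal_string(lines: list[str]) -> list[str]:
    """Build the string (sequence) of terminal symbols for a text block."""
    return [terminal_symbol(line) for line in lines if line.strip()]


if __name__ == "__main__":
    block = [
        "Jane Doe",
        "Example Corp",
        "123 Main Street",
        "Phone: 555 0100",
    ]
    print(terminal_string(block))
```

A parser for the resulting symbol sequence would then be needed to assign field labels; the abstract only says that a "probable parsing" is determined, so the choice of grammar or scoring method is left open here.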


System and method for identifying and labeling fields of text associated with scanned business documents
    2.
    Granted invention patent

    Publication No.: US07689037B2

    Publication date: 2010-03-30

    Application No.: US10970930

    Filing date: 2004-10-22

    IPC classification: G06K9/34

    CPC classification: G06K9/00469

    Abstract: Identical to the abstract of entry 1 above.

    System and method for identifying and labeling fields of text associated with scanned business documents
    3.
    Granted invention patent
    Status: in force

    Publication No.: US07965891B2

    Publication date: 2011-06-21

    Application No.: US12710573

    Filing date: 2010-02-23

    IPC classification: G06K9/34

    CPC classification: G06K9/00469

    Abstract: Identical to the abstract of entry 1 above.

    SYSTEM AND METHOD FOR IDENTIFYING AND LABELING FIELDS OF TEXT ASSOCIATED WITH SCANNED BUSINESS DOCUMENTS
    4.
    Invention application
    Status: in force

    Publication No.: US20100149606A1

    Publication date: 2010-06-17

    Application No.: US12710573

    Filing date: 2010-02-23

    IPC classification: G06F17/00 G06K9/34 H04N1/04

    CPC classification: G06K9/00469

    Abstract: Identical to the abstract of entry 1 above.

SYSTEM AND METHOD FOR IDENTIFYING AND LABELING FIELDS OF TEXT ASSOCIATED WITH SCANNED BUSINESS DOCUMENTS
    5.
    Invention application

    Publication No.: US20100150397A1

    Publication date: 2010-06-17

    Application No.: US12710568

    Filing date: 2010-02-23

    IPC classification: G06K9/00 G06K9/34

    CPC classification: G06K9/00469

    Abstract: Identical to the abstract of entry 1 above.

    System and method for identifying and labeling fields of text associated with scanned business documents
    6.
    Invention application
    Status: lapsed

    Publication No.: US20060088214A1

    Publication date: 2006-04-27

    Application No.: US10970930

    Filing date: 2004-10-22

    IPC classification: G06K9/34 G06K9/20 G06F17/21

    CPC classification: G06K9/00469

    Abstract: Identical to the abstract of entry 1 above.