SYSTEMS AND METHODS FOR CLASSIFYING ELECTRONIC DOCUMENTS BY EXTRACTING AND RECOGNIZING TEXT AND IMAGE FEATURES INDICATIVE OF DOCUMENT CATEGORIES
    11.
    Patent application
    Status: Under examination (published)

    Publication number: US20090116757A1

    Publication date: 2009-05-07

    Application number: US12266472

    Filing date: 2008-11-06

    CPC classification number: G06K9/00442 G06K9/6885

    Abstract: A method in a document analysis system automatically extracts image and text features from each received electronic document, where the image features are indicative of how the document is laid out or textually organized and therefore indicative of a corresponding document category. The method next compares the extracted image and text features with the feature sets associated with each document category, and then classifies each document into the document category whose feature set best matches the extracted features of the document.
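
The best-match step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the feature names, the `CATEGORY_FEATURES` table, and the use of Jaccard similarity as the matching score are all assumptions made for the example.

```python
def jaccard(a, b):
    """Similarity between two feature sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(doc_features, category_features):
    """Return the category whose feature set best matches doc_features."""
    return max(
        category_features,
        key=lambda category: jaccard(doc_features, category_features[category]),
    )


# Hypothetical per-category feature sets mixing text and layout (image) features.
CATEGORY_FEATURES = {
    "invoice": {"text:invoice", "text:total", "layout:table", "layout:logo_top_left"},
    "letter": {"text:dear", "text:sincerely", "layout:single_column"},
}

# Hypothetical features extracted from one received document.
doc = {"text:total", "text:invoice", "layout:table"}
best = classify(doc, CATEGORY_FEATURES)
```

Any other similarity measure could be substituted for `jaccard` without changing the overall structure.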

    Content collection
    12.
    Granted patent
    Status: In force

    Publication number: US07143193B1

    Publication date: 2006-11-28

    Application number: US09532483

    Filing date: 1999-12-13

    Abstract: In a web service system with one or more web servers, a system and method for distributing content directly from each web server to a single computer transfers files generated on web servers to a central location for access by a system operator. If files generated by multiple web servers are aggregated on a single computer, processing and analysis can be performed on all of the files. Generally, in one aspect, the invention relates to a system and method for transmitting content from one computer to another in a web service system. The web service system includes web servers that provide web pages in response to web page requests. First and second web server agents provide an interface between the web service system and first and second computers, respectively. The first web server agent runs on the first computer and identifies at least a portion of a file for transmission to the second web server agent running on the second computer in the web service system. At least a portion of the file from the first web server agent is transmitted to the second web server agent and then stored by the second web server agent.
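
The agent-to-agent transfer described in this abstract can be sketched in memory as follows. The class name `WebServerAgent`, its methods, and the offset-based way of identifying "at least a portion of a file" are assumptions for illustration; a real system would transmit over a network rather than call the receiver directly.

```python
class WebServerAgent:
    """Toy agent that holds files in memory and exchanges portions of them."""

    def __init__(self, name):
        self.name = name
        self.files = {}  # filename -> bytes

    def identify_portion(self, filename, offset):
        """Identify the part of the file that has not been sent yet."""
        return self.files[filename][offset:]

    def send(self, filename, offset, receiver):
        """Transmit the unsent portion and return the new offset."""
        chunk = self.identify_portion(filename, offset)
        receiver.receive(self.name, filename, chunk)
        return offset + len(chunk)

    def receive(self, sender, filename, chunk):
        """Store the received portion under a key that records its origin."""
        key = f"{sender}/{filename}"
        self.files[key] = self.files.get(key, b"") + chunk


web1 = WebServerAgent("web1")        # runs on the first computer
central = WebServerAgent("central")  # runs on the second (collecting) computer

web1.files["access.log"] = b"GET /index.html 200\n"
sent = web1.send("access.log", 0, central)

# The log grows; only the new portion is transmitted on the next call.
web1.files["access.log"] += b"GET /about.html 404\n"
sent = web1.send("access.log", sent, central)
```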

    Systems and methods for automatically processing electronic documents
    15.
    Granted patent
    Status: In force

    Publication number: US08897563B1

    Publication date: 2014-11-25

    Application number: US14064935

    Filing date: 2013-10-28

    CPC classification number: G06K9/00442 G06K9/48 G06K9/72 G06K2209/01

    Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically pre-processing each received electronic document using a plurality of image transformation algorithms to improve subsequent data extraction from said document is provided. The method includes: electronically partitioning each received electronic document page into pieces; automatically processing each piece of the received electronic document page using each of a plurality of image pre-processing algorithms to produce a plurality of image variations of each piece; and analyzing the outputs of subsequent processing and data extraction, on each of the image variations of the pieces to determine which output is best, from the plurality of outputs for each piece.
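
The partition, pre-process, and select-best loop described in this abstract can be sketched as follows. Everything here is a stand-in: the page is a small grid of grayscale values, the three pre-processors are simple examples, and `mock_extract` is a hypothetical placeholder for real data extraction that merely scores how "crisp" a piece is.

```python
def partition(page, piece_height):
    """Split a page (a list of pixel rows) into horizontal strips."""
    return [page[i:i + piece_height] for i in range(0, len(page), piece_height)]


def identity(piece):
    return piece


def invert(piece):
    return [[255 - p for p in row] for row in piece]


def binarize(piece, threshold=128):
    return [[255 if p >= threshold else 0 for p in row] for row in piece]


PREPROCESSORS = [identity, invert, binarize]


def mock_extract(piece):
    """Placeholder for extraction: returns (text, confidence).

    Confidence is the fraction of pixels that are pure black or white.
    """
    pixels = [p for row in piece for p in row]
    crisp = sum(1 for p in pixels if p in (0, 255))
    return f"<{len(pixels)} px>", crisp / len(pixels)


def best_output_per_piece(page, piece_height):
    """Run every pre-processor on every piece and keep the best output."""
    results = []
    for piece in partition(page, piece_height):
        outputs = [mock_extract(transform(piece)) for transform in PREPROCESSORS]
        results.append(max(outputs, key=lambda output: output[1]))
    return results


page = [
    [10, 200, 90, 250],
    [0, 255, 130, 120],
    [40, 60, 220, 230],
    [255, 0, 0, 255],
]
results = best_output_per_piece(page, piece_height=2)
```

In this toy setup the binarized variation wins for both pieces, because it is the only one in which every pixel is pure black or white.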

    SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING DATA FROM ELECTRONIC DOCUMENT PAGE INCLUDING MULTIPLE COPIES OF A FORM
    16.
    Patent application
    Status: Under examination (published)

    Publication number: US20110258182A1

    Publication date: 2011-10-20

    Application number: US13007330

    Filing date: 2011-01-14

    CPC classification number: G06K9/00442 G06K9/48 G06K9/72 G06K2209/01

    Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of extracting data from a received electronic document page that includes multiple copies of a form is provided. The method comprises: automatically processing a received electronic document page that includes multiple copies of a form to group the multiple copies into a corresponding number of records; automatically extracting data from each of the multiple copies of the form and saving the extracted data into the corresponding record; automatically comparing the extracted data in the records to determine which copy of the extracted data to select; if all extracted data instances are identical, assigning a high confidence score to the extracted data; and, if the extracted data instances are not all identical, flagging the extracted data for further processing.
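
The compare-and-flag step described in this abstract can be sketched as follows, assuming each copy of the form has already been extracted into a dictionary of field values. The function name `reconcile`, the field names, and the `"high"`/`"low"` labels are illustrative assumptions.

```python
def reconcile(records):
    """Compare the same field across all copies of a form.

    Identical values get high confidence; any disagreement is flagged
    for further processing and all candidates are kept.
    """
    fields = set().union(*(record.keys() for record in records))
    result = {}
    for field in sorted(fields):
        values = [record.get(field) for record in records]
        if all(value == values[0] for value in values):
            result[field] = {"value": values[0], "confidence": "high", "flagged": False}
        else:
            result[field] = {
                "value": None,
                "candidates": values,
                "confidence": "low",
                "flagged": True,
            }
    return result


# Two hypothetical copies of the same form found on one page.
records = [
    {"name": "Jane Doe", "amount": "120.00"},
    {"name": "Jane Doe", "amount": "128.00"},
]
result = reconcile(records)
```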

    Systems and methods for automatically correcting data extracted from electronic documents using known constraints for semantics of extracted data elements
    17.
    Patent application
    Status: Under examination (published)

    Publication number: US20110258170A1

    Publication date: 2011-10-20

    Application number: US13007399

    Filing date: 2011-01-14

    CPC classification number: G06K9/00442 G06K9/48 G06K9/72 G06K2209/01

    Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically correcting the extracted data using known constraints amongst semantics of extracted data elements is provided. The method includes: analyzing each electronic document in a job to automatically extract data; automatically analyzing the extracted data to identify incorrectly extracted data elements using rules defining constraints amongst semantics of extracted data elements; and automatically attempting to correct the incorrectly extracted data elements using the rules.
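
A rule-based check and correction of the kind this abstract describes can be sketched as follows. The rule (quantity times unit price equals amount), the field names, and the character-confusion table are assumptions chosen for the example; the abstract does not specify any particular rule or correction strategy.

```python
# Characters that recognition commonly confuses with digits (illustrative).
CONFUSIONS = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}


def normalize_digits(text):
    """Replace commonly confused letters with the digits they resemble."""
    return "".join(CONFUSIONS.get(ch, ch) for ch in text)


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def rule_amount(record):
    """Semantic constraint: quantity * unit_price must equal amount."""
    keys = ("quantity", "unit_price", "amount")
    if not all(is_number(record[key]) for key in keys):
        return False
    quantity, unit_price, amount = (float(record[key]) for key in keys)
    return abs(quantity * unit_price - amount) < 0.005


def correct(record):
    """Return (record, status), where status is ok, corrected, or flagged."""
    if rule_amount(record):
        return record, "ok"
    candidate = {key: normalize_digits(value) for key, value in record.items()}
    if rule_amount(candidate):
        return candidate, "corrected"
    return record, "flagged"


extracted = {"quantity": "3", "unit_price": "1O.00", "amount": "30.00"}
fixed, status = correct(extracted)
```

Here the misread `1O.00` fails the rule, the substitution yields `10.00`, and the corrected record satisfies the constraint.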

    SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING DATA FROM ELETRONIC DOCUMENTS USING MULTIPLE CHARACTER RECOGNITION ENGINES
    19.
    Patent application
    Status: Under examination (published)

    Publication number: US20110255784A1

    Publication date: 2011-10-20

    Application number: US13007434

    Filing date: 2011-01-14

    CPC classification number: G06K9/00442 G06K9/48 G06K9/72 G06K2209/01

    Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically extracting data from each received electronic document using a plurality of character recognition engines is provided. The method includes: automatically processing each received electronic document page using each of a plurality of recognition engines to extract data; comparing the quality of the data extracted by each of the recognition engines to assign a confidence score to the extracted data; and selecting the extracted data having the highest confidence score as the correct extracted data.
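
The multi-engine selection described in this abstract can be sketched as follows. The engines are hypothetical stand-in functions, and scoring by agreement between engines is one possible way to assign confidence, assumed here for illustration; the abstract does not say how quality is compared.

```python
from collections import Counter


def select_best(page, engines):
    """Run every engine on the page and pick the most agreed-upon output.

    Confidence is the fraction of engines that produced that output.
    """
    outputs = [engine(page) for engine in engines]
    counts = Counter(outputs)
    scored = [(text, count / len(outputs)) for text, count in counts.items()]
    return max(scored, key=lambda item: item[1])


# Hypothetical engines; real ones would analyze the page image.
def engine_a(page):
    return "INV-1001"


def engine_b(page):
    return "INV-100l"  # misreads the final digit as a lowercase L


def engine_c(page):
    return "INV-1001"


text, confidence = select_best("page-image-placeholder", [engine_a, engine_b, engine_c])
```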

    SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING DATA FROM ELECTRONIC DOCUMENTS INCLUDING TABLES
    20.
    Patent application
    Status: Under examination (published)

    Publication number: US20110249905A1

    Publication date: 2011-10-13

    Application number: US13166966

    Filing date: 2011-06-23

    CPC classification number: G06K9/00449

    Abstract: A method of automatically extracting data from an electronic document including tables is provided. The method includes: automatically identifying rows of the table using gaps in horizontal projections of the plurality of image sections, wherein at least some of the identified rows in close proximity are collected to form table formations; and automatically identifying columns of the table using at least some of the plurality of image sections that are vertically aligned, wherein the identified columns are grown in each of the table formations using gaps in vertical projections of the plurality of image sections until an obstruction is reached. The method further includes automatically identifying labels in the plurality of corresponding image sections to associate the identified labels with at least one of the identified columns and the identified rows; and automatically extracting data from cells of the table formed by the identified rows and columns.
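
Finding rows and columns from gaps in projections, as this abstract describes, can be sketched as follows. The sketch assumes a binary image given as a list of rows (1 for ink, 0 for background) and covers only the projection-gap idea; grouping rows into table formations, growing columns until an obstruction, and label association are not shown.

```python
def runs_of_ink(projection):
    """Return (start, end) index pairs for runs of non-zero projection values.

    Zero-valued stretches between runs are the gaps that separate rows or columns.
    """
    runs, start = [], None
    for index, value in enumerate(projection):
        if value > 0 and start is None:
            start = index
        elif value == 0 and start is not None:
            runs.append((start, index))
            start = None
    if start is not None:
        runs.append((start, len(projection)))
    return runs


def find_rows(image):
    """Rows come from gaps in the horizontal projection (ink per pixel row)."""
    return runs_of_ink([sum(row) for row in image])


def find_columns(image):
    """Columns come from gaps in the vertical projection (ink per pixel column)."""
    return runs_of_ink([sum(column) for column in zip(*image)])


image = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
]
rows = find_rows(image)
columns = find_columns(image)
```

Each `(start, end)` pair is a half-open pixel range, so a cell can be cropped as `image[r0:r1]` restricted to columns `c0:c1`.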
