SYSTEM FOR OPTIMAL DOCUMENT SCANNING
    1.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20090201541A1

    Publication date: 2009-08-13

    Application No.: US12351302

    Application date: 2009-01-09

    Abstract: A method of controlling a scanner to improve automatic recognition and classification of scanned physical documents for a document analysis system, which receives and processes jobs containing at least one electronic document from a plurality of users to automatically recognize and classify the job documents into document categories, is disclosed. The method comprises, using a scan control system, obtaining the capability of, and existing scanner settings for, the scanner upon receiving a command to initiate scanning of physical documents; saving the existing scanner settings of the scanner; automatically commanding the scanner to use new scanner settings, wherein the new scanner settings are selected in accordance with the capability of the recognition system; commanding the scanner to begin scanning operation with the new scanner settings; and automatically resetting the scanner settings of the scanner back to the saved existing scanner settings upon completion of the scanning operation.
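
    The save, override, scan, and restore sequence in this abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical: FakeScanner stands in for a real scanner driver, and choose_settings assumes the recognition system's preferred values are simply filtered by what the scanner reports it supports. It is not the patented implementation.

```python
from contextlib import contextmanager


class FakeScanner:
    """Hypothetical stand-in for a scanner driver; not a real device API."""

    def __init__(self, settings, capabilities):
        self.settings = dict(settings)
        self.capabilities = capabilities  # e.g. {"dpi": [150, 300, 600]}

    def scan(self):
        # Report the settings that were active during the scan.
        return dict(self.settings)


def choose_settings(capabilities, preferred):
    """Keep only the preferred values that the scanner says it supports."""
    return {key: value for key, value in preferred.items()
            if value in capabilities.get(key, [])}


@contextmanager
def temporary_settings(scanner, preferred):
    saved = dict(scanner.settings)  # save the existing settings
    scanner.settings.update(choose_settings(scanner.capabilities, preferred))
    try:
        yield scanner
    finally:
        scanner.settings = saved  # restore even if scanning raises


scanner = FakeScanner({"dpi": 150, "mode": "color"},
                      {"dpi": [150, 300, 600], "mode": ["color", "gray"]})
with temporary_settings(scanner, {"dpi": 300, "mode": "gray"}) as active:
    used = active.scan()
```

    A context manager is used so that the original settings come back even when the scan fails partway through.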

    Systems and methods for automatically processing electronic documents using multiple image transformation algorithms
    3.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08571317B2

    Publication date: 2013-10-29

    Application No.: US13007452

    Application date: 2011-01-14

    CPC classification number: G06K9/00442 G06K9/48 G06K9/72 G06K2209/01

    Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically pre-processing each received electronic document using a plurality of image transformation algorithms to improve subsequent data extraction from said document is provided. The method includes: electronically partitioning each received electronic document page into pieces; automatically processing each piece of the received electronic document page using each of a plurality of image pre-processing algorithms to produce a plurality of image variations of each piece; and analyzing the outputs of subsequent processing and data extraction, on each of the image variations of the pieces to determine which output is best, from the plurality of outputs for each piece.
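
    The partition, transform, and select steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: pages are plain lists of grayscale pixel rows, the three transforms are arbitrary examples, and extraction_score is a made-up placeholder for whatever quality measure the downstream data extraction would provide.

```python
def partition(page, rows_per_piece):
    """Split a page (a list of pixel rows) into horizontal strips."""
    return [page[i:i + rows_per_piece]
            for i in range(0, len(page), rows_per_piece)]


def identity(piece):
    return [row[:] for row in piece]


def threshold(piece, cutoff=128):
    return [[255 if p >= cutoff else 0 for p in row] for row in piece]


def invert(piece):
    return [[255 - p for p in row] for row in piece]


def extraction_score(piece):
    """Placeholder quality measure: share of pixels that are pure black or white."""
    pixels = [p for row in piece for p in row]
    return sum(p in (0, 255) for p in pixels) / len(pixels)


def best_variation(piece, transforms):
    """Run every transform on the piece and keep the best-scoring output."""
    variations = {name: fn(piece) for name, fn in transforms.items()}
    best = max(variations, key=lambda name: extraction_score(variations[name]))
    return best, variations[best]


TRANSFORMS = {"identity": identity, "threshold": threshold, "invert": invert}
page = [[10, 200, 90], [250, 30, 140], [0, 255, 0], [255, 0, 255]]
choices = [best_variation(piece, TRANSFORMS)[0] for piece in partition(page, 2)]
# choices == ["threshold", "identity"]: each piece gets its own winner
```

    Selecting per piece rather than per page lets a noisy region and a clean region of the same page end up with different pre-processing.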

    SYSTEMS AND METHODS FOR TRAINING DOCUMENT ANALYSIS SYSTEM FOR AUTOMATICALLY EXTRACTING DATA FROM DOCUMENTS
    4.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20110258150A1

    Publication date: 2011-10-20

    Application No.: US13007430

    Application date: 2011-01-14

    CPC classification number: G06K9/00442 G06K9/48 G06K9/72 G06K2209/01

    Abstract: A method of training a document analysis system to extract data from documents is provided. The method includes: automatically analyzing images and text features extracted from a document to associate the document with a corresponding document category; comparing the extracted text features with a set of text features associated with the corresponding category of the document, in which the set of text features includes a set of characters, words, and phrases; if the extracted features are found to consist of the characters, words, and phrases belonging to the set of text features associated with the corresponding document category, storing the extracted text features as the data contained in the corresponding document; and, if the extracted text features are found to include at least one text feature that does not belong to the set of text features associated with the corresponding document category, submitting the unrecognized text features to a training phase.
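
    The store-or-train decision in this abstract reduces to a set-membership check. The sketch below assumes text features are plain strings and that the category's known features are held in a Python set; the function name and the returned dictionary shape are illustrative choices, not taken from the patent.

```python
def route_text_features(extracted, known_features):
    """Store features if all are known for the category, else send unknowns to training."""
    unknown = [feature for feature in extracted if feature not in known_features]
    if unknown:
        return {"action": "train", "features": unknown}
    return {"action": "store", "features": list(extracted)}


# Hypothetical feature set for an "invoice" document category.
known_invoice_features = {"Invoice", "Total", "Due"}

stored = route_text_features(["Invoice", "Total"], known_invoice_features)
queued = route_text_features(["Invoice", "Remit"], known_invoice_features)
```

    Here `stored` takes the store branch, while `queued` carries only the unrecognized feature "Remit" into the training phase.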

    SYSTEMS AND METHODS FOR AUTOMATICALLY GROUPING ELECTRONIC DOCUMENT PAGES
    5.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20110255790A1

    Publication date: 2011-10-20

    Application No.: US13007481

    Application date: 2011-01-14

    CPC classification number: G06K9/00442 G06K9/48 G06K9/72 G06K2209/01

    Abstract: A method of grouping electronic document pages of a job that belong together is provided. The method includes: automatically analyzing images and text features extracted from each received electronic document page to associate the electronic document page with a corresponding document category; automatically identifying features extracted from the electronic document page that potentially indicate to which document group the electronic document page belongs; comparing the identified features with a set of group identifying features associated with the corresponding document group, in which the set of group identifying features includes at least a set of page numbers and account numbers; and, if the identified features are found to include a set of a page number and an account number belonging to the set of group identifying features associated with the corresponding document group, grouping the electronic document page into the corresponding document group.
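
    Grouping pages by account number and page number, as this abstract describes, might look like the sketch below. It assumes each page is already available as text and that the group-identifying features can be found with two simple regular expressions; both patterns and the sample page strings are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical patterns for the group-identifying features.
ACCOUNT_RE = re.compile(r"Account\s*(?:No\.?|#)?\s*:?\s*(\d+)", re.IGNORECASE)
PAGE_RE = re.compile(r"Page\s+(\d+)", re.IGNORECASE)


def group_pages(pages):
    """Group page indices by account number, ordered by page number."""
    groups = defaultdict(list)
    ungrouped = []
    for index, text in enumerate(pages):
        account = ACCOUNT_RE.search(text)
        page = PAGE_RE.search(text)
        if account and page:
            groups[account.group(1)].append((int(page.group(1)), index))
        else:
            ungrouped.append(index)  # lacks one of the two identifying features
    ordered = {acct: [i for _, i in sorted(items)]
               for acct, items in groups.items()}
    return ordered, ungrouped


pages = [
    "Account No: 1234 Page 2 of 2",
    "Account 9876 Page 1",
    "Account No: 1234 Page 1 of 2",
    "Cover letter",
]
groups, ungrouped = group_pages(pages)
# groups == {"1234": [2, 0], "9876": [1]}, ungrouped == [3]
```

    A page joins a group only when both features are present, mirroring the abstract's requirement of a page number together with an account number.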

    Systems and methods for handling and distinguishing binarized, background artifacts in the vicinity of document text and image features indicative of a document category
    7.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08538184B2

    Publication date: 2013-09-17

    Application No.: US12266465

    Application date: 2008-11-06

    CPC classification number: G06K9/00442 G06K9/6885

    Abstract: A method of enhancing electronic documents received from a plurality of users by a document analysis system for improving automatic recognition and classification of the received electronic documents is provided. For each page of a received electronic document, the method filters the page to infer binarized-background artifacts resulting from the binarization of the original grayscale or color image source document and which reside in the vicinity of binarized text and binarized image features in the page, so that the binarized text and binarized images may be distinguished from the binarized-background artifacts and extracted from the document. The method then uses the extracted features from the filtered document to automatically recognize and classify a document into a document category.
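
    The abstract does not say how the filter infers background artifacts. As a stand-in, the sketch below uses a common despeckling technique: it clears 4-connected groups of foreground pixels that are smaller than a chosen size. This is a swapped-in illustration of separating small artifacts from larger text-like shapes, not the patented filter, and min_size is an assumed tuning parameter.

```python
def remove_small_components(image, min_size):
    """Clear 4-connected groups of 1-pixels that contain fewer than min_size pixels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one connected component starting at (y, x).
            stack, component = [(y, x)], []
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    cleaned[cy][cx] = 0  # treat the speck as background
    return cleaned


binarized = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1],  # the lone 1 at the right edge is a speck
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
filtered = remove_small_components(binarized, min_size=3)
```

    In the sample page the ring-shaped component survives and the isolated pixel next to it is cleared.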

    Systems and methods for automatically reducing data search space and improving data extraction accuracy using known constraints in a layout of extracted data elements
    8.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20110258195A1

    Publication date: 2011-10-20

    Application No.: US13007407

    Application date: 2011-01-14

    CPC classification number: G06K9/00442 G06K9/48 G06K9/72 G06K2209/01

    Abstract: A method of automatically narrowing data search space and improving accuracy of data extraction using known constraints in a layout of extracted data elements for classified documents is provided. The method includes: analyzing each document to classify it within a document category, each category having a corresponding set of expected layouts; analyzing each electronic document to automatically extract images and text features; automatically constructing a data structure including a layout of the extracted features and layout relationships amongst the extracted features, wherein each of the extracted features in the layout maintains a reference to neighboring features and wherein closely related features are merged to form a combined feature; automatically narrowing data search space by detecting and removing parts of the layout that are not associated with any data elements using the data structure; and automatically detecting data using the extracted feature layout and the layout relationships amongst the extracted features.
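
    A layout structure in which each feature references its neighbors, nearby features are merged, and irrelevant parts are dropped can be sketched in a simplified one-dimensional form. The Feature class, the max_gap merge rule, and the "label or value directly right of a label" pruning rule are all assumptions made for this illustration; the patent's actual data structure is not specified in the abstract.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Feature:
    """A text feature on one line, with links to its horizontal neighbors."""
    text: str
    x0: int
    x1: int
    left: Optional["Feature"] = field(default=None, repr=False)
    right: Optional["Feature"] = field(default=None, repr=False)


def build_layout(features, max_gap):
    """Merge features closer than max_gap, then link each one to its neighbors."""
    merged = []
    for feat in sorted(features, key=lambda f: f.x0):
        if merged and feat.x0 - merged[-1].x1 <= max_gap:
            last = merged[-1]
            last.text = f"{last.text} {feat.text}"
            last.x1 = max(last.x1, feat.x1)
        else:
            merged.append(Feature(feat.text, feat.x0, feat.x1))
    for left, right in zip(merged, merged[1:]):
        left.right, right.left = right, left
    return merged


def narrow(layout, labels):
    """Keep expected labels and the feature directly to the right of each label."""
    return [f for f in layout
            if f.text in labels or (f.left is not None and f.left.text in labels)]


raw = [Feature("Total", 0, 40), Feature("Due", 44, 70), Feature("$120.00", 100, 150),
       Feature("Thank", 300, 340), Feature("you", 344, 370)]
layout = build_layout(raw, max_gap=5)
candidates = narrow(layout, {"Total Due"})
# [f.text for f in candidates] == ["Total Due", "$120.00"]
```

    Merging turns "Total" and "Due" into one combined feature, and pruning removes "Thank you" because it is neither an expected label nor adjacent to one.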

    SYSTEMS AND METHODS FOR AUTOMATICALLY PROCESSING ELECTRONIC DOCUMENTS USING MULTIPLE IMAGE TRANSFORMATION ALGORITHMS
    9.
    Type: Invention patent application
    Status: In force

    Publication No.: US20110255782A1

    Publication date: 2011-10-20

    Application No.: US13007452

    Application date: 2011-01-14

    CPC classification number: G06K9/00442 G06K9/48 G06K9/72 G06K2209/01

    Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically pre-processing each received electronic document using a plurality of image transformation algorithms to improve subsequent data extraction from said document is provided. The method includes: electronically partitioning each received electronic document page into pieces; automatically processing each piece of the received electronic document page using each of a plurality of image pre-processing algorithms to produce a plurality of image variations of each piece; and analyzing the outputs of subsequent processing and data extraction, on each of the image variations of the pieces to determine which output is best, from the plurality of outputs for each piece.

    SYSTEMS AND METHODS FOR HANDLING AND DISTINGUISHING BINARIZED, BACKGROUND ARTIFACTS IN THE VICINITY OF DOCUMENT TEXT AND IMAGE FEATURES INDICATIVE OF A DOCUMENT CATEGORY
    10.
    Type: Invention patent application
    Status: In force

    Publication No.: US20090119296A1

    Publication date: 2009-05-07

    Application No.: US12266465

    Application date: 2008-11-06

    CPC classification number: G06K9/00442 G06K9/6885

    Abstract: A method of enhancing electronic documents received from a plurality of users by a document analysis system for improving automatic recognition and classification of the received electronic documents is provided. For each page of a received electronic document, the method filters the page to infer binarized-background artifacts resulting from the binarization of the original grayscale or color image source document and which reside in the vicinity of binarized text and binarized image features in the page, so that the binarized text and binarized images may be distinguished from the binarized-background artifacts and extracted from the document. The method then uses the extracted features from the filtered document to automatically recognize and classify a document into a document category.
