- Patent title: Techniques for extracting contextually structured data from document images
- Application number: US17083568
- Filing date: 2020-10-29
- Publication number: US11049235B2
- Publication date: 2021-06-29
- Inventors: David James Wheaton, William Robert Nadolski, Heather Michelle GoodyKoontz
- Applicant: SAS Institute Inc.
- Applicant address: US NC Cary
- Assignee: SAS Institute Inc.
- Current assignee: SAS Institute Inc.
- Current assignee address: US NC Cary
- Agency: Kacvinsky Daisak Bluni PLLC
- Main classification: G06K9/00
- IPC classification: G06K9/00; G06T7/00; G06F16/81; G06F16/93; G06F40/284; G06F40/186; G06F40/169; G06K9/68; G06K9/62
Abstract:
Embodiments are generally directed to techniques for extracting contextually structured data from document images, such as by automatically identifying document layout, document data, and/or document metadata in a document image. Many embodiments are particularly directed to generating and utilizing a document template database for automatically extracting document image contents into a contextually structured format. For example, the document template database may include a plurality of templates for identifying and explaining key data elements in various document image formats; these templates can be used to extract contextually structured data from incoming document images with a matching document image format. Several embodiments are particularly directed to automatically identifying and associating document metadata with corresponding document data in a document image, such as for generating a machine-facilitated annotation of the document image. In some embodiments, the machine-facilitated annotation of a document may be used to generate a template for the template database.
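The template-matching idea in the abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes the document image has already been run through OCR into positioned text tokens, and every name here (`Token`, `Template`, `best_template`, `extract`, the 0.8 threshold, the row tolerance) is hypothetical and chosen only for illustration. A template lists the metadata labels expected in a layout; a document matches the template whose labels it contains, and each label is then paired with the nearest value to its right on the same row.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One OCR'd text fragment with its position on the page (assumed input)."""
    text: str
    x: float
    y: float


@dataclass
class Template:
    """Hypothetical template: maps an output field name to its on-page label."""
    name: str
    fields: dict  # e.g. {"total": "Total"}


def match_score(template, tokens):
    """Fraction of the template's labels that appear in the document."""
    labels = set(template.fields.values())
    if not labels:
        return 0.0
    present = {t.text for t in tokens}
    return len(labels & present) / len(labels)


def best_template(templates, tokens, threshold=0.8):
    """Return the best-scoring template, or None if nothing scores high enough."""
    if not templates:
        return None
    best = max(templates, key=lambda t: match_score(t, tokens))
    return best if match_score(best, tokens) >= threshold else None


def extract(template, tokens, row_tol=5.0):
    """Pair each label (metadata) with the nearest token to its right (data)."""
    result = {}
    for field_name, label in template.fields.items():
        anchor = next((t for t in tokens if t.text == label), None)
        if anchor is None:
            continue
        same_row = [
            t for t in tokens
            if abs(t.y - anchor.y) <= row_tol and t.x > anchor.x
        ]
        if same_row:
            result[field_name] = min(same_row, key=lambda t: t.x).text
    return result


if __name__ == "__main__":
    templates = [
        Template("invoice_v1", {"invoice_no": "Invoice #", "total": "Total"}),
        Template("receipt_v1", {"store": "Store", "amount": "Amount"}),
    ]
    page = [
        Token("Invoice #", 10, 20), Token("A-1042", 120, 21),
        Token("Total", 10, 200), Token("$58.10", 120, 199),
    ]
    chosen = best_template(templates, page)
    if chosen is not None:
        print(chosen.name, extract(chosen, page))
```

Matching on exact label text keeps the sketch short; a practical system would likely also compare layout geometry and tolerate OCR noise, which the abstract does not specify.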