Granted invention patent
- Patent title: Extracting data
- Patent title (Chinese): 提取数据
- Application number: US12900133
- Filing date: 2010-10-07
- Publication number: US08239349B2
- Publication date: 2012-08-07
- Inventors: Maria G. Castellanos, Miguel Durazo, Umeshwar Dayal
- Applicants: Maria G. Castellanos, Miguel Durazo, Umeshwar Dayal
- Applicant address: Houston, TX, US
- Patentee: Hewlett-Packard Development Company, L.P.
- Current patentee: Hewlett-Packard Development Company, L.P.
- Current patentee address: Houston, TX, US
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
Information can be extracted from unstructured documents using embodiments described herein. An entity recognition may be performed on an unstructured document and found entities may be annotated. Annotating includes inserting tags around the found entities to generate marked entities. A rule is applied to each of the marked entities in the unstructured document to generate a confidence value for every marked entity, wherein the rule comprises a plurality of prefixes for a target entity and a plurality of suffixes for the target entity. A marked entity with the highest confidence value is selected as an extraction target.
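The pipeline the abstract describes (tag found entities, apply a prefix/suffix rule to each marked entity, keep the highest-scoring one) can be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the `<ent>` tag, the regex standing in for entity recognition, the fixed character window, the names `Rule`, `annotate`, `score` and `extract`, and the scoring formula (fraction of the rule's prefixes and suffixes found near the entity) are all assumptions, since the abstract does not specify how the confidence value is computed.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    """Hypothetical rule: phrases expected before and after the target entity."""
    prefixes: List[str]
    suffixes: List[str]


# Assumed tag format for marked entities; the abstract only says "tags".
TAG = re.compile(r"<ent>(.*?)</ent>")


def annotate(text: str, pattern: str) -> str:
    """Wrap every regex match in <ent> tags (a stand-in for entity recognition)."""
    return re.sub(pattern, lambda m: f"<ent>{m.group(0)}</ent>", text)


def score(marked: str, rule: Rule, window: int = 40) -> List[Tuple[str, float]]:
    """Return (entity, confidence) for every marked entity.

    Confidence here is the fraction of the rule's prefixes and suffixes
    found within `window` characters of the entity (an assumed formula).
    """
    total = len(rule.prefixes) + len(rule.suffixes)
    results = []
    for m in TAG.finditer(marked):
        before = marked[max(0, m.start() - window):m.start()].lower()
        after = marked[m.end():m.end() + window].lower()
        hits = sum(p.lower() in before for p in rule.prefixes)
        hits += sum(s.lower() in after for s in rule.suffixes)
        results.append((m.group(1), hits / total if total else 0.0))
    return results


def extract(text: str, pattern: str, rule: Rule) -> Optional[Tuple[str, float]]:
    """Annotate, score, and return the marked entity with the highest confidence."""
    scored = score(annotate(text, pattern), rule)
    return max(scored, key=lambda pair: pair[1]) if scored else None


# Made-up example document: three dates, only one is the termination date.
doc = (
    "This agreement is signed on 2010-01-05. "
    "The contract shall terminate on 2012-08-07 unless renewed. "
    "Invoices dated 2011-03-01 are excluded."
)
rule = Rule(
    prefixes=["terminate on", "expires on"],
    suffixes=["unless renewed", "unless extended"],
)
best = extract(doc, r"\d{4}-\d{2}-\d{2}", rule)
print(best)
```

In this sketch only the second date has both a matching prefix and a matching suffix in its surrounding text, so it is the one selected as the extraction target.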
Publication/grant documents
- US20120089620A1 EXTRACTING DATA, publication/grant date: 2012-04-12