Granted invention patent
- Patent title: Extraction of information from documents
- Patent title (Chinese): 从文件中提取信息 ("Extracting information from documents")
- Application number: US11192687
- Filing date: 2005-07-29
- Publication (grant) number: US07469251B2
- Publication (grant) date: 2008-12-23
- Inventors: Hang Li, Ruihua Song, Yunbo Cao, Dmitriy Meyerzon
- Applicants: Hang Li, Ruihua Song, Yunbo Cao, Dmitriy Meyerzon
- Applicant address: US WA Redmond
- Patentee: Microsoft Corporation
- Current patentee: Microsoft Corporation
- Current patentee address: US WA Redmond
- Agent: Westman, Champlin & Kelly, P.A.
- Primary classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
An information extraction model is trained on format features identified within labeled training documents. Information from a document is extracted by assigning labels to units based on format features of the units within the document. A begin label and an end label are identified, and the information between the begin label and the end label is extracted. The extracted information can be used in various document processing tasks such as ranking.
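The abstract's train-then-label flow can be illustrated with a minimal sketch. It assumes that a document's units are lines carrying only a font size and a bold flag, and that the format features (`largest_font`, `bold`, `first_lines`, `prev_same_font`, `next_same_font`) are hypothetical. It also assumes two independent binary perceptrons for the begin and end labels. The abstract does not name a learning algorithm, so the perceptron is an illustrative stand-in, not the patented model.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One unit of a document (here assumed to be a line) with format attributes."""
    text: str
    font_size: float
    bold: bool


def format_features(doc, i):
    """Hypothetical format features for unit i of doc (a list of Unit)."""
    unit = doc[i]
    largest = max(u.font_size for u in doc)
    return {
        "bias": 1.0,
        "largest_font": 1.0 if unit.font_size == largest else 0.0,
        "bold": 1.0 if unit.bold else 0.0,
        "first_lines": 1.0 if i < 3 else 0.0,  # assumed cutoff: first three units
        "prev_same_font": 1.0 if i > 0 and doc[i - 1].font_size == unit.font_size else 0.0,
        "next_same_font": 1.0 if i + 1 < len(doc) and doc[i + 1].font_size == unit.font_size else 0.0,
    }


def score(weights, feats):
    """Linear score: dot product of weights and feature values."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train_perceptron(docs, labels, epochs=10):
    """Binary perceptron; labels[d][i] is True if unit i of docs[d] carries the label."""
    weights = {}
    for _ in range(epochs):
        for doc, ys in zip(docs, labels):
            for i, y in enumerate(ys):
                feats = format_features(doc, i)
                predicted = score(weights, feats) > 0
                if predicted != y:
                    # Move the weights toward the correct side of the decision boundary.
                    sign = 1.0 if y else -1.0
                    for k, v in feats.items():
                        weights[k] = weights.get(k, 0.0) + sign * v
    return weights


def extract(doc, begin_w, end_w):
    """Pick the best-scoring begin unit, then the best end unit at or after it,
    and return the text of the units from begin to end inclusive."""
    begin = max(range(len(doc)), key=lambda i: score(begin_w, format_features(doc, i)))
    end = max(range(begin, len(doc)), key=lambda i: score(end_w, format_features(doc, i)))
    return " ".join(u.text for u in doc[begin:end + 1])
```

For example, you could train `begin_w` and `end_w` on a document whose title spans its first two lines and then call `extract` on an unseen document to recover a multi-line title. The begin and end units are scored separately so that the extracted span can cover any number of units. Passing the result to a ranker, as the abstract mentions, is outside this sketch.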
Publication/grant documents
- US20060277173A1 Extraction of information from documents, publication/grant date: 2006-12-07