Extraction of information from documents

Granted invention patent

US07469251B2 Extraction of information from documents (in force)



Patent title: Extraction of information from documents
Application number: US11192687

Filing date: 2005-07-29
Publication number: US07469251B2

Publication date: 2008-12-23
Inventors: Hang Li, Ruihua Song, Yunbo Cao, Dmitriy Meyerzon
Applicants: Hang Li, Ruihua Song, Yunbo Cao, Dmitriy Meyerzon
Applicant address: US WA Redmond
Assignee: Microsoft Corporation
Current assignee: Microsoft Corporation
Current assignee address: US WA Redmond
Agent: Westman, Champlin & Kelly, P.A.
Main classification: G06F17/30
IPC classification: G06F17/30


Abstract:

An information extraction model is trained on format features identified within labeled training documents. Information from a document is extracted by assigning labels to units based on format features of the units within the document. A begin label and end label are identified and the information is extracted between the begin label and the end label. The extracted information can be used in various document processing tasks such as ranking.
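The labeling-and-extraction step described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Unit` class, the single `font_size` feature, and the hand-written rule in `label_units` (which stands in for the trained model and simply marks the first run of largest-font units) are hypothetical and not taken from the patent. The sketch also assumes the extracted span includes the begin and end units themselves.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Unit:
    """Hypothetical document unit (e.g. a line) with one format feature."""
    text: str
    font_size: float


def label_units(units: List[Unit]) -> List[Set[str]]:
    """Stand-in for a trained model: label the edges of each largest-font run.

    A unit gets "begin" if it has the largest font and its predecessor does not,
    and "end" if it has the largest font and its successor does not.
    """
    if not units:
        return []
    top = max(u.font_size for u in units)
    labels: List[Set[str]] = []
    for i, unit in enumerate(units):
        is_top = unit.font_size == top
        prev_top = i > 0 and units[i - 1].font_size == top
        next_top = i + 1 < len(units) and units[i + 1].font_size == top
        tags: Set[str] = set()
        if is_top and not prev_top:
            tags.add("begin")
        if is_top and not next_top:
            tags.add("end")
        labels.append(tags)
    return labels


def extract(units: List[Unit]) -> Optional[str]:
    """Return the text from the first begin unit through the next end unit."""
    labels = label_units(units)
    begin = next((i for i, t in enumerate(labels) if "begin" in t), None)
    if begin is None:
        return None
    end = next((i for i in range(begin, len(labels)) if "end" in labels[i]), None)
    if end is None:
        return None
    # Assumption: the span is inclusive of both labeled units.
    return " ".join(u.text for u in units[begin:end + 1])


units = [
    Unit("Technical Report", 10),
    Unit("Extraction of Information", 24),
    Unit("from Documents", 24),
    Unit("Hang Li", 12),
]
title = extract(units)
```

In this toy input, the two 24-point units form the labeled span, so `title` holds their joined text. A trained model would replace `label_units` with labels predicted from format features learned from labeled training documents.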


Publication/grant documents

US20060277173A1 Extraction of information from documents. Publication/grant date: 2006-12-07
