Domain-specific unstructured text retrieval

Granted invention patent

US10318564B2 Domain-specific unstructured text retrieval (in force)

Patent title: Domain-specific unstructured text retrieval
Application number: US14867620

Filing date: 2015-09-28
Publication (announcement) number: US10318564B2

Publication (announcement) date: 2019-06-11
Inventors: Achraf Abdel Moneim Tawfik Chalabi, Eslam Kamal Abdel-Aal Abdel-Reheem, Sayed Hassan Sayed Abdelaziz, Yuval Yehezkel Marton, Michel Naim Naguib Gerguis
Applicant: Microsoft Technology Licensing, LLC
Applicant address: US WA Redmond
Patentee: Microsoft Technology Licensing, LLC
Current patentee: Microsoft Technology Licensing, LLC
Current patentee address: US WA Redmond
Agency: Schwegmann Lundberg & Woessner, P.A.
Main classification: G06F17/30
IPC classification: G06F17/30; G06F16/33; G06N20/00; G06F16/35; G06F16/951; G06F16/958

Abstract:

Retrieving from the Internet unstructured text related to a specified domain is described. Training data is accessed; the training data comprises unstructured text related to the specified domain. A first classifier is trained using features of the training data. The first classifier is then used to classify unstructured text having a plurality of features, to obtain unstructured text examples related to the domain. These examples are used to retrieve from the Internet similar examples which do not have at least some of the plurality of features. Optionally, a second classifier is trained using the similar examples. Additional unstructured text is retrieved from the Internet, and the second classifier is used to label the additional unstructured text for domain relevance.
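
The two-stage flow in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: "#tag" tokens stand in for the explicit features, a tiny naive Bayes model stands in for both classifiers, Jaccard word overlap stands in for the similarity search, an in-memory list stands in for the Internet, and the negative examples for the second classifier are borrowed from the original training data.

```python
import math
from collections import Counter

def tokens(text):
    return text.lower().split()

def is_marker(tok):
    # Assumption: "#tag" tokens play the role of the explicit features.
    return tok.startswith("#")

class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""
    def __init__(self, keep=lambda tok: True):
        self.keep = keep  # decides which tokens count as features
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.docs[label] += 1
            self.counts[label].update(t for t in tokens(text) if self.keep(t))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def predict(self, text):
        toks = [t for t in tokens(text) if t in self.vocab]
        n_docs = sum(self.docs.values())
        scores = {}
        for label in (True, False):
            total = sum(self.counts[label].values()) + len(self.vocab)
            score = math.log((self.docs[label] + 1) / (n_docs + 2))
            for t in toks:
                score += math.log((self.counts[label][t] + 1) / total)
            scores[label] = score
        return scores[True] > scores[False]

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve_similar(seeds, corpus, threshold=0.3):
    """Return marker-free corpus texts whose words overlap some seed's plain words."""
    seed_words = [[t for t in tokens(s) if not is_marker(t)] for s in seeds]
    found = []
    for doc in corpus:
        toks = tokens(doc)
        if any(is_marker(t) for t in toks):
            continue  # keep only texts that lack the explicit features
        if any(jaccard(toks, sw) >= threshold for sw in seed_words):
            found.append(doc)
    return found

# Hypothetical toy data for a "football" domain.
train_texts = [
    "great goal in the match #football",
    "the striker scored a late goal #football",
    "new phone battery lasts long #tech",
    "the laptop screen is bright #tech",
]
train_labels = [True, True, False, False]

# Stage 1: the first classifier uses all tokens, markers included.
first = NaiveBayes().fit(train_texts, train_labels)
candidates = ["what a goal by the striker #football",
              "phone screen cracked again #tech"]
seeds = [c for c in candidates if first.predict(c)]

# Stage 2: find similar texts that lack the markers.
web = ["the striker scored a goal", "what a save by the keeper",
       "my laptop battery died", "goal of the season #football"]
similar = retrieve_similar(seeds, web)

# Stage 3 (optional): the second classifier ignores markers entirely.
negatives = [t for t, y in zip(train_texts, train_labels) if not y]
second = NaiveBayes(keep=lambda t: not is_marker(t)).fit(
    similar + negatives, [True] * len(similar) + [False] * len(negatives))

additional = ["the keeper made a save", "laptop battery is new"]
labels = {text: second.predict(text) for text in additional}
```

In this sketch the second model never sees a marker token, so it can label plain text that the first model, which leans on markers, would handle poorly.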

Publication/grant documents

US20170091313A1 DOMAIN-SPECIFIC UNSTRUCTURED TEXT RETRIEVAL Publication/grant date: 2017-03-30

Information lookup

Espacenet