- Patent title: Domain-specific unstructured text retrieval
- Application number: US14867620
- Filing date: 2015-09-28
- Publication number: US10318564B2
- Publication date: 2019-06-11
- Inventors: Achraf Abdel Moneim Tawfik Chalabi, Eslam Kamal Abdel-Aal Abdel-Reheem, Sayed Hassan Sayed Abdelaziz, Yuval Yehezkel Marton, Michel Naim Naguib Gerguis
- Applicant: Microsoft Technology Licensing, LLC
- Applicant address: Redmond, WA, US
- Patentee: Microsoft Technology Licensing, LLC
- Current patentee: Microsoft Technology Licensing, LLC
- Current patentee address: Redmond, WA, US
- Agent: Schwegmann Lundberg & Woessner, P.A.
- Main classification: G06F17/30
- IPC classifications: G06F17/30; G06F16/33; G06N20/00; G06F16/35; G06F16/951; G06F16/958
Abstract:
Retrieving from the Internet unstructured text related to a specified domain is described. Training data is accessed; the training data comprises unstructured text related to the specified domain. A first classifier is trained using features of the training data. The first classifier is used to classify unstructured text having a plurality of features, to obtain unstructured text examples related to the domain. The unstructured text examples are used to retrieve from the Internet similar examples which do not have at least some of the plurality of features. Optionally, a second classifier is trained using the similar examples. Additional unstructured text is retrieved from the Internet and the second classifier is used to label the additional unstructured text for domain relevance.
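The two-stage flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation, and every concrete choice in it is an assumption made for illustration: a tiny Naive Bayes model stands in for both classifiers, hashtags stand in for the "plurality of features", Jaccard token overlap stands in for similarity, an in-memory list stands in for Internet retrieval, and non-retrieved texts serve as negatives for the second classifier.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; a leading '#' is kept so hashtags stay distinct.
    return re.findall(r"#?\w+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # unseen tokens are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


def jaccard(a, b):
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve_similar(seeds, corpus, threshold=0.3):
    # Hypothetical stand-in for Internet retrieval: keep texts that lack the
    # hashtag feature but overlap enough with at least one seed example.
    return [t for t in corpus
            if "#" not in t and any(jaccard(t, s) >= threshold for s in seeds)]


# 1. Train the first classifier on feature-bearing (hashtagged) training data.
train_texts = ["#flu fever and cough all week", "#flu doctor says rest and fluids",
               "#football great match tonight", "#football the team won the cup"]
train_labels = ["health", "health", "other", "other"]
first = NaiveBayes().fit(train_texts, train_labels)

# 2. Classify feature-bearing text to obtain in-domain examples (seeds).
tagged_pool = ["#flu cough and fever again", "#football cup final tonight"]
seeds = [t for t in tagged_pool if first.predict(t) == "health"]

# 3. Use the seeds to retrieve similar text that lacks the hashtag feature.
corpus = ["cough and fever again this morning", "cup final tickets on sale",
          "bad cough and fever since monday", "#flu cough everywhere",
          "the team won again on monday"]
similar = retrieve_similar(seeds, corpus)

# 4. Optionally train a second classifier on the similar examples.
#    Assumption: untagged texts that were not retrieved act as negatives.
negatives = [t for t in corpus if "#" not in t and t not in similar]
second = NaiveBayes().fit(similar + negatives,
                          ["health"] * len(similar) + ["other"] * len(negatives))

# 5. Label additional, feature-free text for domain relevance.
additional = ["fever and a bad cough today", "tickets for the cup final"]
labels = [second.predict(t) for t in additional]
```

In this toy setup the second classifier never sees a hashtag, so it can label plain text that the first classifier's feature set would not cover. That is the point of the intermediate retrieval step.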
Published/granted documents
- US20170091313A1 DOMAIN-SPECIFIC UNSTRUCTURED TEXT RETRIEVAL, publication/grant date: 2017-03-30