- Patent title: Gap identification in corpora
- Application number: US15622762
- Filing date: 2017-06-14
- Publication number: US10740365B2
- Publication date: 2020-08-11
- Inventors: Brendan C. Bull, Scott R. Carrier, Aysu Ezen Can, Dwi Sianto Mansjur
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current assignee: International Business Machines Corporation
- Current assignee address: US NY Armonk
- Agent: Alexis N. Hatzis
- Main classification: G06F16/31
- IPC classification: G06F16/31; G06F16/36; G06F16/33; G06N5/02; G06F40/242; G06F40/247
Abstract:
Embodiments of the present invention disclose a method, a computer program product, and a computer system for identifying information gaps in corpora. A computer receives a document and extracts keywords from the document while filtering trivial keywords. The computer identifies and extracts top keywords detailed by the document using a topic modelling approach before determining whether the extracted top keywords exceed a threshold use frequency. Based on determining that the top keywords exceed a threshold use frequency, determining whether the top keywords have a relation to other entities within the document and, if so, determining whether the top keywords are defined within the document. Based on determining that the top keywords are not defined in the document, adding the top keywords to a list and defining the top keywords.
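The decision flow in the abstract can be illustrated with a minimal sketch. This is not the patented implementation, and every name in it (`find_gaps`, `STOPWORDS`, `DEFINITION_PATTERN`, `threshold`, `top_n`) is hypothetical. It rests on several simplifying assumptions: plain frequency ranking stands in for the topic modelling approach, "relation to other entities" is approximated by co-occurrence with another top keyword in the same sentence, and a keyword counts as "defined" if it is followed by "is", "are", "means", or "refers to". The final step of the abstract, defining the listed keywords, is left out because the abstract does not say where definitions come from.

```python
import re
from collections import Counter

# Assumption: a tiny stand-in list for the "trivial keywords" that get filtered.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "into",
             "for", "with", "on", "by", "it", "that", "this", "as", "be", "or"}

# Assumption: a keyword is "defined" if it is followed by a definitional verb.
DEFINITION_PATTERN = r"\b{term}\b\s+(?:is|are|means|refers to)\s+"


def split_sentences(text):
    """Naive sentence splitter on ., ! and ?."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    words = re.findall(r"[a-z][a-z0-9-]*", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def find_gaps(document, threshold=1, top_n=5):
    """Return top keywords that are frequent and related but never defined."""
    sentences = [tokenize(s) for s in split_sentences(document)]
    counts = Counter(word for sent in sentences for word in sent)

    # Stand-in for topic modelling: take the most frequent keywords.
    top_keywords = [word for word, _ in counts.most_common(top_n)]

    gaps = []
    for keyword in top_keywords:
        # Step 1: the keyword must exceed the use-frequency threshold.
        if counts[keyword] <= threshold:
            continue
        # Step 2: the keyword must co-occur with another top keyword.
        related = any(
            keyword in sent
            and any(o != keyword and o in top_keywords for o in sent)
            for sent in sentences
        )
        if not related:
            continue
        # Step 3: skip keywords that the document already defines.
        pattern = DEFINITION_PATTERN.format(term=re.escape(keyword))
        if re.search(pattern, document, flags=re.IGNORECASE):
            continue
        gaps.append(keyword)
    return gaps


if __name__ == "__main__":
    sample = (
        "Tokenization is the process of splitting text into units. "
        "The parser consumes tokenization output. "
        "The lexer feeds the parser. "
        "The lexer handles tokenization errors."
    )
    print(find_gaps(sample, threshold=1, top_n=3))  # ['parser', 'lexer']
```

In the sample text, "tokenization" is frequent and related to other keywords but is defined in the first sentence, so only "parser" and "lexer" are reported as gaps.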
Published/granted documents
- US20180365313A1 GAP IDENTIFICATION IN CORPORA, publication/grant date: 2018-12-20