DOCUMENT RELEVANCE DETERMINATION FOR A CORPUS

    Publication No.: US20190130024A1

    Publication Date: 2019-05-02

    Application No.: US15794487

    Filing Date: 2017-10-26

    IPC Classification: G06F17/30

    Abstract: Embodiments of the invention include methods, systems, and computer program products for using a target similarity calculation to identify relevant content in a corpus of documents or records. The computer-implemented method includes creating, by a processor, a term frequency (TF) list for one or more documents of a corpus. The processor calculates an inverse document frequency (IDF) for each listed term. The processor calculates a TF-IDF for each listed term. The processor determines a similarity ranking for one or more documents of the corpus by applying a target similarity calculation to the TF-IDF for each listed term.
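
    The steps named in the abstract (TF list, IDF, TF-IDF, similarity ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: the abstract does not define the "target similarity calculation", so cosine similarity against a chosen target document stands in for it. The whitespace tokenizer, the log(N / df) form of IDF, and all function names are likewise hypothetical choices made for this sketch.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the source does not specify one.
    return text.lower().split()


def term_frequencies(docs):
    # One TF list (term -> raw count) per document in the corpus.
    return [Counter(tokenize(doc)) for doc in docs]


def inverse_document_frequencies(tf_lists):
    # Assumption: IDF = log(N / df), where df is the number of documents containing the term.
    n_docs = len(tf_lists)
    doc_freq = Counter()
    for tf in tf_lists:
        doc_freq.update(tf.keys())
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vectors(tf_lists, idf):
    # TF-IDF weight for each listed term of each document.
    return [{term: count * idf[term] for term, count in tf.items()} for tf in tf_lists]


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(docs, target_index):
    # Rank every other document by similarity to the target document.
    # Assumption: cosine similarity stands in for the unspecified
    # "target similarity calculation".
    tf_lists = term_frequencies(docs)
    idf = inverse_document_frequencies(tf_lists)
    vectors = tfidf_vectors(tf_lists, idf)
    target = vectors[target_index]
    scores = [
        (i, cosine(target, vec))
        for i, vec in enumerate(vectors)
        if i != target_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        "tf idf ranks documents",
        "tf idf ranks records",
        "cats sleep all day",
    ]
    for index, score in rank_by_similarity(corpus, target_index=0):
        print(index, round(score, 3))
```

    In this sketch a document that shares no terms with the target scores 0, and a term that appears in every document receives an IDF of 0 and so contributes nothing to the ranking.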