DOCUMENT RELEVANCE DETERMINATION FOR A CORPUS
Abstract:
Embodiments of the invention include methods, systems, and computer program products for using a target similarity calculation to identify relevant content in a corpus of documents or records. The computer-implemented method includes creating, by a processor, a term frequency (TF) list for one or more documents of the corpus. The processor calculates an inverse document frequency (IDF) for each listed term and then a TF-IDF value for each listed term. Using these TF-IDF values, the processor applies a target similarity calculation to determine a similarity ranking for one or more documents of the corpus.
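The pipeline described in the abstract (TF list, IDF, TF-IDF, similarity ranking) can be illustrated with a minimal sketch. The abstract does not specify the TF weighting, the IDF formula, or what the "target similarity calculation" is, so this sketch assumes raw term counts for TF, IDF(t) = log(N / df(t)), and cosine similarity between the TF-IDF vector of a chosen target document and every other document. All function names here are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def term_frequencies(doc):
    """Create a TF list for one document: raw count of each lowercase term (assumed weighting)."""
    return Counter(doc.lower().split())


def inverse_document_frequencies(tf_lists):
    """Assumed IDF(t) = log(N / df(t)), where df(t) = number of documents containing t."""
    n_docs = len(tf_lists)
    doc_freq = Counter()
    for tf in tf_lists:
        doc_freq.update(tf.keys())  # each document counts once per distinct term
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(tf, idf):
    """Multiply each term's count by its IDF to get a sparse TF-IDF vector."""
    return {term: count * idf[term] for term, count in tf.items()}


def cosine_similarity(a, b):
    """Hypothetical stand-in for the 'target similarity calculation': cosine of two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(corpus, target_index):
    """Return (doc_index, score) pairs for all non-target documents, most similar first."""
    tf_lists = [term_frequencies(doc) for doc in corpus]
    idf = inverse_document_frequencies(tf_lists)
    vectors = [tf_idf(tf, idf) for tf in tf_lists]
    target = vectors[target_index]
    scores = [
        (i, cosine_similarity(target, vec))
        for i, vec in enumerate(vectors)
        if i != target_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        "patent search engine relevance",
        "search engine ranking relevance",
        "cooking recipes for dinner",
    ]
    print(rank_by_similarity(corpus, target_index=0))
```

With document 0 as the target, document 1 (which shares three terms with it) ranks first with a positive score, and document 2 (no shared terms) scores 0.0. Note that under this assumed IDF, a term present in every document gets a weight of zero and so contributes nothing to the ranking.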