Granted invention patent
- Patent title: Method and system for calculating term-document importance
- Patent title (Chinese): 计算术语文件重要性的方法和系统
- Application number: US09216085
- Filing date: 1998-12-18
- Publication number: US06473753B1
- Publication date: 2002-10-29
- Inventors: Sanjeev Katariya, William P. Jones
- Applicants: Sanjeev Katariya, William P. Jones
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
A weighting system for calculating the term-document importance for each term within each document that is part of a collection of documents (i.e., a corpus). The weighting system calculates the importance of a term within a document based on a computed normalized term frequency and a computed inverse document frequency. The computed normalized term frequency is a function, referred to as the “computed term frequency function” (“A”), of a normalized term frequency. The normalized term frequency is the term frequency, which is the number of times that the term occurs in the document, normalized by the total term frequency of the term within all documents, which is the total number of times that the term occurs in all the documents. The weighting system normalizes the term frequency by dividing the term frequency by a function, referred to as the “normalizing term frequency function” (“Γ”), of the total term frequency. The computed inverse document frequency is a function, referred to as the “computed inverse document frequency function” (“B”), of the inverse document frequency. The weighting system identifies a computed normalized term frequency function A and a computed inverse document frequency function B so that on average the computed normalized term frequency and the computed inverse document frequency contribute equally to the weight of the terms.
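The weighting scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the concrete forms of A, B, or Γ, nor how the two factors are combined, so everything below is an assumption made for illustration: Γ defaults to the identity, A and B are both taken to be log(1 + x), the inverse document frequency is taken to be the number of documents divided by the number of documents containing the term, and the weight is taken to be the product of the two factors. Each factor is divided by its mean over all term-document pairs, which is one simple way to make the two contribute equally on average (for a logarithm, this rescaling is equivalent to choosing the base). The function name `term_document_weights` is hypothetical.

```python
import math
from collections import Counter


def term_document_weights(docs, gamma=lambda total: total):
    """Return {(doc_index, term): weight} for a list of tokenized documents.

    Illustrative assumptions (not taken from the patent text):
      - gamma (the normalizing function) defaults to the identity
      - A(x) = B(x) = log(1 + x), each rescaled to have mean 1
      - weight = A(normalized tf) * B(inverse document frequency)
    """
    tf = [Counter(doc) for doc in docs]   # term frequency per document
    total_tf = Counter()                  # occurrences of each term in the whole corpus
    doc_freq = Counter()                  # number of documents containing each term
    for counts in tf:
        total_tf.update(counts)
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    raw_a, raw_b = {}, {}
    for i, counts in enumerate(tf):
        for term, count in counts.items():
            ntf = count / gamma(total_tf[term])  # normalized term frequency
            idf = n_docs / doc_freq[term]        # inverse document frequency (assumed form)
            raw_a[(i, term)] = math.log1p(ntf)
            raw_b[(i, term)] = math.log1p(idf)

    if not raw_a:  # empty corpus: nothing to weight
        return {}

    # Rescale so that both factors average to 1 over all term-document pairs.
    mean_a = sum(raw_a.values()) / len(raw_a)
    mean_b = sum(raw_b.values()) / len(raw_b)
    return {key: (raw_a[key] / mean_a) * (raw_b[key] / mean_b) for key in raw_a}


if __name__ == "__main__":
    corpus = [["a", "b", "a"], ["b", "c"]]
    for key, weight in sorted(term_document_weights(corpus).items()):
        print(key, round(weight, 3))
```

In this toy corpus, the term "a" in the first document receives a higher weight than "b" in the same document, because "a" occurs more often there and appears in fewer documents overall.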