Invention Grant
- Patent Title: Systems and methods for identifying similar documents
- Patent Title (中): 识别类似文件的系统和方法
- Application No.: US12050626
- Application Date: 2008-03-18
- Publication No.: US07958136B1
- Publication Date: 2011-06-07
- Inventor: Taylor Curtis, Kenneth Heafield
- Applicant: Taylor Curtis, Kenneth Heafield
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Sterne, Kessler, Goldstein & Fox, P.L.L.C.
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
The present invention provides systems and methods for identifying similar documents. In an embodiment, the present invention identifies similar documents by (1) receiving document text for a current document that includes at least one word; (2) calculating a prominence score and a descriptiveness score for each word and each pair of consecutive words; (3) calculating a comparison metric for the current document; (4) finding at least one potential document, where document text for each potential document includes at least one of the words; and (5) analyzing each potential document to identify at least one similar document.
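
The five steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not define how the prominence score, descriptiveness score, or comparison metric are computed, so every formula here is an assumption chosen for illustration: prominence is taken as a term's relative frequency in the document, descriptiveness as a smoothed inverse document frequency, the comparison metric as a vector of their products, and the final analysis as cosine similarity against a threshold. All function names and the `threshold` parameter are hypothetical and do not come from the patent.

```python
import math
from collections import Counter, defaultdict


def terms(text):
    """Step 1: split document text into words and pairs of consecutive words."""
    words = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def prominence(text):
    """Step 2a (assumed formula): relative frequency of each term in the document."""
    counts = Counter(terms(text))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def descriptiveness(corpus):
    """Step 2b (assumed formula): smoothed inverse document frequency."""
    n = len(corpus)
    doc_freq = Counter()
    for text in corpus.values():
        doc_freq.update(set(terms(text)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}


def comparison_metric(text, desc):
    """Step 3 (assumed): vector of prominence * descriptiveness per term."""
    default = max(desc.values(), default=1.0)  # treat unseen terms as rare
    return {t: p * desc.get(t, default) for t, p in prominence(text).items()}


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_similar(current, corpus, threshold=0.2):
    """Steps 4 and 5: gather documents sharing a word, then score and filter them."""
    desc = descriptiveness(corpus)

    # Inverted index from single words to the documents that contain them.
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for word in set(text.lower().split()):
            index[word].add(doc_id)

    # Step 4: potential documents contain at least one word of the current one.
    candidates = set()
    for word in set(current.lower().split()):
        candidates |= index.get(word, set())

    # Step 5: compare metrics and keep candidates at or above the threshold.
    current_vec = comparison_metric(current, desc)
    scored = [
        (doc_id, cosine(current_vec, comparison_metric(corpus[doc_id], desc)))
        for doc_id in candidates
    ]
    return sorted(
        [(d, s) for d, s in scored if s >= threshold],
        key=lambda item: -item[1],
    )
```

Calling `find_similar("a quick brown fox", corpus)` with `corpus` as a dictionary mapping document IDs to text returns `(doc_id, score)` pairs sorted by decreasing score. Restricting the scoring to documents that share at least one word keeps the expensive comparison off the bulk of the corpus, which is the apparent purpose of step 4.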