Invention Grant
- Patent Title: Systems and methods for identifying similar documents
- Application No.: US13153319
- Application Date: 2011-06-03
- Publication No.: US08713034B1
- Publication Date: 2014-04-29
- Inventor: Taylor Curtis, Kenneth Heafield
- Applicant: Taylor Curtis, Kenneth Heafield
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Fish & Richardson P.C.
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
The present invention provides systems and methods for identifying similar documents. In an embodiment, the present invention identifies similar documents by (1) receiving document text for a current document that includes at least one word; (2) calculating a prominence score and a descriptiveness score for each word and each pair of consecutive words; (3) calculating a comparison metric for the current document; (4) finding at least one potential document, where document text for each potential document includes at least one of the words; and (5) analyzing each potential document to identify at least one similar document.
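The five steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not define how the prominence score, descriptiveness score, or comparison metric are computed, so everything below is an assumption made for illustration only: prominence is taken to be a term's relative frequency in the document, descriptiveness a smoothed inverse document frequency over a small corpus, the comparison metric a weighted term vector, and the final analysis a cosine similarity with a hypothetical threshold. The function names (terms, comparison_metric, cosine, find_similar) are likewise hypothetical and do not come from the patent.

```python
import math
import re
from collections import Counter


def terms(text):
    """Return each word and each pair of consecutive words in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def comparison_metric(text, corpus):
    """Map each term to prominence * descriptiveness (assumed definitions)."""
    counts = Counter(terms(text))
    total = sum(counts.values())
    corpus_terms = [set(terms(doc)) for doc in corpus]
    metric = {}
    for term, count in counts.items():
        # Assumption: prominence is the term's relative frequency in the document.
        prominence = count / total
        # Assumption: descriptiveness is a smoothed inverse document frequency.
        doc_freq = sum(term in seen for seen in corpus_terms)
        descriptiveness = math.log((1 + len(corpus)) / (1 + doc_freq)) + 1
        metric[term] = prominence * descriptiveness
    return metric


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_similar(current, corpus, threshold=0.2):
    """Return (corpus index, score) pairs for similar documents, best first."""
    current_metric = comparison_metric(current, corpus)
    current_words = {term for term in current_metric if " " not in term}
    results = []
    for index, doc in enumerate(corpus):
        doc_metric = comparison_metric(doc, corpus)
        # Step 4: a potential document must share at least one word.
        if current_words.isdisjoint(doc_metric):
            continue
        # Step 5: analyze the potential document against the current one.
        score = cosine(current_metric, doc_metric)
        if score >= threshold:
            results.append((index, score))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "stock prices fell sharply today",
        "a cat sat on a mat",
    ]
    for index, score in find_similar("the cat sat on the mat quietly", corpus):
        print(index, round(score, 3))
```

In this sketch a document that shares no single word with the current document is never scored, which mirrors the candidate-finding step described in the abstract. The threshold value of 0.2 is arbitrary and would need tuning in any real use.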