- Patent title: Identifying duplicate electronic content based on metadata
- Application number: US13007054
- Filing date: 2011-01-14
- Publication number: US08266115B1
- Publication date: 2012-09-11
- Inventors: Tim Park, Dmitry Dolinsky
- Applicants: Tim Park, Dmitry Dolinsky
- Applicant address: US CA Mountain View
- Assignee: Google Inc.
- Current assignee: Google Inc.
- Current assignee address: US CA Mountain View
- Agent: Fish & Richardson P.C.
- Main classification: G06F7/00
- IPC classification: G06F7/00; G06F17/00
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for de-duplicating electronic content based on comparing metadata. In one aspect, a method includes comparing first metadata associated with a first item of electronic content to second metadata associated with a second item of electronic content, and generating a score based on the comparison. The method also includes establishing that the first and second items of electronic content comprise potentially duplicate content when the score is greater than a predetermined threshold value, and providing information identifying either the first or second items of electronic content for display when establishing that the first and second items of electronic content comprise potentially duplicate content.
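
The flow described in the abstract (compare metadata, generate a score, test it against a threshold, surface one item for display) can be illustrated with a minimal sketch. The abstract does not say which metadata fields are compared or how the score is computed, so the field names (`title`, `creator`, `duration_seconds`), the integer weights, the two-second duration tolerance, and the threshold of 70 below are all hypothetical choices made only for illustration, not the patented method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ItemMetadata:
    """Hypothetical metadata fields; the abstract does not enumerate them."""
    title: str
    creator: str
    duration_seconds: int


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def similarity_score(first: ItemMetadata, second: ItemMetadata) -> int:
    """Compare two metadata records and return a score from 0 to 100.

    The weights (50/30/20) are assumed for this sketch only.
    """
    score = 0
    if normalize(first.title) == normalize(second.title):
        score += 50
    if normalize(first.creator) == normalize(second.creator):
        score += 30
    if abs(first.duration_seconds - second.duration_seconds) <= 2:
        score += 20
    return score


# Assumed threshold; the abstract only says it is predetermined.
THRESHOLD = 70


def potential_duplicate_to_display(
    first: ItemMetadata,
    second: ItemMetadata,
    threshold: int = THRESHOLD,
) -> Optional[ItemMetadata]:
    """Return one of the two items when the score is greater than the threshold.

    Returns None when the pair is not considered potentially duplicate.
    Returning the first item is an arbitrary choice; the abstract allows either.
    """
    if similarity_score(first, second) > threshold:
        return first
    return None
```

Note that the comparison is strict: a pair whose score equals the threshold is not flagged, matching the abstract's "greater than a predetermined threshold value".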