Granted invention patent
US09058378B2 System and method for identification of near duplicate user-generated content
Status: In force
- Title: System and method for identification of near duplicate user-generated content
- Application number: US12101561
- Filing date: 2008-04-11
- Publication number: US09058378B2
- Publication date: 2015-06-16
- Inventor: Robin Johan Schuil
- Applicant: Robin Johan Schuil
- Applicant address: US CA San Jose
- Assignee: eBay Inc.
- Current assignee: eBay Inc.
- Current assignee address: US CA San Jose
- Agent: Schwegman Lundberg & Woessner, P.A.
- Main classification: G06F17/00
- IPC classification: G06F17/00 ; G06F17/30 ; G06Q30/08
Abstract:
A computer-implemented system and method relates to identifying near duplicate content. An example embodiment includes a data receiver to receive a first instance of user-generated content and a tokenizer to tokenize the first instance into a set of words, create a set of portions from the tokenized first instance, and assign weight to each portion of the set of portions. The example embodiment also includes a magnitude calculator to calculate a magnitude for the first instance based on the weight of each portion and a resemblance score calculator to search a data store for a second instance with at least one portion in common with the first instance and calculate a resemblance score between the first instance and the second instance.
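
The pipeline described in the abstract (tokenize, create portions, weight them, compute a magnitude, then score against stored instances that share a portion) can be illustrated with a minimal Python sketch. The abstract does not say how portions are formed, how weights are assigned, or which formula gives the resemblance score, so everything specific here is an assumption for illustration only: portions are overlapping word n-grams, the weight of a portion is its occurrence count, the magnitude is the Euclidean norm of the weights, and the resemblance score is cosine similarity. The names `tokenize`, `make_portions`, `magnitude`, and `NearDuplicateIndex` are hypothetical and do not come from the patent.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into a list of words."""
    return re.findall(r"\w+", text.lower())


def make_portions(words, size=3):
    """Build overlapping word n-grams ("portions"), weighted by count.

    Assumption: the abstract does not define portions or weights;
    n-grams with count weights are one plausible choice.
    """
    if not words:
        return Counter()
    if len(words) < size:
        return Counter([tuple(words)])
    return Counter(tuple(words[i:i + size]) for i in range(len(words) - size + 1))


def magnitude(weights):
    """Euclidean norm of the portion weights (an assumed definition)."""
    return math.sqrt(sum(w * w for w in weights.values()))


class NearDuplicateIndex:
    """In-memory stand-in for the data store mentioned in the abstract."""

    def __init__(self, size=3):
        self.size = size
        self.docs = {}                    # doc_id -> (weights, magnitude)
        self.postings = defaultdict(set)  # portion -> ids of docs containing it

    def add(self, doc_id, text):
        """Store an instance together with its portions and magnitude."""
        weights = make_portions(tokenize(text), self.size)
        self.docs[doc_id] = (weights, magnitude(weights))
        for portion in weights:
            self.postings[portion].add(doc_id)

    def resemblance(self, text):
        """Score stored instances sharing at least one portion with `text`.

        Assumption: cosine similarity stands in for the resemblance score.
        """
        weights = make_portions(tokenize(text), self.size)
        mag = magnitude(weights)
        candidates = set()
        for portion in weights:
            candidates |= self.postings.get(portion, set())
        scores = {}
        for doc_id in candidates:
            other, other_mag = self.docs[doc_id]
            dot = sum(w * other[p] for p, w in weights.items())
            scores[doc_id] = dot / (mag * other_mag)
        return scores
```

In this sketch, only stored instances that share at least one portion with the query are scored, which mirrors the abstract's search for a second instance "with at least one portion in common". A caller would compare each returned score against a threshold of its own choosing to decide what counts as a near duplicate.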