Granted invention patent
US08391614B2 Determining near duplicate “noisy” data objects (In force)

  • Title: Determining near duplicate “noisy” data objects
  • Title (Chinese): 确定接近重复的嘈杂数据对象
  • Application number: US12161775
    Filing date: 2007-01-25
  • Publication number: US08391614B2
    Publication date: 2013-03-05
  • Inventors: Yiftach Ravid, Amir Milo
  • Applicants: Yiftach Ravid, Amir Milo
  • Applicant address: IL Rosh Haayin
  • Assignee: Equivio Ltd.
  • Current assignee: Equivio Ltd.
  • Current assignee address: IL Rosh Haayin
  • Agent: Oliff & Berridge, PLC
  • International application: PCT/IL2007/000095 WO 20070125
  • International publication: WO2007/086059 WO 20070802
  • Main classification: G06K9/68
  • IPC classification: G06K9/68 G06K9/40
Abstract:
A system configured to find near-duplicate documents. For any two (or more) documents that are similar to each other, the system identifies which of the differences were likely introduced by Optical Character Recognition (OCR) software and which reflect actual differences between the original documents. This improves the identification of similarity between documents in two ways: documents that were originally exact duplicates and differ only because of OCR errors can be identified as such, and the similarity level between documents can be corrected by correcting the errors introduced by the OCR tool.
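The idea in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the patent. It assumes a small, hypothetical table of character confusions (`OCR_CONFUSIONS`) and uses word-level alignment from the standard `difflib` module. The function names `canonical`, `classify_differences`, and `similarity` are likewise invented for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical table of character confusions often seen in OCR output.
# These pairs are illustrative assumptions, not taken from the patent.
OCR_CONFUSIONS = [("rn", "m"), ("1", "l"), ("I", "l"),
                  ("0", "o"), ("O", "o"), ("5", "s")]


def canonical(token: str) -> str:
    """Collapse the listed OCR confusions so confusable tokens compare equal."""
    for wrong, right in OCR_CONFUSIONS:
        token = token.replace(wrong, right)
    return token.lower()


def classify_differences(doc_a: str, doc_b: str):
    """Split word-level differences into OCR-like and genuine ones."""
    a, b = doc_a.split(), doc_b.split()
    ocr_like, genuine = [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        left, right = a[i1:i2], b[j1:j2]
        if tag == "replace" and len(left) == len(right):
            # Compare substituted words pairwise in canonical form.
            for x, y in zip(left, right):
                bucket = ocr_like if canonical(x) == canonical(y) else genuine
                bucket.append((x, y))
        else:
            # Insertions, deletions, and uneven replacements count as genuine.
            genuine.append((" ".join(left), " ".join(right)))
    return ocr_like, genuine


def similarity(doc_a: str, doc_b: str, ignore_ocr: bool = False) -> float:
    """Word-level similarity ratio, optionally after discounting OCR noise."""
    a, b = doc_a.split(), doc_b.split()
    if ignore_ocr:
        a, b = [canonical(t) for t in a], [canonical(t) for t in b]
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


doc_a = "The modern contract was signed on 10 May"
doc_b = "The modem contract was signed on lO May 2006"
ocr_like, genuine = classify_differences(doc_a, doc_b)
raw_score = similarity(doc_a, doc_b)
corrected_score = similarity(doc_a, doc_b, ignore_ocr=True)
```

Under these assumptions, `ocr_like` collects the pairs that differ only by listed confusions, `genuine` keeps the added year, and `corrected_score` exceeds `raw_score`. A fixed confusion table is a simplification: it also merges real word pairs such as "modern" and "modem", so a practical system would need more context to decide.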