Granted invention patent
- Patent title: Systems and methods for discovering synonymous elements using context over multiple similar addresses
- Patent title (Chinese): 使用上下文发现多个相似地址的同义元素的系统和方法
- Application number: US12771543
- Application date: 2010-04-30
- Publication (grant) number: US08682898B2
- Publication (grant) date: 2014-03-25
- Inventors: Sachindra Joshi, Tanveer Faruquie, Hima Prasad Karanam, Marvin Mendelssohn, Mukesh Kumar Mohania, Angel Marie Smith, L Venkata Subramaniam, Girish Venkatachaliah
- Applicants: Sachindra Joshi, Tanveer Faruquie, Hima Prasad Karanam, Marvin Mendelssohn, Mukesh Kumar Mohania, Angel Marie Smith, L Venkata Subramaniam, Girish Venkatachaliah
- Applicant address: Armonk, NY, US
- Patentee: International Business Machines Corporation
- Current patentee: International Business Machines Corporation
- Current patentee address: Armonk, NY, US
- Agent: Ference & Associates LLC
- Main classification: G06F7/00
- IPC classifications: G06F7/00; G06F17/00
Abstract:
A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.
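The abstract's pipeline (extract features, cluster similar addresses, use each cluster as context to find synonyms, then standardize) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration, not the patented method: addresses are lowercased and whitespace-tokenized; the hypothetical `features` function uses the set of tokens containing a digit (house numbers, postal codes); clustering simply groups addresses with identical feature sets; two tokens are treated as candidate synonyms when they occur between the same left and right neighbours within one cluster; and the more frequent variant is chosen as the standard form.

```python
from collections import Counter, defaultdict
from itertools import combinations


def features(address: str) -> frozenset[str]:
    """Hypothetical feature set: tokens containing a digit (house numbers, postal codes)."""
    return frozenset(t for t in address.lower().split() if any(c.isdigit() for c in t))


def cluster(addresses: list[str]) -> list[list[str]]:
    """Group addresses whose feature sets are identical (a deliberately simple clustering)."""
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for a in addresses:
        groups[features(a)].append(a)
    return list(groups.values())


def synonym_candidates(group: list[str]) -> set[frozenset[str]]:
    """Within one cluster, tokens seen between the same neighbours are candidate synonyms."""
    by_context: dict[tuple[str, str], set[str]] = defaultdict(set)
    for a in group:
        toks = ["<s>"] + a.lower().split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            by_context[(toks[i - 1], toks[i + 1])].add(toks[i])
    pairs: set[frozenset[str]] = set()
    for variants in by_context.values():
        for x, y in combinations(sorted(variants), 2):
            pairs.add(frozenset((x, y)))
    return pairs


def standardize(addresses: list[str]) -> list[str]:
    """Rewrite every address, replacing each variant with its preferred synonym."""
    freq = Counter(t for a in addresses for t in a.lower().split())
    mapping: dict[str, str] = {}
    for group in cluster(addresses):
        for pair in synonym_candidates(group):
            # Prefer the more frequent token, then the longer one, then alphabetical order.
            preferred, variant = sorted(pair, key=lambda t: (-freq[t], -len(t), t))
            mapping[variant] = preferred
    return [" ".join(mapping.get(t, t) for t in a.lower().split()) for a in addresses]


SAMPLE = [
    "12 Main Street Springfield 62704",
    "12 Main St Springfield 62704",
    "12 Main Street Springfield 62704",
    "7 Oak Road Dayton 45402",
    "7 Oak Rd Dayton 45402",
    "7 Oak Road Dayton 45402",
]

if __name__ == "__main__":
    for line in standardize(SAMPLE):
        print(line)
```

With the sample input, the two clusters yield the candidate pairs {street, st} and {road, rd}, so every address is rewritten with "street" or "road". Restricting synonym discovery to a cluster is the point of the design: "st" and "street" are only compared because they appear in otherwise matching addresses, which limits spurious pairings. A practical system would need fuzzier clustering and a way to resolve chains of synonyms, which this sketch omits.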