Patent Application
US20130238611A1 Automatically Mining Patterns for Rule Based Data Standardization Systems
Status: Pending (Published)
- Title: Automatically Mining Patterns for Rule Based Data Standardization Systems
- Application No.: US13415144
- Filing date: 2012-03-08
- Publication No.: US20130238611A1
- Publication date: 2013-09-12
- Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
- Applicants: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
- Applicant address: Armonk, NY, US
- Assignee: International Business Machines Corporation
- Current assignee: International Business Machines Corporation
- Current assignee address: Armonk, NY, US
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
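The three steps named in the abstract (find N frequent sub-patterns, extract them, cluster them into K groups by a distance D) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the method claimed in the application: sub-patterns are taken to be contiguous token sequences, D is taken to be a token-level edit distance, and the grouping uses a simple farthest-first seeding. The function names and the sample records are hypothetical.

```python
from collections import Counter


def frequent_subpatterns(records, n, min_len=2, max_len=3):
    """Return the n most frequent contiguous token sub-sequences (assumed notion of sub-pattern)."""
    counts = Counter()
    for record in records:
        tokens = record.split()
        for size in range(min_len, max_len + 1):
            for start in range(len(tokens) - size + 1):
                counts[tuple(tokens[start:start + size])] += 1
    # Sort by descending count, then lexicographically, so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [pattern for pattern, _ in ranked[:n]]


def edit_distance(a, b):
    """Token-level Levenshtein distance, standing in for the distance value D."""
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            current.append(min(previous[j] + 1,          # delete token_a
                               current[j - 1] + 1,       # insert token_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def cluster(patterns, k):
    """Place each pattern in the group of its nearest seed; seeds are chosen farthest-first."""
    if not patterns:
        return []
    seeds = [patterns[0]]
    while len(seeds) < min(k, len(patterns)):
        # Next seed: the pattern farthest from its closest existing seed.
        seeds.append(max(patterns,
                         key=lambda p: min(edit_distance(p, s) for s in seeds)))
    groups = [[] for _ in seeds]
    for pattern in patterns:
        nearest = min(range(len(seeds)),
                      key=lambda i: edit_distance(pattern, seeds[i]))
        groups[nearest].append(pattern)
    return groups


# Hypothetical sample records, invented for this sketch.
sample_records = ["12 MAIN ST APT 4", "98 MAIN ST", "PO BOX 77", "PO BOX 1203"]
top_patterns = frequent_subpatterns(sample_records, n=4)
pattern_groups = cluster(top_patterns, k=2)
```

Farthest-first seeding is used only because it is short and deterministic; any clustering scheme that groups sub-patterns by pairwise distance could be substituted.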