-
Publication No.: US20130117012A1
Publication date: 2013-05-09
Application No.: US13288942
Filing date: 2011-11-03
Applicants: Yifat Orlin, Elad Ziklik, Gal Novik, Neta Haiby, Efim Hudis, Meir Raviv, Joseph I. Malka
Inventors: Yifat Orlin, Elad Ziklik, Gal Novik, Neta Haiby, Efim Hudis, Meir Raviv, Joseph I. Malka
IPC classification: G06F17/27
CPC classification: G06Q10/00, G06F17/277
Abstract: The subject disclosure generally relates to parsing unstructured data based on knowledge of domains related to the unstructured data. A domain identification component can identify a set of domains related to a term in a data set. An inspection component can identify unmatched words, and unmatched related domains. A correlation component can compare the unmatched words to known values for the unmatched domains, and a manager component can match the unmatched words with the unmatched domains based on the comparison. In addition, combinations of the words can be generated based on a set of predetermined rules, and compared to the unmatched domains. Furthermore, delimiter based parsing can be employed to augment the knowledge based parsing.
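The matching step described in the abstract above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `KNOWN_VALUES` table, the `parse_term` function, and the rule "try the longest run of adjacent words first" are all hypothetical choices made for illustration, and the delimiter-based fallback is omitted.

```python
from typing import Dict, List, Set

# Hypothetical knowledge base: each domain maps to a set of known values.
KNOWN_VALUES: Dict[str, Set[str]] = {
    "first_name": {"john", "mary"},
    "city": {"new york", "seattle"},
    "state": {"ny", "wa"},
}


def parse_term(term: str, domains: List[str], max_words: int = 2) -> Dict[str, str]:
    """Assign words, or runs of adjacent words, in `term` to domains
    by looking them up in each unmatched domain's known values."""
    words = term.split()
    unmatched_domains = list(domains)
    result: Dict[str, str] = {}
    i = 0
    while i < len(words) and unmatched_domains:
        matched = False
        # Assumed combination rule: try the longest run of adjacent words first.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            for domain in unmatched_domains:
                if candidate.lower() in KNOWN_VALUES.get(domain, set()):
                    result[domain] = candidate
                    unmatched_domains.remove(domain)
                    i += n
                    matched = True
                    break
            if matched:
                break
        if not matched:
            i += 1  # leave this word unmatched and move on
    return result


parsed = parse_term("Mary New York NY", ["first_name", "city", "state"])
print(parsed)
```

Here the two-word combination "New York" is matched to the hypothetical `city` domain before the single word "NY" is tried against the remaining `state` domain, which is the kind of word-combination behaviour the abstract mentions.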
-
Publication No.: US20120078857A1
Publication date: 2012-03-29
Application No.: US12893791
Filing date: 2010-09-29
Applicants: Neta Haiby, Elad Ziklik, Efim Hudis, Gad Peleg
Inventors: Neta Haiby, Elad Ziklik, Efim Hudis, Gad Peleg
IPC classification: G06F17/00
CPC classification: G06F17/30303, G06Q10/00, G06Q30/02
Abstract: The present invention extends to methods, systems, and computer program products for exploring and selecting data cleansing service providers. Embodiments of the invention permit a user to explore different data cleansing service providers and compare quality results from the different data cleansing service providers. Sample data is mapped to a specified data domain. A list of service providers, for cleansing data for the selected data domain, is provided to a user. The user selects a subset of service providers. The sample data is submitted to the subset of service providers, which return results including allegedly cleansed data. The results are profiled and a comparison of the subset of service providers is presented to the user. The user selects a service provider to use when cleansing further data.
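The submit, profile, and compare flow in the abstract above might look roughly like the following Python sketch. Everything named here is an assumption for illustration: `provider_a` and `provider_b` are local stand-ins for remote cleansing services, and the quality metric (fraction of returned records found in a reference set) is one possible profiling choice, not the one the patent specifies.

```python
from typing import Callable, Dict, List, Set

Cleanser = Callable[[List[str]], List[str]]


# Hypothetical stand-ins for remote data cleansing service providers.
def provider_a(records: List[str]) -> List[str]:
    return [r.strip().title() for r in records]


def provider_b(records: List[str]) -> List[str]:
    return [r.strip() for r in records]


def profile(cleansed: List[str], valid_values: Set[str]) -> float:
    """Assumed quality metric: fraction of cleansed records in a reference set."""
    if not cleansed:
        return 0.0
    return sum(r in valid_values for r in cleansed) / len(cleansed)


def compare_providers(
    sample: List[str],
    providers: Dict[str, Cleanser],
    valid_values: Set[str],
) -> Dict[str, float]:
    """Submit the same sample to each selected provider and profile the results."""
    return {name: profile(fn(sample), valid_values) for name, fn in providers.items()}


sample = [" seattle", "new york ", "Boston"]
valid = {"Seattle", "New York", "Boston"}
scores = compare_providers(
    sample, {"provider_a": provider_a, "provider_b": provider_b}, valid
)
best = max(scores, key=scores.get)
print(scores, best)
```

In the workflow the abstract describes, the user rather than the program makes the final selection; `best` only shows how a presented comparison could be ranked.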
-
Publication No.: US08510276B2
Publication date: 2013-08-13
Application No.: US12893791
Filing date: 2010-09-29
Applicants: Neta Haiby, Elad Ziklik, Efim Hudis, Gad Peleg
Inventors: Neta Haiby, Elad Ziklik, Efim Hudis, Gad Peleg
CPC classification: G06F17/30303, G06Q10/00, G06Q30/02
Abstract: The present invention extends to methods, systems, and computer program products for exploring and selecting data cleansing service providers. Embodiments of the invention permit a user to explore different data cleansing service providers and compare quality results from the different data cleansing service providers. Sample data is mapped to a specified data domain. A list of service providers, for cleansing data for the selected data domain, is provided to a user. The user selects a subset of service providers. The sample data is submitted to the subset of service providers, which return results including allegedly cleansed data. The results are profiled and a comparison of the subset of service providers is presented to the user. The user selects a service provider to use when cleansing further data.
-
-