-
Publication No.: US10032131B2
Publication date: 2018-07-24
Application No.: US13527601
Filing date: 2012-06-20
Applicants: Tao Cheng, Kris Ganjam, Kaushik Chakrabarti, Zhimin Chen, Vivek R. Narasayya, Surajit Chaudhuri
Inventors: Tao Cheng, Kris Ganjam, Kaushik Chakrabarti, Zhimin Chen, Vivek R. Narasayya, Surajit Chaudhuri
Abstract: A data service system is described herein which processes raw data assets from at least one network-accessible system (such as a search system), to produce processed data assets. Enterprise applications can then leverage the processed data assets to perform various environment-specific tasks. In one implementation, the data service system can generate any of: synonym resources for use by an enterprise application in providing synonyms for specified terms associated with entities; augmentation resources for use by an enterprise application in providing supplemental information for specified seed information; and spelling-correction resources for use by an enterprise application in providing spelling information for specified terms, and so on.
-
Publication No.: US20130346464A1
Publication date: 2013-12-26
Application No.: US13527601
Filing date: 2012-06-20
Applicants: Tao Cheng, Kris Ganjam, Kaushik Chakrabarti, Zhimin Chen, Vivek R. Narasayya, Surajit Chaudhuri
Inventors: Tao Cheng, Kris Ganjam, Kaushik Chakrabarti, Zhimin Chen, Vivek R. Narasayya, Surajit Chaudhuri
IPC classification: G06F15/16
CPC classification: G06Q10/10
Abstract: A data service system is described herein which processes raw data assets from at least one network-accessible system (such as a search system), to produce processed data assets. Enterprise applications can then leverage the processed data assets to perform various environment-specific tasks. In one implementation, the data service system can generate any of: synonym resources for use by an enterprise application in providing synonyms for specified terms associated with entities; augmentation resources for use by an enterprise application in providing supplemental information for specified seed information; and spelling-correction resources for use by an enterprise application in providing spelling information for specified terms, and so on.
-
Publication No.: US08386529B2
Publication date: 2013-02-26
Application No.: US12709508
Filing date: 2010-02-21
IPC classification: G06F17/30
CPC classification: G06F17/30306
Abstract: This patent application relates to foreign-key detection. One implementation obtains a set of data tables. This implementation automatically determines foreign-key relationships of columns from separate tables of the set.
-
Publication No.: US20110208748A1
Publication date: 2011-08-25
Application No.: US12709508
Filing date: 2010-02-21
IPC classification: G06F17/30
CPC classification: G06F17/30306
Abstract: This patent application relates to foreign-key detection. One implementation obtains a set of data tables. This implementation automatically determines foreign-key relationships of columns from separate tables of the set.
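The foreign-key abstract above does not say how the relationships are determined. As a rough illustration only, the following Python sketch applies a common baseline heuristic (an inclusion-dependency check), which is an assumption and not the method claimed in the patent: a column is reported as a candidate foreign key when all of its non-null values appear in a duplicate-free column of a separate table. The function name `candidate_foreign_keys` and the sample tables are hypothetical.

```python
from typing import Dict, List, Tuple

Table = Dict[str, List[object]]  # column name -> list of values


def candidate_foreign_keys(tables: Dict[str, Table]) -> List[Tuple[str, str, str, str]]:
    """Return (child_table, child_col, parent_table, parent_col) candidates."""
    candidates = []
    for parent_name, parent in tables.items():
        for parent_col, parent_vals in parent.items():
            parent_set = set(parent_vals)
            # A referenced column must behave like a key: non-empty, no duplicates.
            if not parent_vals or len(parent_set) != len(parent_vals):
                continue
            for child_name, child in tables.items():
                if child_name == parent_name:
                    continue  # only pair columns from separate tables
                for child_col, child_vals in child.items():
                    child_set = {v for v in child_vals if v is not None}
                    # Inclusion dependency: every child value exists in the parent.
                    if child_set and child_set <= parent_set:
                        candidates.append((child_name, child_col, parent_name, parent_col))
    return candidates


tables = {
    "customers": {"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 3, 2]},
}
print(candidate_foreign_keys(tables))
```

On real data, a containment check like this tends to over-report (small integer columns often contain one another by accident), so a practical detector would need additional signals to rank or filter the candidates.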
-
Publication No.: US09594831B2
Publication date: 2017-03-14
Application No.: US13531493
Filing date: 2012-06-22
Applicants: Chi Wang, Kaushik Chakrabarti, Tao Cheng, Surajit Chaudhuri
Inventors: Chi Wang, Kaushik Chakrabarti, Tao Cheng, Surajit Chaudhuri
CPC classification: G06F17/30687, G06F17/278
Abstract: A targeted disambiguation system is described herein which determines true mentions of a list of named entities in a collection of documents. The list of named entities is homogenous in the sense that the entities pertain to the same subject matter domain. The system determines the true mentions by leveraging the homogeneity in the list, and, more specifically by applying a context similarity hypothesis, a co-mention hypothesis, and an interdependency hypothesis. In one implementation, the system executes its analysis using a graph-based model. The system can operate without the existence of additional information regarding the entities in the list; nevertheless, if such information is available, the system can integrate it into its analysis.
-
Publication No.: US20130346421A1
Publication date: 2013-12-26
Application No.: US13531493
Filing date: 2012-06-22
Applicants: Chi Wang, Kaushik Chakrabarti, Tao Cheng, Surajit Chaudhuri
Inventors: Chi Wang, Kaushik Chakrabarti, Tao Cheng, Surajit Chaudhuri
IPC classification: G06F17/30
CPC classification: G06F17/30687, G06F17/278
Abstract: A targeted disambiguation system is described herein which determines true mentions of a list of named entities in a collection of documents. The list of named entities is homogenous in the sense that the entities pertain to the same subject matter domain. The system determines the true mentions by leveraging the homogeneity in the list, and, more specifically by applying a context similarity hypothesis, a co-mention hypothesis, and an interdependency hypothesis. In one implementation, the system executes its analysis using a graph-based model. The system can operate without the existence of additional information regarding the entities in the list; nevertheless, if such information is available, the system can integrate it into its analysis.
-
Publication No.: US20130132381A1
Publication date: 2013-05-23
Application No.: US13298349
Filing date: 2011-11-17
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30277
Abstract: A plurality of description phrases associated with a first domain may be determined, based on an analysis of a first plurality of documents to determine co-occurrences of the description phrases with one or more name labels associated with the first domain. An entity associated with the first domain may be obtained. An analysis of a second plurality of documents may be initiated to identify co-occurrences of mentions of the obtained entity and one or more of the plurality of description phrases, and contexts associated with each of the co-occurrences of the mentions and description phrases, in each one of the second plurality of documents. A description tag association between the obtained entity and one of the description phrases may be determined, based on an analysis of the identified contexts.
-
Publication No.: US09298825B2
Publication date: 2016-03-29
Application No.: US13298349
Filing date: 2011-11-17
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30277
Abstract: A plurality of description phrases associated with a first domain may be determined, based on an analysis of a first plurality of documents to determine co-occurrences of the description phrases with one or more name labels associated with the first domain. An entity associated with the first domain may be obtained. An analysis of a second plurality of documents may be initiated to identify co-occurrences of mentions of the obtained entity and one or more of the plurality of description phrases, and contexts associated with each of the co-occurrences of the mentions and description phrases, in each one of the second plurality of documents. A description tag association between the obtained entity and one of the description phrases may be determined, based on an analysis of the identified contexts.
-
Publication No.: US20130232129A1
Publication date: 2013-09-05
Application No.: US13487260
Filing date: 2012-06-04
Applicants: Tao Cheng, Kaushik Chakrabarti, Surajit Chaudhuri, Dong Xin
Inventors: Tao Cheng, Kaushik Chakrabarti, Surajit Chaudhuri, Dong Xin
IPC classification: G06F17/30
CPC classification: G06F17/30672
Abstract: A similarity analysis framework is described herein which leverages two or more similarity analysis functions to generate synonyms for an entity reference string r_e. The functions are selected such that the synonyms that are generated by the framework satisfy a core set of synonym-related properties. The functions operate by leveraging query log data. One similarity analysis function takes into consideration the strength of similarity between a particular candidate string s_e and an entity reference string r_e even in the presence of sparse query log data, while another function takes into account the classes of s_e and r_e. The framework also provides indexing mechanisms that expedite its computations. The framework also provides a reduction module for converting long entity reference strings into shorter strings, where each shorter string (if found) contains a subset of the terms in its longer counterpart.
-
Publication No.: US08745019B2
Publication date: 2014-06-03
Application No.: US13487260
Filing date: 2012-06-04
Applicants: Tao Cheng, Kaushik Chakrabarti, Surajit Chaudhuri, Dong Xin
Inventors: Tao Cheng, Kaushik Chakrabarti, Surajit Chaudhuri, Dong Xin
IPC classification: G06F17/30
CPC classification: G06F17/30672
Abstract: A similarity analysis framework is described herein which leverages two or more similarity analysis functions to generate synonyms for an entity reference string r_e. The functions are selected such that the synonyms that are generated by the framework satisfy a core set of synonym-related properties. The functions operate by leveraging query log data. One similarity analysis function takes into consideration the strength of similarity between a particular candidate string s_e and an entity reference string r_e even in the presence of sparse query log data, while another function takes into account the classes of s_e and r_e. The framework also provides indexing mechanisms that expedite its computations. The framework also provides a reduction module for converting long entity reference strings into shorter strings, where each shorter string (if found) contains a subset of the terms in its longer counterpart.
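The similarity-framework abstract names two ingredients that lend themselves to a small illustration: similarity computed from query log data, and a reduction step that yields shorter strings built from a subset of the terms of a long entity reference string. The Python sketch below is a simplified stand-in under stated assumptions, not the patented functions: the query log is modeled as a hypothetical mapping from query string to the set of clicked document IDs, similarity is plain Jaccard overlap of those sets (a swapped-in measure), and the names `click_similarity` and `reduce_reference` are invented for this example.

```python
from itertools import combinations
from typing import Dict, List, Set


def click_similarity(clicks: Dict[str, Set[str]], s_e: str, r_e: str) -> float:
    """Jaccard overlap of the documents clicked for two query strings."""
    a, b = clicks.get(s_e, set()), clicks.get(r_e, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reduce_reference(r_e: str, clicks: Dict[str, Set[str]]) -> List[str]:
    """Shorter strings made of an ordered, proper subset of r_e's terms
    that actually occur as queries in the log (longest first)."""
    terms = r_e.split()
    found: List[str] = []
    for size in range(len(terms) - 1, 0, -1):
        for combo in combinations(terms, size):
            shorter = " ".join(combo)
            if shorter in clicks and shorter not in found:
                found.append(shorter)
    return found


# Hypothetical query log: query string -> IDs of clicked documents.
clicks = {
    "canon eos 350d digital camera": {"d1", "d2", "d3"},
    "canon eos 350d": {"d1", "d2"},
    "digital rebel xt": {"d1", "d2", "d3", "d4"},
    "digital camera": {"d7", "d8"},
}
r_e = "canon eos 350d digital camera"
print(reduce_reference(r_e, clicks))
print(click_similarity(clicks, "digital rebel xt", r_e))
print(click_similarity(clicks, "digital camera", r_e))
```

In this toy log, "digital camera" is a valid term subset of the reference string but shares no clicked documents with it, which suggests why a term-subset reduction alone is not enough and a similarity check on the shortened string is still needed.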