-
Publication No.: US20130232129A1
Publication date: 2013-09-05
Application No.: US13487260
Filing date: 2012-06-04
Applicant: Tao Cheng, Kaushik Chakrabarti, Surajit Chaudhuri, Dong Xin
Inventor: Tao Cheng, Kaushik Chakrabarti, Surajit Chaudhuri, Dong Xin
IPC classification: G06F17/30
CPC classification: G06F17/30672
Abstract: A similarity analysis framework is described herein which leverages two or more similarity analysis functions to generate synonyms for an entity reference string r_e. The functions are selected such that the synonyms that are generated by the framework satisfy a core set of synonym-related properties. The functions operate by leveraging query log data. One similarity analysis function takes into consideration the strength of similarity between a particular candidate string s_e and an entity reference string r_e even in the presence of sparse query log data, while another function takes into account the classes of s_e and r_e. The framework also provides indexing mechanisms that expedite its computations. The framework also provides a reduction module for converting long entity reference strings into shorter strings, where each shorter string (if found) contains a subset of the terms in its longer counterpart.
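As a rough illustration of the query-log idea in the abstract above, the following Python sketch treats two strings as likely synonyms when users who issue them click largely the same documents. The log format, the `click_similarity` ratio, and the 0.5 threshold are assumptions made for illustration only. They are not the similarity functions claimed in the patent, and the sketch handles neither sparse logs nor entity classes.

```python
from collections import defaultdict


def clicked_docs(query_log):
    """Map each query string to the set of documents clicked for it.

    query_log is a hypothetical list of (query, clicked_document) pairs.
    """
    docs = defaultdict(set)
    for query, doc in query_log:
        docs[query].add(doc)
    return docs


def click_similarity(candidate, reference, docs):
    """Fraction of the candidate's clicked documents also clicked for the reference."""
    cand_docs = docs.get(candidate, set())
    ref_docs = docs.get(reference, set())
    if not cand_docs:
        return 0.0
    return len(cand_docs & ref_docs) / len(cand_docs)


def synonyms(reference, docs, threshold=0.5):
    """Keep candidates whose click overlap with the reference is high in both directions."""
    return sorted(
        s for s in docs
        if s != reference
        and click_similarity(s, reference, docs) >= threshold
        and click_similarity(reference, s, docs) >= threshold
    )


# Toy query log (invented data).
log = [
    ("microsoft excel", "d1"), ("microsoft excel", "d2"),
    ("ms excel", "d1"), ("ms excel", "d2"),
    ("microsoft", "d1"), ("microsoft", "d8"), ("microsoft", "d9"),
]
docs = clicked_docs(log)
print(synonyms("microsoft excel", docs))
```

Checking the overlap in both directions is a design choice of this sketch: it keeps a broad string such as "microsoft" from being accepted merely because one of its many clicked documents overlaps with the reference string.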
-
Publication No.: US08745019B2
Publication date: 2014-06-03
Application No.: US13487260
Filing date: 2012-06-04
Applicant: Tao Cheng, Kaushik Chakrabarti, Surajit Chaudhuri, Dong Xin
Inventor: Tao Cheng, Kaushik Chakrabarti, Surajit Chaudhuri, Dong Xin
IPC classification: G06F17/30
CPC classification: G06F17/30672
Abstract: A similarity analysis framework is described herein which leverages two or more similarity analysis functions to generate synonyms for an entity reference string r_e. The functions are selected such that the synonyms that are generated by the framework satisfy a core set of synonym-related properties. The functions operate by leveraging query log data. One similarity analysis function takes into consideration the strength of similarity between a particular candidate string s_e and an entity reference string r_e even in the presence of sparse query log data, while another function takes into account the classes of s_e and r_e. The framework also provides indexing mechanisms that expedite its computations. The framework also provides a reduction module for converting long entity reference strings into shorter strings, where each shorter string (if found) contains a subset of the terms in its longer counterpart.
-
Publication No.: US20090327223A1
Publication date: 2009-12-31
Application No.: US12146469
Filing date: 2008-06-26
Applicant: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin, Sanjay Agrawal, Arnd Christian Konig
Inventor: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin, Sanjay Agrawal, Arnd Christian Konig
CPC classification: G06F16/951
Abstract: The described implementations relate to query portals. One technique analyzes search results generated by a web search engine responsive to a user search query. The technique also dynamically generates a query portal that lists the search results as well as entities identified from the search results.
-
Publication No.: US20100313258A1
Publication date: 2010-12-09
Application No.: US12478120
Filing date: 2009-06-04
Applicant: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
Inventor: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
IPC classification: H04L9/32
CPC classification: G06F17/2795, G06F17/278
Abstract: Identifying synonyms of entities using a collection of documents is disclosed herein. In some aspects, a document from a collection of documents may be analyzed to identify hit sequences that include one or more tokens (e.g., words, numbers, etc.). The hit sequences may then be used to generate discriminating token sets (DTS's) that are subsets of both the hit sequences and the entity names. The DTS's are matched with corresponding entity names, and then used to create DTS phrases by selecting adjacent text in the document that is proximate to the DTS. The DTS phrases may be analyzed to determine whether the corresponding DTS is a synonym of the entity name. In various aspects, the tokens of an associated entity name that are present in the DTS phrases are used to generate a score for the DTS. When the score at least reaches a threshold, the DTS may be designated as a synonym. A list of synonyms may be generated for each entity name.
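The DTS scoring step described in the abstract above can be sketched in Python as follows. This is a simplified reading, not the patented procedure: it assumes the DTS appears as a contiguous token run, takes a fixed `window` of neighboring tokens as the "DTS phrase", and scores the DTS by the average share of entity-name tokens found in those phrases. The function names, the window size, and the averaging rule are all hypothetical.

```python
def dts_phrases(doc_tokens, dts_tokens, window):
    """Collect the text around each occurrence of the DTS in the document."""
    n = len(dts_tokens)
    phrases = []
    for i in range(len(doc_tokens) - n + 1):
        if doc_tokens[i:i + n] == dts_tokens:
            # The phrase is the DTS plus up to `window` tokens on each side.
            phrases.append(doc_tokens[max(0, i - window):i + n + window])
    return phrases


def dts_score(entity_name, dts, document, window=4):
    """Average fraction of entity-name tokens present in the DTS phrases."""
    entity_tokens = set(entity_name.lower().split())
    dts_tokens = dts.lower().split()
    doc_tokens = document.lower().split()
    phrases = dts_phrases(doc_tokens, dts_tokens, window)
    if not phrases or not entity_tokens:
        return 0.0
    coverage = [len(entity_tokens & set(p)) / len(entity_tokens) for p in phrases]
    return sum(coverage) / len(coverage)


# Invented entity name and documents.
entity = "canon eos 350d digital camera"
print(dts_score(entity, "eos 350d", "the new canon eos 350d digital camera ships today"))
print(dts_score(entity, "eos 350d", "eos 350d firmware update"))
```

A DTS would then be designated a synonym when its score reaches a chosen threshold, as the abstract states; the threshold value itself is not given in the source.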
-
Publication No.: US20100293179A1
Publication date: 2010-11-18
Application No.: US12465832
Filing date: 2009-05-14
Applicant: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
Inventor: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
IPC classification: G06F17/30
CPC classification: G06F16/951
Abstract: Identifying synonyms of entities using web search results is disclosed herein. In some aspects, a candidate string of tokens of an entity name is selected as a search term. The search term is transmitted by a server to a search engine, which, in turn, transmits search results back to the server after performing a search. The server analyzes the search results, generates a score based on the search results, and then determines a status (synonym or not a synonym) of the candidate string based on the score. In further aspects, additional candidate strings are designated as synonyms or not synonyms based on the status of the searched candidate string by using relationships of a lattice formed from all possible candidate strings of the entity name.
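The lattice-based inference mentioned at the end of the abstract above might look like the Python sketch below. It rests on an assumption the abstract does not spell out: that synonym status is monotone over token subsets, so a candidate containing all tokens of a known synonym is also a synonym, and a candidate contained in a known non-synonym is also a non-synonym. The functions `candidate_lattice` and `propagate` are hypothetical names, and the entity name is assumed to have distinct tokens.

```python
from itertools import combinations


def candidate_lattice(entity_name):
    """All non-empty token subsets of the entity name, as frozensets."""
    tokens = entity_name.split()
    return [
        frozenset(combo)
        for size in range(1, len(tokens) + 1)
        for combo in combinations(tokens, size)
    ]


def propagate(lattice, searched, is_synonym):
    """Infer the status of other candidates from one searched candidate.

    Assumed rule: supersets of a synonym are synonyms; subsets of a
    non-synonym are non-synonyms. Unaffected candidates are left out.
    """
    status = {}
    for node in lattice:
        if is_synonym and node >= searched:
            status[node] = True
        elif not is_synonym and node <= searched:
            status[node] = False
    return status


lattice = candidate_lattice("indiana jones crystal skull")
known_good = propagate(lattice, frozenset({"indiana", "jones"}), True)
known_bad = propagate(lattice, frozenset({"crystal", "skull"}), False)
print(len(lattice), len(known_good), len(known_bad))
```

The benefit of such propagation is that each web search settles the status of several candidates at once, so fewer search-engine round trips are needed than there are nodes in the lattice.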
-
Publication No.: US08533203B2
Publication date: 2013-09-10
Application No.: US12478120
Filing date: 2009-06-04
Applicant: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
Inventor: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
CPC classification: G06F17/2795, G06F17/278
Abstract: Identifying synonyms of entities using a collection of documents is disclosed herein. In some aspects, a document from a collection of documents may be analyzed to identify hit sequences that include one or more tokens (e.g., words, numbers, etc.). The hit sequences may then be used to generate discriminating token sets (DTS's) that are subsets of both the hit sequences and the entity names. The DTS's are matched with corresponding entity names, and then used to create DTS phrases by selecting adjacent text in the document that is proximate to the DTS. The DTS phrases may be analyzed to determine whether the corresponding DTS is a synonym of the entity name. In various aspects, the tokens of an associated entity name that are present in the DTS phrases are used to generate a score for the DTS. When the score at least reaches a threshold, the DTS may be designated as a synonym. A list of synonyms may be generated for each entity name.
-
Publication No.: US08037069B2
Publication date: 2011-10-11
Application No.: US12132108
Filing date: 2008-06-03
CPC classification: G06F17/30707
Abstract: The described implementations relate to data analysis, such as membership checking. One technique identifies candidate matches between document sub-strings and database members utilizing signatures. The technique further verifies that the candidate matches are true matches.
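The two-phase pattern in the abstract above (find candidates with signatures, then verify them) can be illustrated with a small Python sketch. The signature used here, a sorted tuple of tokens, is a deliberately simple stand-in chosen for illustration; the abstract does not say which signature scheme the patent uses. Because this signature ignores token order, it can produce false candidates, which the verification step then rejects.

```python
from collections import defaultdict


def signature(tokens):
    """Hypothetical order-insensitive signature: the sorted token tuple."""
    return tuple(sorted(tokens))


def build_index(members):
    """Index database members by their signature."""
    index = defaultdict(list)
    for member in members:
        index[signature(member.split())].append(member)
    return index


def find_members(document, members):
    """Return (candidates, verified matches) as lists of (position, member)."""
    index = build_index(members)
    max_len = max(len(m.split()) for m in members)
    doc = document.split()
    candidates, matches = [], []
    for start in range(len(doc)):
        for length in range(1, max_len + 1):
            if start + length > len(doc):
                break
            sub = doc[start:start + length]
            # Filter step: cheap signature lookup.
            for member in index.get(signature(sub), []):
                candidates.append((start, member))
                # Verify step: exact comparison against the member.
                if " ".join(sub) == member:
                    matches.append((start, member))
    return candidates, matches


cands, hits = find_members("york new pizza in new york", ["new york", "pizza"])
print(cands)
print(hits)
```

In this toy input the sub-string "york new" shares a signature with the member "new york" and so becomes a candidate, but it fails verification and is not reported as a true match.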
-
Publication No.: US08856047B2
Publication date: 2014-10-07
Application No.: US13164788
Filing date: 2011-06-21
Applicant: Kaushik Chakrabarti, Dong Xin, Bahman Bahmani
Inventor: Kaushik Chakrabarti, Dong Xin, Bahman Bahmani
CPC classification: G06F17/30864
Abstract: A personalized page rank computation system is described herein that provides a fast MapReduce method for Monte Carlo approximation of personalized PageRank vectors of all the nodes in a graph. The method presented is both faster and less computationally intensive than existing methods, allowing a broader scope of problems to be solved by existing computing hardware. The system adopts the Monte Carlo approach and provides a method to compute single random walks of a given length for all nodes in a graph that is superior in terms of the number of map-reduce iterations among a broad class of methods. The resulting solution reduces the I/O cost and outperforms the state-of-the-art FPPR approximation methods, in terms of efficiency and approximation error. Thus, the system can very efficiently perform single random walks of a given length starting at each node in the graph and can very efficiently approximate all the personalized PageRank vectors.
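For readers unfamiliar with the Monte Carlo approach named in the abstract above, here is a minimal single-machine Python sketch. It is not the MapReduce method of the patent: it simply runs random walks from one source node, stops each walk with probability `alpha` at every step, and uses normalized visit counts as the estimate of that node's personalized PageRank vector. The graph representation, the parameter values, and the function name are assumptions for illustration.

```python
import random
from collections import Counter


def personalized_pagerank(graph, source, num_walks=2000, alpha=0.15, rng=None):
    """Estimate the personalized PageRank vector of `source` by random walks.

    graph maps each node to a list of out-neighbors. Each walk starts at
    `source` and ends with probability `alpha` at every step (or when it
    reaches a node with no out-neighbors).
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    visits = Counter()
    for _ in range(num_walks):
        node = source
        while True:
            visits[node] += 1
            if rng.random() < alpha or not graph[node]:
                break
            node = rng.choice(graph[node])
    total = sum(visits.values())
    return {node: count / total for node, count in visits.items()}


# Tiny invented graph: "a" links to "b" and "c", which both link back.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
ppr = personalized_pagerank(graph, "a")
print(sorted(ppr.items()))
```

Repeating this for every node yields all personalized PageRank vectors; the patent's contribution, per the abstract, lies in producing the required random walks for all nodes in few map-reduce iterations, which this sketch does not attempt.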
-
Publication No.: US20120330864A1
Publication date: 2012-12-27
Application No.: US13164788
Filing date: 2011-06-21
Applicant: Kaushik Chakrabarti, Dong Xin, Bahman Bahmani
Inventor: Kaushik Chakrabarti, Dong Xin, Bahman Bahmani
IPC classification: G06N3/12
CPC classification: G06F17/30864
Abstract: A personalized page rank computation system is described herein that provides a fast MapReduce method for Monte Carlo approximation of personalized PageRank vectors of all the nodes in a graph. The method presented is both faster and less computationally intensive than existing methods, allowing a broader scope of problems to be solved by existing computing hardware. The system adopts the Monte Carlo approach and provides a method to compute single random walks of a given length for all nodes in a graph that is superior in terms of the number of map-reduce iterations among a broad class of methods. The resulting solution reduces the I/O cost and outperforms the state-of-the-art FPPR approximation methods, in terms of efficiency and approximation error. Thus, the system can very efficiently perform single random walks of a given length starting at each node in the graph and can very efficiently approximate all the personalized PageRank vectors.
-
Publication No.: US07730060B2
Publication date: 2010-06-01
Application No.: US11423303
Filing date: 2006-06-09
Applicant: Kaushik Chakrabarti, Venkatesh Ganti, Dong Xin
Inventor: Kaushik Chakrabarti, Venkatesh Ganti, Dong Xin
IPC classification: G06F17/30
CPC classification: G06F17/30964
Abstract: The subject disclosure pertains to a class of object finder queries that return the best target objects that match a set of given keywords. Mechanisms are provided that facilitate identification of target objects related to search objects that match a set of query keywords. Scoring mechanisms/functions are also disclosed that compute relevance scores of target objects. Further, efficient early termination techniques are provided to compute the top K target objects based on a scoring function.