ROBUST DISCOVERY OF ENTITY SYNONYMS USING QUERY LOGS
    1.
    Patent application
    Status: In force

    Publication No.: US20130232129A1

    Publication date: 2013-09-05

    Application No.: US13487260

    Filing date: 2012-06-04

    IPC classification: G06F17/30

    CPC classification: G06F17/30672

    Abstract: A similarity analysis framework is described herein which leverages two or more similarity analysis functions to generate synonyms for an entity reference string r_e. The functions are selected such that the synonyms generated by the framework satisfy a core set of synonym-related properties. The functions operate by leveraging query log data. One similarity analysis function takes into consideration the strength of similarity between a particular candidate string s_e and the entity reference string r_e, even in the presence of sparse query log data, while another function takes into account the classes of s_e and r_e. The framework also provides indexing mechanisms that expedite its computations, and a reduction module for converting long entity reference strings into shorter strings, where each shorter string (if found) contains a subset of the terms in its longer counterpart.
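
The abstract does not spell out its similarity functions, so the following is only a minimal sketch of the general idea of mining synonyms from query logs. It assumes the log is a list of (query, clicked URL) pairs and uses a simple two-way click-overlap ratio as a stand-in measure; `build_click_index`, `click_overlap`, `find_synonyms`, the threshold, and the sample data are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def build_click_index(log):
    """Map each query string to the set of URLs clicked for it."""
    index = defaultdict(set)
    for query, url in log:
        index[query].add(url)
    return index

def click_overlap(index, a, b):
    """Fraction of a's clicked URLs that were also clicked for b."""
    clicks_a = index.get(a, set())
    if not clicks_a:
        return 0.0
    return len(clicks_a & index.get(b, set())) / len(clicks_a)

def find_synonyms(index, reference, threshold=0.6):
    """Candidates whose click overlap with the reference clears the
    threshold in both directions (an assumed, illustrative criterion)."""
    result = []
    for candidate in index:
        if candidate == reference:
            continue
        if (click_overlap(index, candidate, reference) >= threshold
                and click_overlap(index, reference, candidate) >= threshold):
            result.append(candidate)
    return sorted(result)

# Hypothetical query log: (query, clicked URL) pairs.
log = [
    ("microsoft excel 2010", "u1"), ("microsoft excel 2010", "u2"),
    ("excel 2010", "u1"), ("excel 2010", "u2"),
    ("microsoft", "u1"), ("microsoft", "u3"), ("microsoft", "u4"),
]
index = build_click_index(log)
synonyms = find_synonyms(index, "microsoft excel 2010")
```

Requiring the overlap in both directions is one simple way to reject a broader string such as "microsoft", whose clicks mostly land on unrelated pages.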

    Robust discovery of entity synonyms using query logs
    2.
    Granted patent
    Status: In force

    Publication No.: US08745019B2

    Publication date: 2014-06-03

    Application No.: US13487260

    Filing date: 2012-06-04

    IPC classification: G06F17/30

    CPC classification: G06F17/30672

    Abstract: A similarity analysis framework is described herein which leverages two or more similarity analysis functions to generate synonyms for an entity reference string r_e. The functions are selected such that the synonyms generated by the framework satisfy a core set of synonym-related properties. The functions operate by leveraging query log data. One similarity analysis function takes into consideration the strength of similarity between a particular candidate string s_e and the entity reference string r_e, even in the presence of sparse query log data, while another function takes into account the classes of s_e and r_e. The framework also provides indexing mechanisms that expedite its computations, and a reduction module for converting long entity reference strings into shorter strings, where each shorter string (if found) contains a subset of the terms in its longer counterpart.

    IDENTIFYING SYNONYMS OF ENTITIES USING A DOCUMENT COLLECTION
    4.
    Patent application
    Status: In force

    Publication No.: US20100313258A1

    Publication date: 2010-12-09

    Application No.: US12478120

    Filing date: 2009-06-04

    IPC classification: H04L9/32

    CPC classification: G06F17/2795 G06F17/278

    Abstract: Identifying synonyms of entities using a collection of documents is disclosed herein. In some aspects, a document from a collection of documents may be analyzed to identify hit sequences that include one or more tokens (e.g., words, numbers, etc.). The hit sequences may then be used to generate discriminating token sets (DTSs) that are subsets of both the hit sequences and the entity names. The DTSs are matched with corresponding entity names, and then used to create DTS phrases by selecting adjacent text in the document that is proximate to the DTS. The DTS phrases may be analyzed to determine whether the corresponding DTS is a synonym of the entity name. In various aspects, the tokens of an associated entity name that are present in the DTS phrases are used to generate a score for the DTS. When the score reaches a threshold, the DTS may be designated as a synonym. A list of synonyms may be generated for each entity name.
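
The scoring step can be illustrated with a small sketch. The abstract does not give the scoring formula, so this version assumes a DTS phrase is a fixed window of tokens on either side of each contiguous DTS occurrence, and that the score is the average share of the entity name's remaining tokens found in those windows. The function names, window size, and sample text are hypothetical.

```python
def dts_phrases(doc_tokens, dts, window=3):
    """Yield the tokens surrounding each contiguous occurrence of the DTS."""
    n = len(dts)
    for i in range(len(doc_tokens) - n + 1):
        if doc_tokens[i:i + n] == dts:
            start = max(0, i - window)
            yield doc_tokens[start:i] + doc_tokens[i + n:i + n + window]

def dts_score(entity_tokens, dts, doc_tokens, window=3):
    """Average share of the entity's non-DTS tokens found near the DTS
    (an assumed scoring rule, for illustration only)."""
    rest = set(entity_tokens) - set(dts)
    phrases = list(dts_phrases(doc_tokens, dts, window))
    if not rest or not phrases:
        return 0.0
    return sum(len(rest & set(p)) / len(rest) for p in phrases) / len(phrases)

def is_synonym(score, threshold=0.2):
    """Designate the DTS a synonym once its score reaches the threshold."""
    return score >= threshold

# Hypothetical entity name, candidate DTS, and document text.
entity = "canon eos 350d digital camera".split()
dts = ["350d"]
doc = "the new canon 350d camera ships today and the 350d is light".split()
score = dts_score(entity, dts, doc)
```

In this sample the first occurrence of "350d" sits next to "canon" and "camera" (2 of 4 remaining entity tokens) and the second next to none, so the averaged score is 0.25.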

    IDENTIFYING SYNONYMS OF ENTITIES USING WEB SEARCH
    5.
    Patent application
    Status: Under examination (published)

    Publication No.: US20100293179A1

    Publication date: 2010-11-18

    Application No.: US12465832

    Filing date: 2009-05-14

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: Identifying synonyms of entities using web search results is disclosed herein. In some aspects, a candidate string of tokens of an entity name is selected as a search term. The search term is transmitted by a server to a search engine, which in turn transmits search results back to the server after performing a search. The server analyzes the search results, generates a score based on the search results, and then determines a status (synonym or not a synonym) of the candidate string based on the score. In further aspects, additional candidate strings are designated as synonyms or not synonyms based on the status of the searched candidate string, by using relationships of a lattice formed from all possible candidate strings of the entity name.
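
The lattice idea can be sketched as follows. The abstract does not state how statuses propagate, so this sketch assumes a monotonicity rule: if a candidate is a synonym, every candidate containing all of its tokens is also treated as a synonym, and if it is not a synonym, every candidate built from a subset of its tokens is also rejected. It further assumes the entity name's tokens are distinct. The search-and-score step is omitted, and `candidate_lattice` and `propagate` are hypothetical names.

```python
from itertools import combinations

def candidate_lattice(tokens):
    """All non-empty, order-preserving proper token subsets of the entity name."""
    n = len(tokens)
    return [tuple(tokens[i] for i in idx)
            for size in range(1, n)
            for idx in combinations(range(n), size)]

def propagate(candidates, searched, is_synonym):
    """Infer statuses of other candidates from one searched candidate,
    under the assumed monotonicity rule described above."""
    inferred = {searched: is_synonym}
    searched_set = set(searched)
    for cand in candidates:
        if cand == searched:
            continue
        if is_synonym and searched_set < set(cand):
            inferred[cand] = True       # superset of a synonym
        elif not is_synonym and set(cand) < searched_set:
            inferred[cand] = False      # subset of a non-synonym
    return inferred

# Hypothetical entity name.
candidates = candidate_lattice(["canon", "eos", "350d"])
accepted = propagate(candidates, ("350d",), True)
rejected = propagate(candidates, ("canon", "eos"), False)
```

Each inferred status saves one round trip to the search engine, which is the apparent motivation for exploiting the lattice at all.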

    Identifying synonyms of entities using a document collection
    6.
    Granted patent
    Status: In force

    Publication No.: US08533203B2

    Publication date: 2013-09-10

    Application No.: US12478120

    Filing date: 2009-06-04

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/2795 G06F17/278

    Abstract: Identifying synonyms of entities using a collection of documents is disclosed herein. In some aspects, a document from a collection of documents may be analyzed to identify hit sequences that include one or more tokens (e.g., words, numbers, etc.). The hit sequences may then be used to generate discriminating token sets (DTSs) that are subsets of both the hit sequences and the entity names. The DTSs are matched with corresponding entity names, and then used to create DTS phrases by selecting adjacent text in the document that is proximate to the DTS. The DTS phrases may be analyzed to determine whether the corresponding DTS is a synonym of the entity name. In various aspects, the tokens of an associated entity name that are present in the DTS phrases are used to generate a score for the DTS. When the score reaches a threshold, the DTS may be designated as a synonym. A list of synonyms may be generated for each entity name.

    Fast personalized page rank on map reduce
    8.
    Granted patent
    Status: In force

    Publication No.: US08856047B2

    Publication date: 2014-10-07

    Application No.: US13164788

    Filing date: 2011-06-21

    CPC classification: G06F17/30864

    Abstract: A personalized PageRank computation system is described herein that provides a fast MapReduce method for Monte Carlo approximation of the personalized PageRank vectors of all the nodes in a graph. The method presented is both faster and less computationally intensive than existing methods, allowing a broader scope of problems to be solved by existing computing hardware. The system adopts the Monte Carlo approach and provides a method to compute single random walks of a given length for all nodes in a graph that is superior, in terms of the number of MapReduce iterations, among a broad class of methods. The resulting solution reduces the I/O cost and outperforms the state-of-the-art FPPR approximation methods in terms of efficiency and approximation error. Thus, the system can very efficiently perform single random walks of a given length starting at each node in the graph and can very efficiently approximate all the personalized PageRank vectors.
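
For readers unfamiliar with the Monte Carlo approach, here is a minimal single-machine sketch. It is not the MapReduce method the abstract describes: it simply runs many random walks from one source node, stops each walk with a restart probability `alpha` at every step (or at a node with no outgoing edges), and uses the distribution of walk endpoints as the estimate of that node's personalized PageRank vector. The function name, parameters, and sample graph are hypothetical.

```python
import random
from collections import Counter

def personalized_pagerank(graph, source, walks=10000, alpha=0.15, seed=0):
    """Estimate the personalized PageRank vector of `source` from the
    endpoints of random walks that stop with probability alpha per step."""
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(walks):
        node = source
        # Continue with probability 1 - alpha while the node has out-links.
        while rng.random() >= alpha and graph[node]:
            node = rng.choice(graph[node])
        ends[node] += 1
    return {node: count / walks for node, count in ends.items()}

# Hypothetical two-node graph as adjacency lists.
graph = {"a": ["b"], "b": ["a"]}
ppr = personalized_pagerank(graph, "a")
```

On this two-node cycle the exact value for the source is alpha / (1 - (1 - alpha)^2), roughly 0.54, so the estimate can be checked against a known answer.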

    FAST PERSONALIZED PAGE RANK ON MAP REDUCE
    9.
    Patent application
    Status: In force

    Publication No.: US20120330864A1

    Publication date: 2012-12-27

    Application No.: US13164788

    Filing date: 2011-06-21

    IPC classification: G06N3/12

    CPC classification: G06F17/30864

    Abstract: A personalized PageRank computation system is described herein that provides a fast MapReduce method for Monte Carlo approximation of the personalized PageRank vectors of all the nodes in a graph. The method presented is both faster and less computationally intensive than existing methods, allowing a broader scope of problems to be solved by existing computing hardware. The system adopts the Monte Carlo approach and provides a method to compute single random walks of a given length for all nodes in a graph that is superior, in terms of the number of MapReduce iterations, among a broad class of methods. The resulting solution reduces the I/O cost and outperforms the state-of-the-art FPPR approximation methods in terms of efficiency and approximation error. Thus, the system can very efficiently perform single random walks of a given length starting at each node in the graph and can very efficiently approximate all the personalized PageRank vectors.

    Efficient evaluation of object finder queries
    10.
    Granted patent
    Status: Lapsed

    Publication No.: US07730060B2

    Publication date: 2010-06-01

    Application No.: US11423303

    Filing date: 2006-06-09

    IPC classification: G06F17/30

    CPC classification: G06F17/30964

    Abstract: The subject disclosure pertains to a class of object finder queries that return the best target objects that match a set of given keywords. Mechanisms are provided that facilitate identification of target objects related to search objects that match a set of query keywords. Scoring mechanisms/functions are also disclosed that compute relevance scores of target objects. Further, efficient early termination techniques are provided to compute the top K target objects based on a scoring function.
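
Early termination for top-K retrieval can be sketched under one simplifying assumption that the abstract does not confirm: a target object's score is the maximum relevance score of any search object (here, a document) related to it. With documents scanned in descending score order, the first score seen for a target is then final, and the scan can stop once K distinct targets have appeared. `top_k_targets` and the sample data are hypothetical.

```python
def top_k_targets(ranked_docs, doc_targets, k):
    """Return the top-k (target, score) pairs.

    ranked_docs: (doc_id, score) pairs sorted by descending score.
    doc_targets: maps a doc_id to the target objects related to it.
    Assumes a target's score is the max score of its related documents.
    """
    best = {}
    for doc_id, score in ranked_docs:
        for target in doc_targets.get(doc_id, ()):
            # First sighting carries the highest score this target can get.
            best.setdefault(target, score)
        if len(best) >= k:
            break  # no unseen target can outscore the ones already found
    return sorted(best.items(), key=lambda item: -item[1])[:k]

# Hypothetical ranked search objects and their related target objects.
ranked_docs = [("d1", 0.9), ("d2", 0.7), ("d3", 0.4), ("d4", 0.1)]
doc_targets = {
    "d1": ["camera A"],
    "d2": ["camera A", "camera B"],
    "d3": ["camera C"],
    "d4": ["camera D"],
}
top2 = top_k_targets(ranked_docs, doc_targets, 2)
```

Here the scan stops after `d2` without ever reading `d3` or `d4`. A sum or other aggregate would need a bound on the unseen contributions before stopping, which this sketch does not attempt.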