Iterators for applying term occurrence-level constraints in natural language searching
    1.
    Granted patent
    Legal status: in force

    Publication number: US07984032B2

    Publication date: 2011-07-19

    Application number: US12201047

    Filing date: 2008-08-29

    IPC classification: G06F17/30

    CPC classification: G06F17/30684

    Abstract: Tools and techniques are described that relate to iterators for applying term occurrence-level constraints in natural language searching. These tools may receive a natural language input query, and define term occurrence-level constraints applicable to the input query. The methods may also identify facts requested in the input query, and may instantiate an iterator to traverse a fact index to identify candidate facts responsive to the input query. This iterator may traverse through at least a portion of the fact index. The methods may receive candidate facts from this iterator, with these candidate facts including terms, referred to as term-level occurrences. The methods may apply the term occurrence-level constraints to the term-level occurrences. The methods may select the candidate fact for inclusion in search results for the input query, based at least in part on applying the term occurrence-level constraint.
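
    Here is a minimal Python sketch of the idea in this abstract. It assumes a toy in-memory fact index in which every fact is a tuple of term occurrences, each tagged with a semantic role. The names `Occurrence`, `Fact`, `fact_iterator` and `search` and the role labels are hypothetical and do not come from the patent. A generator plays the part of the iterator that traverses the index. Each term occurrence-level constraint is a predicate that is tested against the occurrences of its term in every candidate fact.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass(frozen=True)
class Occurrence:
    term: str
    role: str  # hypothetical semantic-role label, e.g. "subj", "verb", "obj"


@dataclass(frozen=True)
class Fact:
    fact_id: int
    occurrences: tuple[Occurrence, ...]


# A term occurrence-level constraint: a predicate over one occurrence of a term.
Constraint = Callable[[Occurrence], bool]


def fact_iterator(index: list[Fact], requested: set[str]) -> Iterator[Fact]:
    """Lazily traverse the fact index, yielding facts that mention a requested term."""
    for fact in index:
        if any(occ.term in requested for occ in fact.occurrences):
            yield fact


def search(index: list[Fact], constraints: dict[str, Constraint]) -> list[int]:
    """Keep a candidate fact only if, for every constrained term, at least one
    occurrence of that term in the fact satisfies the term's constraint."""
    hits = []
    for fact in fact_iterator(index, set(constraints)):
        if all(
            any(check(occ) for occ in fact.occurrences if occ.term == term)
            for term, check in constraints.items()
        ):
            hits.append(fact.fact_id)
    return hits


# Usage: "Who acquired Skype?" requires "skype" as object and "acquire" as verb.
index = [
    Fact(1, (Occurrence("microsoft", "subj"), Occurrence("acquire", "verb"),
             Occurrence("skype", "obj"))),
    Fact(2, (Occurrence("skype", "subj"), Occurrence("acquire", "verb"),
             Occurrence("gizmo5", "obj"))),
]
hits = search(index, {"skype": lambda o: o.role == "obj",
                      "acquire": lambda o: o.role == "verb"})
print(hits)  # [1]
```

    Both facts contain the same terms, so a term-level match alone cannot tell them apart. The occurrence-level role check is what rejects fact 2.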

    Iterators for Applying Term Occurrence-Level Constraints in Natural Language Searching
    2.
    Patent application
    Legal status: in force

    Publication number: US20090070298A1

    Publication date: 2009-03-12

    Application number: US12201047

    Filing date: 2008-08-29

    IPC classification: G06F7/06, G06F17/30

    CPC classification: G06F17/30684

    Abstract: Tools and techniques are described that relate to iterators for applying term occurrence-level constraints in natural language searching. These tools may receive a natural language input query, and define term occurrence-level constraints applicable to the input query. The methods may also identify facts requested in the input query, and may instantiate an iterator to traverse a fact index to identify candidate facts responsive to the input query. This iterator may traverse through at least a portion of the fact index. The methods may receive candidate facts from this iterator, with these candidate facts including terms, referred to as term-level occurrences. The methods may apply the term occurrence-level constraints to the term-level occurrences. The methods may select the candidate fact for inclusion in search results for the input query, based at least in part on applying the term occurrence-level constraint.

    Efficient Storage and Retrieval of Posting Lists
    3.
    Patent application
    Legal status: in force

    Publication number: US20090132521A1

    Publication date: 2009-05-21

    Application number: US12201079

    Filing date: 2008-08-29

    IPC classification: G06F17/30, G06F17/28

    CPC classification: G06F17/2785, G06F17/30625

    Abstract: A role tree having nodes corresponding to semantic roles in a hierarchy is defined. A posting list is generated for each association of a term and a semantic role in the hierarchy. The posting lists are stored contiguously on a physical storage medium such that a subtree of the hierarchy of semantic roles can be loaded from the storage medium as a single contiguous block. The posting lists for a subtree of the hierarchy are retrieved by obtaining data identifying the beginning location on the physical storage medium of the posting lists for the term at the top of a desired subtree of the hierarchy and data identifying the length of the posting lists of the desired subtree of the hierarchy. A single contiguous block that includes the posting lists for the desired subtree of the hierarchy is then retrieved from the beginning location through the specified length.
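
    The contiguous layout can be sketched in Python under a few assumptions. It uses a hypothetical four-role tree (`ROLE_TREE`), 4-byte little-endian document IDs, and an in-memory `BytesIO` buffer in place of the physical storage medium. Writing one term's per-role posting lists in preorder (depth-first) order is one way to guarantee that every role subtree occupies a single byte range. The directory then only needs a start offset and a length per role. This is an illustrative sketch, not the patented on-disk format.

```python
import struct
from io import BytesIO

# Hypothetical semantic-role hierarchy: each role maps to its child roles.
ROLE_TREE = {
    "root": ["agent", "theme"],
    "agent": ["agent.person"],
    "agent.person": [],
    "theme": [],
}


def layout(postings: dict[str, list[int]], tree: dict[str, list[str]], root: str = "root"):
    """Serialize one term's per-role posting lists in preorder so that every role
    subtree is one contiguous byte range. Returns (blob, directory), where
    directory[role] = (offset, length) covering that role's whole subtree."""
    buf = BytesIO()
    directory: dict[str, tuple[int, int]] = {}

    def visit(role: str) -> None:
        start = buf.tell()
        for doc_id in postings.get(role, []):
            buf.write(struct.pack("<I", doc_id))  # 4-byte little-endian doc ID
        for child in tree[role]:
            visit(child)
        directory[role] = (start, buf.tell() - start)

    visit(root)
    return buf.getvalue(), directory


def read_subtree(storage: BytesIO, directory: dict[str, tuple[int, int]], role: str) -> list[int]:
    """Fetch every posting under `role` with one seek and one contiguous read."""
    offset, length = directory[role]
    storage.seek(offset)
    chunk = storage.read(length)
    return [doc_id for (doc_id,) in struct.iter_unpack("<I", chunk)]


# Usage: postings for a single term, keyed by the semantic role it filled.
blob, directory = layout({"agent": [3], "agent.person": [7, 9], "theme": [4]}, ROLE_TREE)
storage = BytesIO(blob)  # stands in for the physical storage medium
print(read_subtree(storage, directory, "agent"))  # [3, 7, 9]
print(read_subtree(storage, directory, "root"))   # [3, 7, 9, 4]
```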

    Fact-based indexing for natural language search
    4.
    Granted patent
    Legal status: in force

    Publication number: US08639708B2

    Publication date: 2014-01-28

    Application number: US12201596

    Filing date: 2008-08-29

    IPC classification: G06F7/00, G06F17/30

    CPC classification: G06F17/30696, G06F17/30684

    Abstract: Computer-readable media and a computer system for implementing a natural language search using fact-based structures and for generating such fact-based structures are provided. A fact-based structure is generated using a semantic structure, which represents information, such as text, from a document, such as a web page. Typically, a natural language parser is used to create a semantic structure of the information, and the parser identifies terms, as well as the relationship between the terms. A fact-based structure of a semantic structure allows for a linear structure of these terms and their relationships to be created, while also maintaining identifiers of the terms to convey the dependency of one fact-based structure on another fact-based structure. Additionally, synonyms and hypernyms are identified while generating the fact-based structure to improve the accuracy of the overall search.
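
    Here is a small Python sketch of turning a parse into linear fact-based structures. It assumes the parser output is already available as numbered terms plus (head, relation, dependent) edges. The `LEXICON` of synonyms and hypernyms, the `word#id` tagging scheme and the function names are hypothetical. Each fact is flat, but the shared identifiers preserve how one fact depends on another: `car#3` appears as the dependent of the `theme` fact and as the head of the `mod` fact.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    relation: str
    head: str       # term text tagged with its identifier, e.g. "bought#2"
    dependent: str


# Hypothetical lexicon: synonyms and hypernyms added at indexing time.
LEXICON = {"bought": ["purchased", "acquired"], "car": ["automobile", "vehicle"]}


def expand(word: str) -> list[str]:
    """Return the word itself followed by any synonyms/hypernyms from LEXICON."""
    return [word, *LEXICON.get(word, [])]


def to_facts(terms: dict[int, str], edges: list[tuple[int, str, int]]) -> list[Fact]:
    """Flatten a semantic structure (head, relation, dependent edges over numbered
    terms) into a linear list of facts, one per expanded head/dependent pair."""
    facts = []
    for head_id, relation, dep_id in edges:
        for h in expand(terms[head_id]):
            for d in expand(terms[dep_id]):
                facts.append(Fact(relation, f"{h}#{head_id}", f"{d}#{dep_id}"))
    return facts


# Usage: "John bought a red car."
terms = {1: "john", 2: "bought", 3: "car", 4: "red"}
edges = [(2, "agent", 1), (2, "theme", 3), (3, "mod", 4)]
facts = to_facts(terms, edges)
print(len(facts))  # 15
```

    Expanding every fact with its synonyms and hypernyms trades index size for recall. A query for "purchased a vehicle" can match this sentence without any query-time expansion.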

    Efficient storage and retrieval of posting lists
    5.
    Granted patent
    Legal status: in force

    Publication number: US08229970B2

    Publication date: 2012-07-24

    Application number: US12201079

    Filing date: 2008-08-29

    IPC classification: G06F17/30

    CPC classification: G06F17/2785, G06F17/30625

    Abstract: A role tree having nodes corresponding to semantic roles in a hierarchy is defined. A posting list is generated for each association of a term and a semantic role in the hierarchy. The posting lists are stored contiguously on a physical storage medium such that a subtree of the hierarchy of semantic roles can be loaded from the storage medium as a single contiguous block. The posting lists for a subtree of the hierarchy are retrieved by obtaining data identifying the beginning location on the physical storage medium of the posting lists for the term at the top of a desired subtree of the hierarchy and data identifying the length of the posting lists of the desired subtree of the hierarchy. A single contiguous block that includes the posting lists for the desired subtree of the hierarchy is then retrieved from the beginning location through the specified length.

    Emphasizing search results according to conceptual meaning
    6.
    Granted patent
    Legal status: in force

    Publication number: US08209321B2

    Publication date: 2012-06-26

    Application number: US12201504

    Filing date: 2008-08-29

    IPC classification: G06F7/00

    CPC classification: G06F17/30696, G06F17/30684

    Abstract: Computer-readable media, computerized methods, and computer systems for conducting semantic processes to present search results that include highlighted regions which are relevant to a conceptual meaning of a query are provided. Initially, content of document(s) is accessed and semantic representations are derived by distilling linguistic representations from the content. These semantic representations may be stored at a semantic index. Also, a proposition is derived from the query by parsing search terms of the query, and distilling the proposition from the search terms. Typically, the proposition is a logical representation of the conceptual meaning of the query. The proposition is compared against the semantic representations at the semantic index to identify a matching set. Regions of the content within the document, from which the matching set of semantic representations are derived, are targeted. Accordingly, highlighting may be applied to the targeted regions when presenting or displaying the search results.
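
    The highlighting step can be illustrated with a Python sketch. It assumes that each semantic representation in the index records the character span of the region it was distilled from. It also assumes that a query proposition is a (predicate, subject, object) triple, with `None` marking an unbound slot. Parsing and distillation are stubbed out with a hand-built index. `Rep`, `matches`, `highlight`, `span` and the `[[...]]` markers are hypothetical choices made for this example.

```python
from typing import NamedTuple, Optional


class Rep(NamedTuple):
    predicate: str
    subject: str
    obj: str
    start: int  # character span of the region the representation was distilled from
    end: int


# A query proposition (predicate, subject, object); None is an unbound slot,
# e.g. the "who" in "Who invented the phonograph?".
Proposition = tuple[str, Optional[str], Optional[str]]


def matches(prop: Proposition, rep: Rep) -> bool:
    pred, subj, obj = prop
    return (rep.predicate == pred
            and (subj is None or rep.subject == subj)
            and (obj is None or rep.obj == obj))


def highlight(text: str, index: list[Rep], prop: Proposition) -> str:
    """Wrap every region whose semantic representation matches the proposition."""
    spans = sorted((r.start, r.end) for r in index if matches(prop, r))
    out, pos = [], 0
    for start, end in spans:
        if start < pos:  # skip regions overlapping one already highlighted
            continue
        out.append(text[pos:start])
        out.append("[[" + text[start:end] + "]]")
        pos = end
    out.append(text[pos:])
    return "".join(out)


def span(text: str, phrase: str) -> tuple[int, int]:
    """Character offsets of the first occurrence of phrase in text."""
    i = text.index(phrase)
    return i, i + len(phrase)


# Usage: a hand-built semantic index standing in for the parser's output.
text = "Edison invented the phonograph. Bell patented the telephone."
index = [
    Rep("invent", "edison", "phonograph", *span(text, "Edison invented the phonograph")),
    Rep("patent", "bell", "telephone", *span(text, "Bell patented the telephone")),
]
print(highlight(text, index, ("invent", None, "phonograph")))
# [[Edison invented the phonograph]]. Bell patented the telephone.
```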

    EMPHASIZING SEARCH RESULTS ACCORDING TO CONCEPTUAL MEANING
    7.
    Patent application
    Legal status: in force

    Publication number: US20090063472A1

    Publication date: 2009-03-05

    Application number: US12201504

    Filing date: 2008-08-29

    IPC classification: G06F7/06, G06F17/30

    CPC classification: G06F17/30696, G06F17/30684

    Abstract: Computer-readable media, computerized methods, and computer systems for conducting semantic processes to present search results that include highlighted regions which are relevant to a conceptual meaning of a query are provided. Initially, content of document(s) is accessed and semantic representations are derived by distilling linguistic representations from the content. These semantic representations may be stored at a semantic index. Also, a proposition is derived from the query by parsing search terms of the query, and distilling the proposition from the search terms. Typically, the proposition is a logical representation of the conceptual meaning of the query. The proposition is compared against the semantic representations at the semantic index to identify a matching set. Regions of the content within the document, from which the matching set of semantic representations are derived, are targeted. Accordingly, highlighting may be applied to the targeted regions when presenting or displaying the search results.

    Efficiently Representing Word Sense Probabilities
    8.
    Patent application
    Legal status: in force

    Publication number: US20090094019A1

    Publication date: 2009-04-09

    Application number: US12200999

    Filing date: 2008-08-29

    IPC classification: G06F17/27

    CPC classification: G06F17/2755

    Abstract: Word sense probabilities are compressed for storage in a semantic index. Each word sense for a word is mapped to one of a number of “buckets” by assigning a bucket score to the word sense. A scoring function is utilized to assign the bucket scores that maximizes the entropy of the assigned bucket scores. Once the bucket scores have been assigned to the word senses, the bucket scores are stored in the semantic index. The bucket scores stored in the semantic index may be utilized to prune one or more of the word senses prior to construction of the semantic index. The bucket scores may also be utilized to prune and rank the word senses at the time a query is performed using the semantic index.
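
    The abstract does not say which scoring function is used. One simple choice with the stated property is equal-frequency (quantile) bucketing. If each bucket receives the same share of senses, the distribution of bucket scores is uniform, and a uniform distribution has maximum entropy. The following Python sketch assumes that choice, four buckets, and made-up probabilities. The function names are hypothetical.

```python
import bisect
from collections import Counter


def fit_boundaries(probs: list[float], n_buckets: int) -> list[float]:
    """Place cut points at empirical quantiles so each bucket receives roughly the
    same number of word senses, making the bucket-score distribution near uniform."""
    ordered = sorted(probs)
    return [ordered[k * len(ordered) // n_buckets] for k in range(1, n_buckets)]


def bucket_score(p: float, boundaries: list[float]) -> int:
    """Map a sense probability to a small integer bucket in 0..len(boundaries)."""
    return bisect.bisect_right(boundaries, p)


def prune_and_rank(senses: dict[str, float], boundaries: list[float],
                   min_bucket: int) -> list[str]:
    """Drop senses whose bucket is below min_bucket, then rank by bucket score."""
    scored = {sense: bucket_score(p, boundaries) for sense, p in senses.items()}
    kept = [sense for sense, b in scored.items() if b >= min_bucket]
    return sorted(kept, key=lambda sense: scored[sense], reverse=True)


# Usage: sense probabilities observed across a tiny, made-up corpus.
corpus_probs = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.50, 0.90]
boundaries = fit_boundaries(corpus_probs, n_buckets=4)  # [0.05, 0.2, 0.5]
print(Counter(bucket_score(p, boundaries) for p in corpus_probs))  # each bucket used twice
bank = {"bank/finance": 0.70, "bank/river": 0.25, "bank/tilt": 0.02}
print(prune_and_rank(bank, boundaries, min_bucket=1))  # ['bank/finance', 'bank/river']
```

    With four buckets, each stored score fits in 2 bits instead of a full floating-point probability, which is where the compression comes from.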

    Efficiently representing word sense probabilities
    9.
    Granted patent
    Legal status: in force

    Publication number: US08280721B2

    Publication date: 2012-10-02

    Application number: US12200999

    Filing date: 2008-08-29

    IPC classification: G06F17/27

    CPC classification: G06F17/2755

    Abstract: Word sense probabilities are compressed for storage in a semantic index. Each word sense for a word is mapped to one of a number of “buckets” by assigning a bucket score to the word sense. A scoring function is utilized to assign the bucket scores that maximizes the entropy of the assigned bucket scores. Once the bucket scores have been assigned to the word senses, the bucket scores are stored in the semantic index. The bucket scores stored in the semantic index may be utilized to prune one or more of the word senses prior to construction of the semantic index. The bucket scores may also be utilized to prune and rank the word senses at the time a query is performed using the semantic index.

    Semi-automatic example-based induction of semantic translation rules to support natural language search
    10.
    Granted patent
    Legal status: in force

    Publication number: US08041697B2

    Publication date: 2011-10-18

    Application number: US12201066

    Filing date: 2008-08-29

    IPC classification: G06F17/30

    Abstract: Technologies are described herein for generating a semantic translation rule to support natural language search. In one method, a first expression and a second expression are received. A first representation is generated based on the first expression, and a second representation is generated based on the second expression. Aligned pairs of a first term in the first representation and a second term in the second representation are determined. For each aligned pair, the first term and the second term are replaced with a variable associated with the aligned pair. Word facts that occur in both the first representation and the second representation are removed from the first representation and the second representation. The remaining word facts in the first representation are replaced with a broader representation of the word facts. The translation rule including the first representation, an operator, and the second semantic representation is generated.
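
    The induction steps listed in this abstract can be sketched in Python under several assumptions:
    - Each representation is a pair of role facts `(relation, head_id, dependent_id)` and word facts `(term_id, word)`.
    - Terms are aligned when they carry the same word.
    - Broadening means looking up a hypernym in a hypothetical `HYPERNYMS` table.
    The example pairs "John bought a car from Mary" with "Mary sold a car to John". All identifiers are illustrative, and the patent may align and broaden terms differently.

```python
from itertools import count

# Hypothetical hypernym table used to broaden leftover words on the left-hand side.
HYPERNYMS = {"buy": "acquire"}


def induce_rule(rep1, rep2, op="=>"):
    """Each representation is (role_facts, word_facts): role facts are
    (relation, head_id, dependent_id) triples, word facts are (term_id, word)."""
    roles1, words1 = rep1
    roles2, words2 = rep2

    # 1. Align terms that carry the same word in both representations.
    term_for_word2 = {word: term for term, word in words2}
    pairs = [(t1, term_for_word2[w]) for t1, w in sorted(words1) if w in term_for_word2]

    # 2. Replace both members of each aligned pair with one shared variable.
    fresh = count()
    sub1, sub2 = {}, {}
    for t1, t2 in pairs:
        var = f"?x{next(fresh)}"
        sub1[t1], sub2[t2] = var, var

    def rename(roles, words, sub):
        return ({(rel, sub.get(h, h), sub.get(d, d)) for rel, h, d in roles},
                {(sub.get(t, t), w) for t, w in words})

    roles1, words1 = rename(roles1, words1, sub1)
    roles2, words2 = rename(roles2, words2, sub2)

    # 3. Remove word facts that now occur in both representations.
    shared = words1 & words2
    words1, words2 = words1 - shared, words2 - shared

    # 4. Broaden the remaining word facts of the first representation.
    words1 = {(t, HYPERNYMS.get(w, w)) for t, w in words1}

    # The rule: first representation, operator, second representation.
    return (roles1, words1), op, (roles2, words2)


# Usage: "John bought a car from Mary" paired with "Mary sold a car to John".
rep1 = ({("agent", "a2", "a1"), ("theme", "a2", "a3"), ("source", "a2", "a4")},
        {("a1", "john"), ("a2", "buy"), ("a3", "car"), ("a4", "mary")})
rep2 = ({("agent", "b2", "b1"), ("theme", "b2", "b3"), ("goal", "b2", "b4")},
        {("b1", "mary"), ("b2", "sell"), ("b3", "car"), ("b4", "john")})
lhs, op, rhs = induce_rule(rep1, rep2)
print(lhs[1], op, rhs[1])  # {('a2', 'acquire')} => {('b2', 'sell')}
```

    After induction, `?x0` stands for the buyer on the left-hand side and the goal (recipient) on the right-hand side. This is the argument swap that a buy/sell paraphrase rule has to capture.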
