Systems and methods for hybrid text summarization
    21.
    Granted patent
    Status: lapsed

    Publication No.: US07610190B2

    Publication date: 2009-10-27

    Application No.: US10684508

    Filing date: 2003-10-15

    IPC classification: G06F17/27

    Abstract: Techniques are provided for segmenting text into categorized discourse constituents and attaching those constituents to a structural representation of the discourse. Techniques for determining hybrid structural and non-structural summaries of a text are also provided. A text is segmented, based on a theory of discourse analysis, into at least a main discourse constituent containing spatio-temporal information about a single event in a possible world view. The discourse constituents are then inserted into a structural representation of discourse. Non-structural techniques are used to determine relevance scores and to identify important discourse constituents. The relevance scores are percolated through the structural representation of discourse to determine the supporting preceding discourse constituents that preserve grammaticality. A hybrid text summary is then determined based on the structural representation of the discourse and the relevance scores.
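
    The percolation step can be illustrated with a minimal Python sketch. The sketch rests on assumptions that the abstract does not state. Each constituent carries a precomputed relevance score and at most one parent, meaning the preceding constituent it attaches to. A score percolates upward as a maximum. The names Constituent, percolate and hybrid_summary are hypothetical. The sketch shows how a highly relevant constituent pulls its antecedents into the summary, so a pronoun such as "It" is not left dangling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constituent:
    idx: int                      # position of the constituent in the original text
    text: str
    score: float                  # non-structural relevance score, assumed precomputed
    parent: Optional[int] = None  # index of the preceding constituent it attaches to


def percolate(nodes: list) -> dict:
    """Push each constituent's score up to all of its ancestors (taking the max),
    so a supporting constituent is at least as important as anything that depends on it."""
    by_idx = {n.idx: n for n in nodes}
    effective = {n.idx: n.score for n in nodes}
    for n in nodes:
        p = n.parent
        while p is not None:
            effective[p] = max(effective[p], n.score)
            p = by_idx[p].parent
    return effective


def hybrid_summary(nodes: list, threshold: float) -> str:
    """Keep every constituent whose percolated score meets the threshold,
    emitted in original text order."""
    effective = percolate(nodes)
    kept = sorted((n for n in nodes if effective[n.idx] >= threshold), key=lambda n: n.idx)
    return " ".join(n.text for n in kept)
```

    Take "Alice bought a car." (score 0.2), "It was red." (score 0.9, attached to the first sentence), and "The weather was mild." (score 0.1), with a threshold of 0.5. The summary keeps the first two sentences, even though the first one scores low on its own.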

    Systems and methods for collaborative note-taking
    22.
    Granted patent
    Status: lapsed

    Publication No.: US07542971B2

    Publication date: 2009-06-02

    Application No.: US10768675

    Filing date: 2004-02-02

    IPC classification: G06F17/30 G10L15/00

    Abstract: Techniques are provided for determining collaborative notes and automatically recognizing speech, handwriting, and other types of information. Domain and, optionally, actor/speaker information associated with the support information is determined. An initial automatic speech recognition model is determined based on the domain and/or actor information. The domain and/or actor/speaker language model is used to recognize text in the speech information associated with the support information. Presentation support information, such as slides and speaker notes, is determined. The semantic overlap between the support information and the salient non-function words in the recognized text, together with collaborative user feedback information, is used to determine relevancy scores for the recognized text. Grammaticality, well-formedness, self-referential integrity, and other features are used to determine correctness scores. Suggested collaborative notes are displayed in the user interface based on the salient non-function words. User actions in the user interface determine feedback signals. Recognition models, such as automatic speech recognition and handwriting recognition models, are determined based on the feedback signals and the correctness and relevance scores.
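
    The relevancy score can be sketched in a few lines of Python. This is a minimal sketch with several assumptions. Plain lexical overlap stands in for the "semantic overlap" named in the abstract. The function-word list is a deliberately tiny, hypothetical one. User feedback arrives as a number in [0, 1] and is blended in with a hypothetical weight alpha. The names salient_words and relevance are illustrative only.

```python
import string

# Hypothetical, deliberately tiny function-word list; a real system would use a fuller one.
FUNCTION_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for",
                  "is", "are", "it", "this", "that", "we"}


def salient_words(text: str) -> set:
    """Lower-case tokens with surrounding punctuation stripped, minus function words."""
    tokens = (t.strip(string.punctuation).lower() for t in text.split())
    return {t for t in tokens if t and t not in FUNCTION_WORDS}


def relevance(recognized: str, support: str, feedback: float = 0.0, alpha: float = 0.8) -> float:
    """Blend the fraction of salient recognized words that also appear in the
    support text (slides, speaker notes) with a user-feedback signal in [0, 1]."""
    salient = salient_words(recognized)
    if not salient:
        return 0.0
    overlap = len(salient & salient_words(support)) / len(salient)
    return alpha * overlap + (1 - alpha) * feedback
```

    Compare the recognized phrase "the gradient descent step is small" with a slide titled "Gradient descent: step size". Three of the four salient words overlap, so with no feedback the score is about 0.6.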

    Efficiently representing word sense probabilities
    23.
    Granted patent
    Status: in force

    Publication No.: US08280721B2

    Publication date: 2012-10-02

    Application No.: US12200999

    Filing date: 2008-08-29

    IPC classification: G06F17/27

    CPC classification: G06F17/2755

    Abstract: Word sense probabilities are compressed for storage in a semantic index. Each sense of a word is mapped to one of a number of “buckets” by assigning a bucket score to that word sense. The bucket scores are assigned with a scoring function that maximizes the entropy of the assigned scores. Once bucket scores have been assigned to the word senses, they are stored in the semantic index. The bucket scores stored in the semantic index may be used to prune one or more word senses prior to construction of the semantic index. The bucket scores may also be used to prune and rank word senses when a query is performed against the semantic index.
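
    One common way to make bucket assignments high-entropy is equal-frequency (quantile) bucketing. The entropy of a distribution over k buckets peaks when every bucket is equally full, and quantile cut points approach that as closely as the data allows. The abstract does not say that this is the patented scoring function. It is used here only as a stand-in. The function names and sense labels below are hypothetical.

```python
import bisect
import math
from collections import Counter


def fit_boundaries(probs: list, num_buckets: int) -> list:
    """Equal-frequency cut points: each bucket receives about the same number of
    training probabilities, pushing the bucket histogram toward uniform (max entropy)."""
    s = sorted(probs)
    n = len(s)
    return [s[(i * n) // num_buckets] for i in range(1, num_buckets)]


def bucket_score(p: float, boundaries: list) -> int:
    """Map a probability to a small integer bucket in 0 .. len(boundaries)."""
    return bisect.bisect_right(boundaries, p)


def entropy_bits(scores: list) -> float:
    """Shannon entropy (bits) of the empirical distribution of bucket scores."""
    n = len(scores)
    return -sum((c / n) * math.log2(c / n) for c in Counter(scores).values())


def prune_and_rank(senses: dict, boundaries: list, min_bucket: int) -> list:
    """Drop senses below min_bucket and rank the rest by bucket score (ties keep input order)."""
    scored = [(name, bucket_score(p, boundaries)) for name, p in senses.items()]
    kept = [(name, b) for name, b in scored if b >= min_bucket]
    return [name for name, _ in sorted(kept, key=lambda x: -x[1])]
```

    Fit four buckets to 100 evenly spread probabilities. Each bucket then holds 25 values, and the entropy of the scores is exactly 2 bits, the maximum for four buckets. Each stored score needs only 2 bits instead of a floating-point probability.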

    Efficient storage and retrieval of posting lists
    24.
    Granted patent
    Status: in force

    Publication No.: US08229970B2

    Publication date: 2012-07-24

    Application No.: US12201079

    Filing date: 2008-08-29

    IPC classification: G06F17/30

    CPC classification: G06F17/2785 G06F17/30625

    Abstract: A role tree is defined whose nodes correspond to semantic roles in a hierarchy. A posting list is generated for each association of a term with a semantic role in the hierarchy. The posting lists are stored contiguously on a physical storage medium so that a subtree of the semantic role hierarchy can be loaded from the storage medium as a single contiguous block. The posting lists for a subtree are retrieved by obtaining data identifying the beginning location, on the physical storage medium, of the posting lists for the term at the top of the desired subtree, together with data identifying the length of the posting lists of that subtree. A single contiguous block containing the posting lists for the desired subtree is then read from the beginning location through the specified length.
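
    The contiguity property can be sketched as follows. Writing the posting lists in pre-order (depth-first) order places every subtree in one unbroken byte range. Recording an (offset, length) pair per node then lets a single read fetch the whole subtree. The sketch covers one term only. It uses an in-memory bytes object in place of the physical storage medium and assumes fixed-width 4-byte document ids. RoleNode, layout, read_subtree and the role names are all hypothetical.

```python
from dataclasses import dataclass, field

ENTRY = 4  # bytes per document id; fixed-width encoding is an assumption


@dataclass
class RoleNode:
    role: str
    postings: list                          # document ids for (one term, this role)
    children: list = field(default_factory=list)


def layout(root: RoleNode):
    """Write posting lists in pre-order so every subtree occupies one contiguous
    byte range. Returns (buffer, index) with index[role] = (offset, length)."""
    buf = bytearray()
    index = {}

    def visit(node):
        start = len(buf)
        for doc_id in node.postings:
            buf.extend(doc_id.to_bytes(ENTRY, "little"))
        for child in node.children:
            visit(child)
        index[node.role] = (start, len(buf) - start)  # length spans the whole subtree

    visit(root)
    return bytes(buf), index


def read_subtree(buf: bytes, index: dict, role: str) -> list:
    """A single contiguous slice (offset, length) yields the postings of the entire subtree."""
    offset, length = index[role]
    block = buf[offset:offset + length]
    return [int.from_bytes(block[i:i + ENTRY], "little") for i in range(0, len(block), ENTRY)]
```

    Pre-order layout is what makes this work. A node's descendants are written immediately after the node itself and before any sibling, so the recorded length always ends exactly where the subtree ends.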

    Iterators for Applying Term Occurrence-Level Constraints in Natural Language Searching
    26.
    Patent application
    Status: in force

    Publication No.: US20090070298A1

    Publication date: 2009-03-12

    Application No.: US12201047

    Filing date: 2008-08-29

    IPC classification: G06F7/06 G06F17/30

    CPC classification: G06F17/30684

    Abstract: Tools and techniques are described that relate to iterators for applying term occurrence-level constraints in natural language searching. These tools may receive a natural language input query and define term occurrence-level constraints applicable to that query. The methods may also identify facts requested in the input query, and may instantiate an iterator that traverses a fact index to identify candidate facts responsive to the query. This iterator may traverse at least a portion of the fact index. The methods may receive candidate facts from the iterator; these candidate facts include terms, referred to as term-level occurrences. The methods may apply the term occurrence-level constraints to the term-level occurrences, and may select a candidate fact for inclusion in the search results for the input query based at least in part on applying those constraints.
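
    The iterator-plus-constraint pattern can be sketched in Python. The sketch assumes the following. A fact is a tuple of term occurrences, each tagged with a role. A generator stands in for the index iterator. A constraint is a predicate applied to each occurrence of a constrained term. The role labels, class names and function names are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Occurrence:
    term: str
    role: str  # hypothetical role labels such as "subject" / "object"


@dataclass(frozen=True)
class Fact:
    fact_id: int
    occurrences: Tuple[Occurrence, ...]


Constraint = Callable[[Occurrence], bool]


def fact_iterator(index: List[Fact], term: str) -> Iterator[Fact]:
    """Lazily walk the fact index, yielding only candidate facts that mention `term`."""
    for fact in index:
        if any(o.term == term for o in fact.occurrences):
            yield fact


def search(index: List[Fact], term: str, constraints: Dict[str, Constraint]) -> List[int]:
    """Keep a candidate fact only if every occurrence of a constrained term passes its constraint."""
    results = []
    for fact in fact_iterator(index, term):
        if all(constraints[o.term](o) for o in fact.occurrences if o.term in constraints):
            results.append(fact.fact_id)
    return results
```

    Take the query "What did Alice buy?", where "car" must occur as an object. A constraint such as lambda o: o.role == "object" keeps the fact "Alice bought a car". It rejects "The car hit a tree", even though both facts contain the term.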

    EMPHASIZING SEARCH RESULTS ACCORDING TO CONCEPTUAL MEANING
    27.
    Patent application
    Status: in force

    Publication No.: US20090063472A1

    Publication date: 2009-03-05

    Application No.: US12201504

    Filing date: 2008-08-29

    IPC classification: G06F7/06 G06F17/30

    CPC classification: G06F17/30696 G06F17/30684

    Abstract: Computer-readable media, computerized methods, and computer systems are provided for conducting semantic processes that present search results with highlighted regions relevant to the conceptual meaning of a query. Initially, the content of one or more documents is accessed, and semantic representations are derived by distilling linguistic representations from that content. These semantic representations may be stored in a semantic index. A proposition is also derived from the query by parsing its search terms and distilling the proposition from them. Typically, the proposition is a logical representation of the conceptual meaning of the query. The proposition is compared against the semantic representations in the semantic index to identify a matching set. The regions of document content from which the matching semantic representations were derived are then targeted, and highlighting may be applied to those regions when the search results are presented or displayed.
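
    The final targeting and highlighting step is sketched below. The sketch assumes that earlier stages have already distilled each document region into a (subject, relation, object) proposition with normalized terms, for example "wrote" to "write" and "it" resolved to "program". Each proposition is stored with the character span it came from. A query proposition may leave slots as None, which act as wildcards. Proposition, IndexedProposition and highlight are hypothetical names, and bracket markers stand in for visual highlighting.

```python
from typing import NamedTuple, Optional


class Proposition(NamedTuple):
    subj: Optional[str]
    rel: Optional[str]
    obj: Optional[str]


class IndexedProposition(NamedTuple):
    prop: Proposition
    start: int  # character span of the source region in the document
    end: int


def matches(query: Proposition, cand: Proposition) -> bool:
    """None in the query acts as a wildcard slot."""
    return all(q is None or q == c for q, c in zip(query, cand))


def highlight(text: str, index: list, query: Proposition,
              open_tag: str = "[", close_tag: str = "]") -> str:
    """Wrap every region whose proposition matches the query; overlapping spans are skipped."""
    spans = sorted({(ip.start, ip.end) for ip in index if matches(query, ip.prop)})
    out, pos = [], 0
    for start, end in spans:
        if start < pos:
            continue
        out.append(text[pos:start])
        out.append(open_tag + text[start:end] + close_tag)
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

    Take the document "Ada wrote the program. Bob read it." and the query proposition (None, "write", "program"). Only the first sentence is wrapped, because only its proposition matches the query.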
