Phrase recognition method and apparatus
    1.
    Granted invention patent
    Phrase recognition method and apparatus (legal status: expired)

    Publication number: US5819260A

    Publication date: 1998-10-06

    Application number: US589468

    Filing date: 1996-01-22

    IPC classification: G06F17/30

    Abstract: A phrase recognition method breaks streams of text into text "chunks" and selects certain chunks as "phrases" useful for automated full text searching. The phrase recognition method uses a carefully assembled list of partition elements to partition the text into the chunks, and selects phrases from the chunks according to a small number of frequency based definitions. The method can also incorporate additional processes such as categorization of proper names to enhance phrase recognition. The method selects phrases quickly and efficiently, referring simply to the phrases themselves and the frequency with which they are encountered, rather than relying on complex, time-consuming, resource-consuming grammatical analysis, or on collocation schemes of limited applicability, or on heuristical text analysis of limited reliability or utility.

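The chunk-then-count idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the partition list `PARTITION_WORDS`, the function names, and the thresholds `min_count` and `min_words` are all assumptions made for illustration, since the abstract does not give the actual partition elements or frequency definitions.

```python
import re
from collections import Counter

# Hypothetical partition elements; the patent's real list is not in the abstract.
PARTITION_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or",
                   "is", "was", "to", "by", "with"}


def chunk(text):
    """Split text into chunks at punctuation and at partition words."""
    chunks = []
    # Any run of punctuation ends the current chunk.
    for segment in re.split(r"[^\w\s'-]+", text.lower()):
        current = []
        for word in segment.split():
            if word in PARTITION_WORDS:
                if current:
                    chunks.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            chunks.append(" ".join(current))
    return chunks


def select_phrases(texts, min_count=2, min_words=2):
    """Keep multi-word chunks that occur at least min_count times (assumed rule)."""
    counts = Counter(c for t in texts for c in chunk(t))
    return {c: n for c, n in counts.items()
            if n >= min_count and len(c.split()) >= min_words}


texts = [
    "The supreme court is in session for the summary judgment.",
    "A summary judgment was granted by the supreme court.",
]
phrases = select_phrases(texts)
```

No grammatical parsing is involved: a chunk becomes a phrase purely because of its length and how often it recurs, which is the property the abstract emphasizes.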

Statistical thesaurus, method of forming same, and use thereof in query expansion in automated text searching
    2.
    Granted invention patent
    Statistical thesaurus, method of forming same, and use thereof in query expansion in automated text searching (legal status: expired)

    Publication number: US5926811A

    Publication date: 1999-07-20

    Application number: US616883

    Filing date: 1996-03-15

    IPC classification: G06F17/30 G06F17/21

    Abstract: A statistical thesaurus is built dynamically, from the same text collection that is being searched, allowing improved generation of expanded query terms. The thesaurus is dynamic in that thesaurus records are collected, ranked, accessed, and applied dynamically. Thesaurus "records" are actually formed as indexed documents arranged in "collections". The collections are preferably distinguished based on text source (court cases versus news wires versus patents, and so forth). Each record has terms assembled in indexed groups (or segments) which inherently reflect a ranking based on relevance to an initial query. After an initial query is received, the appropriate collection(s) of records may be searched by a conventional search and retrieval engine, the searches inherently returning records ranked by degree of relevance due to the record indexing scheme. A record ranking scheme avoids contamination of relevant records by less relevant records. The record selection and the expansion query term generation processes are each divided into parallel threads. The separate threads correspond to respective text sources to enable the improved expansion query term generation to be provided in real time.

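A rough sketch of per-source query expansion with one thread per text source follows. It uses plain document-level co-occurrence counts as a stand-in for the patent's record indexing and ranking scheme, which the abstract does not spell out. The names `build_records`, `expand_in_source`, and `expand_query`, and the sample collections, are hypothetical.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_records(documents):
    """Map each term to a Counter of terms that share a document with it."""
    records = defaultdict(Counter)
    for doc in documents:
        terms = set(doc.lower().split())
        for term in terms:
            for other in terms - {term}:
                records[term][other] += 1
    return records


def expand_in_source(records, query_terms, top_n):
    """Rank candidate expansion terms for one source by summed co-occurrence."""
    scores = Counter()
    for term in query_terms:
        scores.update(records.get(term, {}))
    for term in query_terms:
        scores.pop(term, None)  # never suggest the query's own terms
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


def expand_query(collections, query, top_n=3):
    """Run one expansion task per text source and return {source: terms}."""
    query_terms = query.lower().split()
    with ThreadPoolExecutor() as pool:
        futures = {
            source: pool.submit(expand_in_source, records, query_terms, top_n)
            for source, records in collections.items()
        }
        return {source: future.result() for source, future in futures.items()}


collections = {
    "cases": build_records([
        "negligence tort damages",
        "negligence tort liability",
        "contract breach damages",
    ]),
    "news": build_records(["merger stock price", "merger stock acquisition"]),
}
expanded = expand_query(collections, "negligence", top_n=2)
```

Keeping one collection per source means a legal query is expanded with terms from court cases rather than diluted by news-wire vocabulary, and the sources can be processed concurrently.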

    Automated system and method for generating reasons that a court case is cited
    3.
    Granted invention patent
    Automated system and method for generating reasons that a court case is cited (legal status: in force)

    Publication number: US07693704B2

    Publication date: 2010-04-06

    Application number: US11056121

    Filing date: 2005-02-14

    IPC classification: G06F17/27

    Abstract: A computer-automated system and method identify text in a first “citing” court case, near a “citing instance” (in which a second “cited” court case is cited), that indicates the reason(s) for citing (RFC). The automated method of designating text, taken from a set of citing documents, as reasons for citing (RFC) that are associated with respective citing instances of a cited document, has steps including: obtaining contexts of the citing instances in the respective citing documents (each context including text that includes the citing instance and text that is near the citing instance), analyzing the content of the contexts, and selecting (from the citing instances' context) text that constitutes the RFC, based on the analyzed content of the contexts. A related computer-automated system and method selects content words that are highly related to the reasons a particular document is cited, and gives them weights that indicate their relative relevance. Another related computer-automated system and method forms lists of morphological forms of words. Still another related computer-automated system and method scores sentences to show their relevance to the reasons a document is cited. Also, another related computer-automated system and method generates lists of content words. In a preferred embodiment, the systems and methods are applied to legal (especially case law) documents and legal (especially case law) citations.

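The three steps named in the abstract (obtain contexts, analyze their content, select the RFC text) might be sketched as follows. This is an illustrative assumption, not the patented scoring: each citing document is given as a pre-split list of sentences plus the index of the citing sentence, and a sentence is scored by how many contexts share its content words. `STOP_WORDS`, `window`, and the `ignore` set of case-name words are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop list; the abstract does not define one.
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "was", "is", "has", "we", "see"}


def context(sentences, cite_index, window=1):
    """Return the citing sentence plus `window` sentences on each side."""
    start = max(0, cite_index - window)
    return sentences[start:cite_index + window + 1]


def content_words(sentence, ignore):
    """Lowercased words, minus stop words and words from the citation itself."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS and w not in ignore]


def score(sentence, doc_freq, ignore):
    """Average number of contexts in which the sentence's content words occur."""
    words = content_words(sentence, ignore)
    if not words:
        return 0.0
    return sum(doc_freq[w] for w in words) / len(words)


def select_rfc(citing_docs, ignore=frozenset(), window=1):
    """citing_docs: list of (sentences, cite_index). Returns one sentence per doc."""
    contexts = [context(s, i, window) for s, i in citing_docs]
    doc_freq = Counter()
    for ctx in contexts:
        doc_freq.update({w for sent in ctx for w in content_words(sent, ignore)})
    return [max(ctx, key=lambda s: score(s, doc_freq, ignore)) for ctx in contexts]


citing_docs = [
    (["The plaintiff filed suit in 1990.",
      "A landlord owes tenants a duty of reasonable care.",
      "See Smith v. Jones.",
      "The motion was denied."], 2),
    (["Under Smith v. Jones, the landlord has a duty of care toward tenants.",
      "We therefore reverse."], 0),
]
rfcs = select_rfc(citing_docs, ignore={"smith", "v", "jones"})
```

Words from the cited case's own name are excluded because they appear in every context and would otherwise make the bare citation sentence score highest.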

    Automated system and method for generating reasons that a court case is cited
    4.
    Granted invention patent
    Automated system and method for generating reasons that a court case is cited (legal status: in force)

    Publication number: US07464025B2

    Publication date: 2008-12-09

    Application number: US11056200

    Filing date: 2005-02-14

    IPC classification: G06F17/27

    Abstract: A computer-automated system and method identify text in a first “citing” court case, near a “citing instance” (in which a second “cited” court case is cited), that indicates the reason(s) for citing (RFC). The automated method of designating text, taken from a set of citing documents, as reasons for citing (RFC) that are associated with respective citing instances of a cited document, has steps including: obtaining contexts of the citing instances in the respective citing documents (each context including text that includes the citing instance and text that is near the citing instance), analyzing the content of the contexts, and selecting (from the citing instances' context) text that constitutes the RFC, based on the analyzed content of the contexts. A related computer-automated system and method selects content words that are highly related to the reasons a particular document is cited, and gives them weights that indicate their relative relevance. Another related computer-automated system and method forms lists of morphological forms of words. Still another related computer-automated system and method scores sentences to show their relevance to the reasons a document is cited. Also, another related computer-automated system and method generates lists of content words. In a preferred embodiment, the systems and methods are applied to legal (especially case law) documents and legal (especially case law) citations.

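The abstract also mentions weighted content words and lists of morphological forms. A minimal sketch of both, under loud assumptions: the suffix stripper below is a deliberately crude, hypothetical substitute for real morphological analysis, and the weight is simply the share of citing contexts in which any form of the word occurs. None of these names or rules come from the patent.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately crude suffix list (not a real stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def weighted_forms(contexts, stop_words=frozenset()):
    """Return {stem: (weight, forms)}; weight is the share of contexts with a form."""
    forms = defaultdict(set)
    hits = defaultdict(set)
    for index, text in enumerate(contexts):
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in stop_words:
                continue
            key = stem(word)
            forms[key].add(word)
            hits[key].add(index)
    return {key: (len(hits[key]) / len(contexts), sorted(forms[key]))
            for key in forms}


contexts = [
    "The landlord failed to warn tenants.",
    "Landlords must warn each tenant.",
    "The court warned the parties.",
]
table = weighted_forms(contexts, stop_words={"the", "to", "each"})
```

Grouping forms under one key lets "warn" and "warned" count as the same content word when contexts are compared, so the weight reflects the concept rather than one spelling.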

    System and method for identifying facts and legal discussion in court case law documents
    6.
    Granted invention patent
    System and method for identifying facts and legal discussion in court case law documents (legal status: in force)

    Publication number: US06772149B1

    Publication date: 2004-08-03

    Application number: US09401725

    Filing date: 1999-09-23

    IPC classification: G06F17/30

    Abstract: A computer-implemented method of gathering large quantities of training data from case law documents (especially suitable for use as input to a learning algorithm that is used in a subsequent process of recognizing and distinguishing fact passages and discussion passages in additional case law documents) has steps of: partitioning text in the documents by headings in the documents, comparing the headings in the documents to fact headings in a fact heading list and to discussion headings in a discussion heading list, filtering from the documents the headings and text that is associated with the headings, and storing (on persistent storage in a manner adapted for input into the learning algorithm) fact training data and discussion training data that are based on the filtered headings and the associated text. Another method (of extracting features that are independent of specific machine learning algorithms needed to accurately classify case law text passages as fact passages or as discussion passages) has steps of: determining a relative position of the text passages in an opinion segment in the case law text, parsing the text passages into text chunks, comparing the text chunks to predetermined feature entities for possible matched feature entities, and associating the relative position and matched feature entities with the text passages for use by one of the learning algorithms. Corresponding apparatus and computer-readable memories are also provided.

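The first method in this abstract (partition by headings, match headings against two lists, keep the matching sections as labeled training data) can be sketched as below. The heading lists, the rule that a heading is a short all-capitals line, and the function names are assumptions made for illustration; writing the result to persistent storage is omitted.

```python
# Hypothetical heading lists; the patent's actual lists are not in the abstract.
FACT_HEADINGS = {"facts", "background", "statement of facts"}
DISCUSSION_HEADINGS = {"discussion", "analysis", "opinion"}


def is_heading(line):
    """Assumption: a heading is a short line written entirely in capitals."""
    stripped = line.strip()
    return bool(stripped) and stripped.isupper() and len(stripped.split()) <= 6


def partition_by_headings(text):
    """Return (heading, body) pairs; text before the first heading is dropped."""
    sections = []
    heading, body = None, []
    for line in text.splitlines():
        if is_heading(line):
            if heading is not None:
                sections.append((heading, " ".join(body)))
            heading, body = line.strip(), []
        elif heading is not None and line.strip():
            body.append(line.strip())
    if heading is not None:
        sections.append((heading, " ".join(body)))
    return sections


def label_sections(text):
    """Keep only sections whose heading is in one of the two lists."""
    examples = []
    for heading, body in partition_by_headings(text):
        key = heading.lower().rstrip(".:")
        if key in FACT_HEADINGS:
            label = "fact"
        elif key in DISCUSSION_HEADINGS:
            label = "discussion"
        else:
            continue  # filtered out: heading matches neither list
        if body:
            examples.append((label, body))
    return examples


document = """SMITH V. JONES
Decided 1999.
FACTS
The tenant slipped on ice.
The landlord knew of the hazard.
DISCUSSION
A landlord owes a duty of care.
CONCLUSION
Affirmed.
"""
training_data = label_sections(document)
```

Because the labels come from the documents' own headings, no manual annotation is needed, which is what makes it practical to gather training data in large quantities.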