Finite-state transduction of related word forms for text indexing and retrieval
    11.
    Published patent application
    Legal status: lapsed

    Publication number: EP0583083A3

    Publication date: 1994-09-07

    Application number: EP93305626.9

    Filing date: 1993-07-19

    Abstract: The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers (FSTs) to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both the systematic aspects of a language's morphological rule system and the word-by-word irregularities that occur. The merged FST (70) may be produced by simultaneously intersecting (&) and composing (o) a lexicon transducer (65) and a number of rule transducers (61-63). Although the resulting FSTs can have many states and transitions (arcs), they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications. The invention contemplates an information retrieval system comprising the novel FST (70) as a database and a processor for responding to user queries, searching the database, and outputting proper responses, if they exist, as well as the novel database used in such a system and methods for constructing the novel database.
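
    The two-way mapping between words and stems described in the abstract can be illustrated with a toy transducer. This is a minimal sketch and not the patented construction: it builds the machine directly from a hand-written list of stem/word pairs instead of intersecting and composing lexicon and rule transducers, and it applies no compression. The class name `ToyFST`, its methods, and the sample vocabulary are all hypothetical.

```python
from collections import defaultdict

EPS = ""  # empty label: consumes or emits nothing on that side


class ToyFST:
    """Acyclic transducer built from (stem, word) pairs, one path per pair."""

    def __init__(self):
        self.arcs = defaultdict(dict)  # state -> {(upper, lower): next_state}
        self.finals = set()
        self.n_states = 1  # state 0 is the start state

    def add_pair(self, upper, lower):
        """Add a path whose upper side spells `upper` and lower side `lower`."""
        state = 0
        for i in range(max(len(upper), len(lower))):
            # Pad the shorter side with empty labels.
            label = (upper[i] if i < len(upper) else EPS,
                     lower[i] if i < len(lower) else EPS)
            if label not in self.arcs[state]:
                self.arcs[state][label] = self.n_states
                self.n_states += 1
            state = self.arcs[state][label]
        self.finals.add(state)

    def apply(self, text, invert=False):
        """Map upper -> lower strings, or lower -> upper when invert=True."""
        results = set()
        stack = [(0, 0, "")]  # (state, position in text, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(text) and state in self.finals:
                results.add(out)
            for (a, b), nxt in self.arcs[state].items():
                src, dst = (b, a) if invert else (a, b)
                if src == EPS:
                    stack.append((nxt, pos, out + dst))
                elif pos < len(text) and text[pos] == src:
                    stack.append((nxt, pos + 1, out + dst))
        return results


# Hypothetical mini-lexicon mixing regular and irregular forms.
fst = ToyFST()
for stem, word in [("swim", "swim"), ("swim", "swims"), ("swim", "swam"),
                   ("swim", "swimming"), ("go", "goes"), ("go", "went")]:
    fst.add_pair(stem, word)

stems_of_swam = fst.apply("swam", invert=True)  # word -> stem
forms_of_swim = fst.apply("swim")               # stem -> all listed forms
```

    The same machine serves both directions: reading the lower side normalizes a word to its stem for indexing, while reading the upper side expands a stem into its listed variants for querying.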

    Finite-state transduction of related word forms for text indexing and retrieval
    12.
    Published patent application
    Legal status: lapsed

    Publication number: EP0583083A2

    Publication date: 1994-02-16

    Application number: EP93305626.9

    Filing date: 1993-07-19

    Abstract: The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers (FSTs) to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both the systematic aspects of a language's morphological rule system and the word-by-word irregularities that occur. The merged FST (70) may be produced by simultaneously intersecting (&) and composing (o) a lexicon transducer (65) and a number of rule transducers (61-63). Although the resulting FSTs can have many states and transitions (arcs), they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications. The invention contemplates an information retrieval system comprising the novel FST (70) as a database and a processor for responding to user queries, searching the database, and outputting proper responses, if they exist, as well as the novel database used in such a system and methods for constructing the novel database.

    Automatic method of extracting summarization using feature probabilities
    15.
    Published patent application
    Legal status: lapsed

    Publication number: EP0751469A1

    Publication date: 1997-01-02

    Application number: EP96304777.4

    Filing date: 1996-06-28

    CPC classification number: G06F17/30719

    Abstract: A method of automatically generating document extracts. The method makes use of feature value probabilities generated from a statistical analysis of manually generated summaries to extract the same set of sentences an expert might. The method is based upon an iterative approach. First, the computer system designates a sentence of the document as a selected sentence. Second, the computer system determines values for the selected sentence of each feature of a feature set. Third, the computer system increases a score for the selected sentence based upon the value of the feature for the selected sentence and upon the probability associated with that value. Fourth, after scoring all of the sentences of the document, the computer system selects a subset of the highest scoring sentences to be extracted.
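
    The four steps in the abstract can be sketched as a scoring loop. This is an illustrative sketch only: the abstract does not say how a feature value and its probability are combined, so summing log-probabilities is an assumption, and the two features, their probability tables, and the function names are invented for the example rather than taken from the patent.

```python
import math


def score_sentences(sentences, feature_fns, value_probs):
    """Score each sentence by accumulating evidence from every feature.

    value_probs[name][value] is an assumed probability that a sentence
    with that feature value appears in a manually written summary.
    """
    scores = []
    for sentence in sentences:               # step 1: select a sentence
        score = 0.0
        for name, fn in feature_fns.items():
            value = fn(sentence)             # step 2: compute the feature value
            prob = value_probs[name].get(value, 1e-6)
            score += math.log(prob)          # step 3: update the score (assumed rule)
        scores.append(score)
    return scores


def extract(sentences, feature_fns, value_probs, k):
    """Step 4: return the k highest-scoring sentences in document order."""
    scores = score_sentences(sentences, feature_fns, value_probs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Hypothetical features and made-up probabilities, for illustration only.
feature_fns = {
    "has_cue": lambda s: "in conclusion" in s.lower(),
    "long_enough": lambda s: len(s.split()) >= 5,
}
value_probs = {
    "has_cue": {True: 0.6, False: 0.1},
    "long_enough": {True: 0.3, False: 0.05},
}
sentences = [
    "We study ranking.",
    "In conclusion, the method improves recall on all test sets.",
    "See table.",
]
scores = score_sentences(sentences, feature_fns, value_probs)
summary = extract(sentences, feature_fns, value_probs, k=1)
```

    In a real system the probability tables would be estimated from a training set of documents paired with manual summaries, which is the statistical analysis the abstract refers to.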

    A cluster-based method and system for browsing large document collections
    16.
    Published patent application
    Legal status: lapsed

    Publication number: EP0542429A3

    Publication date: 1994-04-27

    Application number: EP92309402.3

    Filing date: 1992-10-15

    Abstract: Scatter-Gather is a computer based document browsing method which operates in time proportional to a number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of the initial ordering of the corpus for interactive utility; and providing a further ordering of the corpus using, for example, an on-line non-deterministic method. The step of an off-line preparation of an initial ordering of a corpus is non-time-dependent, thus an accurate initial ordering is prepared. The step of determining a summary includes determining a summary for presentation to a user without scrolling on a CRT. The step of providing a further ordering includes truncated group average agglomerate clustering, merging disjointed document sets, center finding, assign-to-nearest and other refinement methods.
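
    Of the refinement methods listed, assign-to-nearest is the simplest to illustrate: each document joins the cluster whose center it most resembles, in one pass over the documents. The sketch below is a hedged example, not the patented procedure. The bag-of-words vectors, the cosine similarity measure, and the function names are assumptions, since the abstract does not specify a representation or a distance.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (an assumed document representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_to_nearest(docs, centers):
    """Place each document in the cluster of its most similar center."""
    clusters = [[] for _ in centers]
    for doc in docs:
        vec = vectorize(doc)
        best = max(range(len(centers)), key=lambda i: cosine(vec, centers[i]))
        clusters[best].append(doc)
    return clusters


# Hypothetical cluster centers and documents.
centers = [vectorize("patent claim filing"), vectorize("cluster document browse")]
docs = [
    "patent filing date",
    "browse document cluster summary",
    "claim priority patent",
]
clusters = assign_to_nearest(docs, centers)
```

    With a fixed number of centers, this pass does work proportional to the number of documents, which is consistent with the linear-time behavior the abstract claims for the overall method.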

    Electronic document processing systems
    17.
    Published patent application
    Legal status: lapsed

    Publication number: EP0459792A3

    Publication date: 1993-08-04

    Application number: EP91304879.9

    Filing date: 1991-05-30

    Abstract: Provision is made in electronic document processing systems for printing unfiltered or filtered machine-readable digital representations of electronic documents, and human-readable renderings of them on the same record medium using the same printing process. The integration of machine-readable digital representations of electronic documents with the human-readable hardcopy renderings of them may be employed, for example, not only to enhance the precision with which the structure and content of such electronic documents can be recovered by scanning such hardcopies into electronic document processing systems, but also as a mechanism for enabling recipients of scanned-in versions of such documents to identify and process annotations that were added to the hardcopies after they were printed and/or for alerting the recipients of the scanned-in documents to alterations that may have been made to the original human-readable content of the hardcopy renderings. In addition to storage of the electronic representation of the document, provision is made for encoding information about the electronic representation of the document itself, such as file name, creation and modification dates, access and security information, and printing histories. Provision is also made for encoding information which is computed from the content of the document and other information, for purposes of authentication and verification of document integrity. Provision is also made for the encoding of information which relates to operations which are to be performed depending on handwritten marks made upon a hardcopy rendering of the document; for example, encoding instructions of what action is to be taken when a box on a document is checked. Provision is also made for encoding in the hardcopy another class of information: information about the rendering of the document specific to that hard copy, which can include a numbered copy of that print, the identification of the machine which performed that print, the reproduction characteristics of the printer, and the screen frequency and rotation used by the printer in rendering halftones. Provision is also made for encoding information about the digital encoding mechanism itself, such as information given in standard-encoded headers about subsequently compressed or encrypted digital information.
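
    The integrity check mentioned in the abstract, in which information computed from the document content is encoded alongside the document, can be sketched as follows. This is a hypothetical illustration: the abstract names no hash function or serialization, so SHA-256 and JSON are assumptions, as are the function names and metadata fields, and the step of printing the payload as machine-readable marks on paper is not modeled.

```python
import hashlib
import json


def build_payload(content, metadata):
    """Bundle document metadata with a digest computed from the content."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    record = {"metadata": metadata, "sha256": digest}
    # Sorted keys give a stable byte sequence for the machine-readable part.
    return json.dumps(record, sort_keys=True).encode("utf-8")


def verify(recovered_content, payload):
    """Check text recovered from a scan against the digest in the payload."""
    record = json.loads(payload.decode("utf-8"))
    digest = hashlib.sha256(recovered_content.encode("utf-8")).hexdigest()
    return digest == record["sha256"]


# Hypothetical document and metadata.
payload = build_payload(
    "Quarterly report: revenue rose.",
    {"file_name": "report.txt", "copy_number": 3},
)
unchanged = verify("Quarterly report: revenue rose.", payload)
altered = verify("Quarterly report: revenue fell.", payload)
```

    A mismatch between the recomputed digest and the stored one signals that the human-readable content was altered after printing, which is the alerting use the abstract describes.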

    A cluster-based method and system for browsing large document collections
    19.
    Published patent application
    Legal status: lapsed

    Publication number: EP0980043A2

    Publication date: 2000-02-16

    Application number: EP99203801.8

    Filing date: 1992-10-15

    Abstract: Scatter-Gather is a computer based document browsing method which operates in time proportional to a number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of the initial ordering of the corpus for interactive utility; and providing a further ordering of the corpus using, for example, an on-line non-deterministic method. The step of an off-line preparation of an initial ordering of a corpus is non-time-dependent, thus an accurate initial ordering is prepared. The step of determining a summary includes determining a summary for presentation to a user without scrolling on a CRT. The step of providing a further ordering includes truncated group average agglomerate clustering, merging disjointed document sets, center finding, assign-to-nearest and other refinement methods.
