Text genre identification
    2.
    Published invention application
    Status: Lapsed

    Publication number: EP0889417A3

    Publication date: 1999-11-24

    Application number: EP98305231.7

    Filing date: 1998-07-01

    Abstract: A processor-implemented method of identifying the genre of a machine-readable, untagged text. The method begins by generating a cue vector from the text; the cue vector represents the occurrences in the text of a first set of easily computable, nonstructural surface cues. Afterward, the processor determines whether the text is an instance of a first text genre using the cue vector and a weighting vector associated with the first text genre.

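    The abstract leaves both the cue set and the decision rule unspecified. The sketch below is one way to read it, assuming a handful of hypothetical surface cues (punctuation, pronoun, and digit rates, chosen only for illustration), normalizing their counts by word count to form the cue vector, and scoring the text with a logistic function of its dot product with the genre's weighting vector. The names `CUES`, `cue_vector`, `genre_probability`, `is_instance_of_genre`, the weights, the bias, and the 0.5 threshold are all placeholders, not values or names from the patent.

```python
import math
import re

# Hypothetical surface cues (name -> regex). These are illustrative assumptions,
# not the cue set used in the patent.
CUES = {
    "question_marks": r"\?",
    "exclamations": r"!",
    "first_person": r"\b(?:I|me|my|we|our)\b",
    "second_person": r"\b(?:you|your)\b",
    "digits": r"\d",
}


def cue_vector(text: str) -> list[float]:
    """Count each cue in the text and normalize by the number of words."""
    n_words = max(len(text.split()), 1)  # avoid division by zero on empty text
    return [len(re.findall(pattern, text)) / n_words for pattern in CUES.values()]


def genre_probability(text: str, weights: list[float], bias: float) -> float:
    """Logistic score of the dot product of the cue vector and the genre's weighting vector."""
    z = bias + sum(w * x for w, x in zip(weights, cue_vector(text)))
    return 1.0 / (1.0 + math.exp(-z))


def is_instance_of_genre(text: str, weights: list[float], bias: float) -> bool:
    """Assumed decision rule: the text belongs to the genre if the score exceeds 0.5."""
    return genre_probability(text, weights, bias) > 0.5


# Hypothetical weights for a question-heavy, conversational genre.
WEIGHTS = [8.0, 2.0, 1.0, 3.0, -1.0]
print(is_instance_of_genre("Do you know how? Can you help me? I hope you can.", WEIGHTS, bias=-1.0))  # prints True
```

    A linear score keeps the per-genre model to one weight per cue, which matches the abstract's pairing of a single cue vector with one weighting vector per genre; in practice the weights would be fitted on labeled examples rather than set by hand.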

    Method of processing a corpus of electronically stored documents
    3.
    Published invention application
    Status: Lapsed

    Publication number: EP0631245A2

    Publication date: 1994-12-28

    Application number: EP94304471.9

    Filing date: 1994-06-20

    CPC classification number: G06F17/3071 G06F17/30011 Y10S707/99932

    Abstract: Arbitrarily large document collections are processed by expanding a focus set having at least one initial metadocument (82) into a plurality of subsequent metadocuments (83,84,85,86). The number of subsequent metadocuments is approximately equal to a predetermined maximum number. The subsequent metadocuments are then clustered into a predetermined number of new metadocuments, which are summarized and presented to a user. The focus set is redefined to include only user-selected new metadocuments.

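    The expand, cluster, summarize, and refocus loop in this abstract can be sketched as follows, assuming metadocuments form a precomputed hierarchy in which each node has a term-count profile and finer-grained children. The expansion heuristic (split the heaviest metadocument first), the seed-and-assign clustering, and the top-term summaries are illustrative stand-ins, not the clustering or summarization procedure claimed in the patent; `MetaDoc`, `expand`, `cluster`, and `summarize` are hypothetical names.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass(eq=False)
class MetaDoc:
    """Hypothetical metadocument: a term-count profile plus finer-grained children."""
    profile: Counter
    children: list = field(default_factory=list)


def mass(m: MetaDoc) -> int:
    return sum(m.profile.values())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def expand(focus: list[MetaDoc], max_count: int) -> list[MetaDoc]:
    """Split metadocuments into their children until about max_count remain."""
    current = list(focus)
    while len(current) < max_count:
        splittable = [i for i, m in enumerate(current) if m.children]
        if not splittable:
            break
        # Assumed heuristic: split the metadocument with the largest term mass first.
        target = current.pop(max(splittable, key=lambda i: mass(current[i])))
        current.extend(target.children)
    return current


def cluster(metadocs: list[MetaDoc], k: int) -> list[MetaDoc]:
    """Group metadocuments into at most k new ones with a simple seed-and-assign pass."""
    seeds = sorted(metadocs, key=mass, reverse=True)[:k]
    groups: list[list[MetaDoc]] = [[] for _ in seeds]
    for m in metadocs:
        best = max(range(len(seeds)), key=lambda i: cosine(m.profile, seeds[i].profile))
        groups[best].append(m)
    return [MetaDoc(sum((m.profile for m in g), Counter()), g) for g in groups if g]


def summarize(m: MetaDoc, n_terms: int = 3) -> list[str]:
    """Summarize a metadocument by its most frequent terms."""
    return [term for term, _ in m.profile.most_common(n_terms)]


def leaf(text: str) -> MetaDoc:
    return MetaDoc(Counter(text.lower().split()))


docs = [leaf(t) for t in ["stock market prices fall", "market prices rise stock",
                          "football match goal", "goal scored in football final"]]
finance = MetaDoc(docs[0].profile + docs[1].profile, docs[:2])
sport = MetaDoc(docs[2].profile + docs[3].profile, docs[2:])
root = MetaDoc(finance.profile + sport.profile, [finance, sport])

new = cluster(expand([root], max_count=4), k=2)
for i, m in enumerate(new):
    print(i, summarize(m))
focus = [new[1]]  # the user selects a cluster; it becomes the new focus set
```

    Because expansion stops near a fixed `max_count`, each clustering pass works on a bounded number of items regardless of how large the collection is, which is what lets the loop run on arbitrarily large collections.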

    Automatic method of generating feature probabilities for automatic extracting summarization
    6.
    Published invention application
    Status: Lapsed

    Publication number: EP0751470A1

    Publication date: 1997-01-02

    Application number: EP96304778.2

    Filing date: 1996-06-28

    CPC classification number: G06F17/30719

    Abstract: A method of automatically generating feature probabilities that later allow document extracts to be generated automatically. The computer system generates the probabilities by analyzing the documents one at a time. First, the computer system designates one of the documents as the selected document. Next, the computer system analyzes each sentence of the selected document to determine the value of the paragraph feature and the value of the uppercase feature. The computer system repeats this process for each document of the document corpus. Afterward, the number of occurrences of each value of each feature is counted and used to calculate feature-value probabilities for all of the features.

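    The counting pass in this abstract can be sketched in Python. The abstract names a paragraph feature and an uppercase feature but does not define their values, so this sketch assumes the paragraph feature is a sentence's position in its paragraph (initial, medial, or final) and the uppercase feature records whether the sentence contains an all-capitals word of two or more letters. The corpus is assumed to be pre-split into paragraphs and sentences, the probabilities are plain relative frequencies of each value, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def paragraph_feature(index: int, n_sentences: int) -> str:
    """Assumed paragraph feature: the sentence's position within its paragraph."""
    if n_sentences == 1 or index == 0:
        return "initial"
    if index == n_sentences - 1:
        return "final"
    return "medial"


def uppercase_feature(sentence: str) -> str:
    """Assumed uppercase feature: does the sentence contain an all-caps word of 2+ letters?"""
    words = (w.strip(".,;:!?()\"'") for w in sentence.split())
    return "caps" if any(len(w) >= 2 and w.isupper() for w in words) else "no_caps"


def feature_probabilities(corpus):
    """corpus: list of documents; each document is a list of paragraphs of sentences."""
    counts = defaultdict(Counter)  # feature name -> occurrences of each value
    for document in corpus:  # analyze one selected document at a time
        for paragraph in document:
            for i, sentence in enumerate(paragraph):
                counts["paragraph"][paragraph_feature(i, len(paragraph))] += 1
                counts["uppercase"][uppercase_feature(sentence)] += 1
    # Convert occurrence counts into relative-frequency probabilities per feature.
    probabilities = {}
    for feature, value_counts in counts.items():
        total = sum(value_counts.values())
        probabilities[feature] = {value: n / total for value, n in value_counts.items()}
    return probabilities


corpus = [
    [["NASA launched a probe.", "It flew far.", "Data came back."], ["One line only."]],
    [["The END is near.", "Nothing else."]],
]
probs = feature_probabilities(corpus)
print(probs["paragraph"])
print(probs["uppercase"])
```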

    An iterative technique for phrase query formation and an information retrieval system employing same
    7.
    Published invention application
    Status: Lapsed

    Publication number: EP0530993A2

    Publication date: 1993-03-10

    Application number: EP92307372.0

    Filing date: 1992-08-12

    CPC classification number: G06F17/30646 G06F17/30011 Y10S707/99934

    Abstract: An information retrieval system and method are provided in which an operator inputs (110) one or more query words, which are used to determine a search key (120) for searching (130) through a corpus of documents. The system returns (140) any matches between the search key and the corpus of documents as a phrase containing the word data matching the search key (the query word(s)), the next non-stop (content) word adjacent to the matching word data, and all intervening stop-words between the matching word data and that next non-stop word. After reviewing one or more of the returned phrases, the operator can then use one or more of the next adjacent non-stop-words as new query words to reformulate the search key (150, 160, 170) and perform a subsequent search through the document corpus. This process can be conducted iteratively until the documents of interest are located. The additional non-stop-words from each phrase are preferably aligned with each other (e.g., by columnation) to ease viewing of the "new" content words.

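    One search iteration from this abstract can be sketched as below: for each match it returns the query words plus any intervening stop-words, paired with the next content word, and then pads the phrases so those content words line up in a column. The stop-word list, the tokenizer, and the helper names `matching_phrases` and `columnate` are assumptions for illustration only.

```python
import re

# Hypothetical stop-word list; the patent abstract does not specify one.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "was", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matching_phrases(query: str, documents: list[str]) -> list[tuple[str, str]]:
    """For each match, return (query words + intervening stop-words, next content word)."""
    q = tokenize(query)
    if not q:
        return []
    results = []
    for doc in documents:
        tokens = tokenize(doc)
        for i in range(len(tokens) - len(q) + 1):
            if tokens[i:i + len(q)] != q:
                continue
            j = i + len(q)
            while j < len(tokens) and tokens[j] in STOP_WORDS:
                j += 1  # skip stop-words between the match and the next content word
            if j < len(tokens):  # keep the match only if a content word follows
                results.append((" ".join(tokens[i:j]), tokens[j]))
    return results


def columnate(phrases: list[tuple[str, str]]) -> list[str]:
    """Pad each phrase prefix so all next content words start in the same column."""
    width = max((len(prefix) for prefix, _ in phrases), default=0)
    return [f"{prefix:<{width}} {word}" for prefix, word in phrases]


docs = ["Retrieval of documents is hard.", "Fast retrieval in the archive.",
        "Retrieval systems rank documents."]
for line in columnate(matching_phrases("retrieval", docs)):
    print(line)
```

    In a further iteration, the operator would pass one of the aligned content words (for example `documents`) back to `matching_phrases` as the new query; using the chosen word on its own, rather than the whole returned phrase, is an assumption of this sketch.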

    Text genre identification
    9.
    Published invention application
    Status: Lapsed

    Publication number: EP0889417A2

    Publication date: 1999-01-07

    Application number: EP98305231.7

    Filing date: 1998-07-01

    Abstract: A processor-implemented method of identifying the genre of a machine-readable, untagged text. The method begins by generating a cue vector from the text; the cue vector represents the occurrences in the text of a first set of easily computable, nonstructural surface cues. Afterward, the processor determines whether the text is an instance of a first text genre using the cue vector and a weighting vector associated with the first text genre.

