Method and apparatus for generating a language independent document abstract
    Patent application (status: lapsed)

    Publication number: US20050119873A1

    Publication date: 2005-06-02

    Application number: US11018045

    Filing date: 2004-12-21

    IPC classification: G06F17/30, G06F17/20

    CPC classification: G06F17/3061

    Abstract: A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents, and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is identified as a significant phrase if the number of words in the sequence whose score exceeds the threshold score equals or exceeds a predetermined number. If the sequence of words is a significant phrase, the sentence containing it is retrieved from the document. An abstract of the document is then searched to determine whether that sentence has already been included in the abstract; if not, the sentence is added to the abstract.
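    The procedure in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the word score is assumed to be the word's length in characters; WINDOW_SIZE, THRESHOLD_SCORE and MIN_HIGH_SCORING are hypothetical values (the abstract gives none); sentences are split naively after ".", "!" or "?"; words are tokenized with the regex \w+; and candidate sequences are taken as fixed-size word windows inside each sentence, so that "the sentence containing the sequence" is unambiguous.

```python
import re
from typing import List

# Hypothetical parameters; the abstract does not specify concrete values.
WINDOW_SIZE = 4        # number of words in each candidate sequence
THRESHOLD_SCORE = 6    # a word counts as high-scoring if its score exceeds this
MIN_HIGH_SCORING = 3   # the "predetermined number" of high-scoring words


def word_score(word: str) -> int:
    # Assumption: the score is simply the word's length in characters.
    return len(word)


def is_significant(sequence: List[str]) -> bool:
    # Significant if enough words in the sequence score above the threshold.
    high = sum(1 for w in sequence if word_score(w) > THRESHOLD_SCORE)
    return high >= MIN_HIGH_SCORING


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break on whitespace after ., ! or ?.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_abstract(document: str) -> List[str]:
    abstract: List[str] = []
    for sentence in split_sentences(document):
        words = re.findall(r"\w+", sentence)
        # Candidate sequences: every WINDOW_SIZE-word window in the sentence
        # (a sentence shorter than the window is treated as one sequence).
        starts = range(max(1, len(words) - WINDOW_SIZE + 1))
        if any(is_significant(words[s:s + WINDOW_SIZE]) for s in starts):
            # Search the abstract; add the sentence only if not already there.
            if sentence not in abstract:
                abstract.append(sentence)
    return abstract
```

    For example, given the text "The cat sat. Extraordinary international collaboration produces remarkable results. Dogs bark loudly.", only the middle sentence contains a window with at least three words longer than six characters, so it is the only sentence added to the abstract; repeating the text does not add it a second time.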

    摘要翻译: 从存储在计算机可读介质中的一个或多个文档中提取重要短语的方法。 从一个或多个文档读取一系列单词,并且基于该单词的长度为该序列中的每个单词确定分数。 将序列中每个单词的得分与阈值得分进行比较。 如果具有大于阈值分数的分数的序列中的单词数等于或超过预定数目,则词语序列被指示为重要短语。 如果单词的序列是重要短语,则从文档中检索包含单词序列的句子。 搜索文档的摘要以确定该句子以前是否已包含在摘要中。 如果没有,则将该句添加到摘要中。