Content matching
    11.
    Granted invention patent
    Content matching (status: in force)

    Publication No.: US07574449B2

    Publication date: 2009-08-11

    Application No.: US11292621

    Filing date: 2005-12-02

    Applicant: Rangan Majumder

    Inventor: Rangan Majumder

    IPC classification: G06F17/30

    CPC classification: G06F17/3069 Y10S707/99943

    Abstract: Various technologies and techniques are disclosed that improve the identification of related content. An article for which to identify matching content is received or selected. The raw text of the article is analyzed to reduce the raw text to a core set of words, and the results are stored in a document feature vector array. The formatted text of the article is analyzed and vector array scores are updated based on the formatting. Anchor text words for documents that link to the article are added to the vector array. Articles linking to and from the particular article are identified and added to the vector array as appropriate. Transformations are performed, such as to adjust the vector scores based on how common or generic the words are. Vector arrays are created for other potentially related documents. The vectors are compared to determine how related they are to each other.
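
The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The stop-word list, the weights `EMPHASIS_BOOST` and `ANCHOR_WEIGHT`, the smoothed inverse-document-frequency transformation, and the cosine comparison are all assumptions chosen for illustration, since the abstract names none of them. The step that adds linked articles to the vector is left out.

```python
import math
import re
from collections import Counter

# All constants below are hypothetical; the abstract does not specify them.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "on"}
EMPHASIS_BOOST = 2.0  # extra score for words in formatted text (headings, bold)
ANCHOR_WEIGHT = 1.5   # score for words in anchor text of inbound links


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def feature_vector(raw_text, emphasized=(), anchor_texts=()):
    """Build a sparse word -> score vector for one document."""
    # Step 1: reduce the raw text to core words and count them.
    scores = {w: float(c) for w, c in Counter(tokenize(raw_text)).items()}
    # Step 2: raise scores for words that appear in formatted text.
    for phrase in emphasized:
        for w in tokenize(phrase):
            scores[w] = scores.get(w, 0.0) + EMPHASIS_BOOST
    # Step 3: add anchor-text words from documents linking to this one.
    for phrase in anchor_texts:
        for w in tokenize(phrase):
            scores[w] = scores.get(w, 0.0) + ANCHOR_WEIGHT
    return scores


def apply_idf(vectors):
    """Down-weight words that occur in many documents (smoothed IDF)."""
    n = len(vectors)
    doc_freq = Counter(w for vec in vectors for w in vec)
    return [
        {w: s * (math.log((1 + n) / (1 + doc_freq[w])) + 1.0)
         for w, s in vec.items()}
        for vec in vectors
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(s * b.get(w, 0.0) for w, s in a.items())
    norm_a = math.sqrt(sum(s * s for s in a.values()))
    norm_b = math.sqrt(sum(s * s for s in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

To rank candidates, one would build a vector for the selected article and each potentially related document, pass them together through `apply_idf`, and sort the candidates by `cosine` against the article's vector.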


    Content matching
    12.
    Invention patent application
    Content matching (status: in force)

    Publication No.: US20070130123A1

    Publication date: 2007-06-07

    Application No.: US11292621

    Filing date: 2005-12-02

    Applicant: Rangan Majumder

    Inventor: Rangan Majumder

    IPC classification: G06F17/30

    CPC classification: G06F17/3069 Y10S707/99943

    Abstract: Various technologies and techniques are disclosed that improve the identification of related content. An article for which to identify matching content is received or selected. The raw text of the article is analyzed to reduce the raw text to a core set of words, and the results are stored in a document feature vector array. The formatted text of the article is analyzed and vector array scores are updated based on the formatting. Anchor text words for documents that link to the article are added to the vector array. Articles linking to and from the particular article are identified and added to the vector array as appropriate. Transformations are performed, such as to adjust the vector scores based on how common or generic the words are. Vector arrays are created for other potentially related documents. The vectors are compared to determine how related they are to each other.
