Verifying relevance between keywords and Web site contents
    51.
    Type: Invention application
    Status: Lapsed

    Publication number: US20050234953A1

    Publication date: 2005-10-20

    Application number: US10826162

    Filing date: 2004-04-15

    IPC classification: G06F19/00 G06F17/30

    Abstract: Systems and methods for verifying relevance between terms and Web site contents are described. In one aspect, site contents from a bid URL are retrieved. Expanded term(s) semantically and/or contextually related to bid term(s) are calculated. Content similarity and expanded similarity measurements are calculated from respective combinations of the bid term(s), the site contents, and the expanded terms. Category similarity measurements between the expanded terms and the site contents are determined in view of a trained similarity classifier, which has been trained from mined web site content associated with directory data. A confidence value providing an objective measure of relevance between the bid term(s) and the site contents is determined by evaluating the content, expanded, and category similarity measurements in view of a trained relevance classifier model.
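
    Illustrative sketch (not taken from the patent): the abstract combines several similarity scores into one confidence value. The Python below assumes bag-of-words cosine similarity for the content and expanded measurements, takes the category similarity as a precomputed input, and uses a logistic function with made-up weights as a stand-in for the trained relevance classifier. The names `cosine_similarity` and `confidence`, and all weight values, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words token lists."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def confidence(bid_term, expanded_terms, site_text, category_similarity,
               weights=(4.0, 3.0, 2.0), bias=-3.0):
    """Combine three similarity scores into a value in (0, 1).

    The weights and bias are invented for illustration; a real system
    would learn them from labeled examples.
    """
    site = tokenize(site_text)
    content_sim = cosine_similarity(tokenize(bid_term), site)
    expanded_sim = cosine_similarity(tokenize(" ".join(expanded_terms)), site)
    scores = (content_sim, expanded_sim, category_similarity)
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))
```

    With these assumed weights, a bid term whose words and expanded terms appear on the site scores higher than an unrelated bid term, which is the ordering a relevance check needs.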

    Method and system for clustering using generalized sentence patterns
    53.
    Type: Invention grant
    Status: In force

    Publication number: US07584100B2

    Publication date: 2009-09-01

    Application number: US10880662

    Filing date: 2004-06-30

    Abstract: A method and system for clustering documents based on generalized sentence patterns of the topics of the documents is provided. A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes the topic of each document. The generalized sentence is an abstraction of the words of the sentence. The GSP system identifies clusters of documents based on the patterns of their generalized sentences. The GSP system clusters documents when the generalized sentence representations of their topics have a similar pattern.
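
    Illustrative sketch (not taken from the patent): one way to picture the idea is to map each word of a topic sentence to a more abstract concept and then group documents whose generalized sentences match. The Python below assumes a tiny hand-written generalization table and treats "similar pattern" as "identical set of concepts", which is a simplification. `GENERALIZE`, `STOPWORDS`, `generalized_sentence`, and `cluster_by_pattern` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical generalization table: word -> more abstract concept.
GENERALIZE = {
    "paris": "<city>", "london": "<city>",
    "flight": "<travel>", "train": "<travel>",
    "cheap": "<price>", "discount": "<price>",
}
STOPWORDS = {"a", "an", "the", "to", "for", "in", "of"}


def generalized_sentence(sentence):
    """Abstract the words of a sentence into a set of concepts."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    concepts = {GENERALIZE.get(w, w) for w in words if w and w not in STOPWORDS}
    return frozenset(concepts)


def cluster_by_pattern(topic_sentences):
    """Group document ids whose topic sentences share a generalized pattern.

    topic_sentences maps a document id to the sentence describing its topic.
    """
    clusters = defaultdict(list)
    for doc_id, sentence in topic_sentences.items():
        clusters[generalized_sentence(sentence)].append(doc_id)
    return list(clusters.values())
```

    Under this table, "Cheap flight to Paris" and "Discount train to London" generalize to the same concept set and land in one cluster, while an unrelated title forms its own.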

Method and system for classifying display pages using summaries
    54.
    Type: Invention application
    Status: Under examination (published)

    Publication number: US20090119284A1

    Publication date: 2009-05-07

    Application number: US12145222

    Filing date: 2008-06-24

    IPC classification: G06F7/06 G06F17/30

    CPC classification: G06F16/345 G06F16/951

    Abstract: A method and system for classifying display pages based on automatically generated summaries of display pages is provided. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
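
    Illustrative sketch (not taken from the patent): the two-stage pipeline can be pictured as "summarize, then classify the summary". The Python below assumes a single word-frequency summarizer in place of the combined summarization techniques the abstract mentions, and a small multinomial Naive Bayes classifier with Laplace smoothing. `summarize`, `NaiveBayes`, and `classify_page` are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def summarize(page_text, max_sentences=2):
    """Keep the sentences whose words are most frequent across the page."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    freq = Counter(tokenize(page_text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Preserve the original sentence order in the summary.
    return " ".join(s for s in sentences if s in top)


class NaiveBayes:
    """Minimal multinomial Naive Bayes text classifier."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokenize(text):
                count = self.word_counts[label][token] + 1  # Laplace smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify_page(classifier, page_text):
    """Classify a page from its summary rather than its full text."""
    return classifier.classify(summarize(page_text))
```

    Classifying the summary instead of the full page keeps navigation text and other off-topic sentences from influencing the label.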

    Method and system for mining information based on relationships
    55.
    Type: Invention grant
    Status: In force

    Publication number: US07529735B2

    Publication date: 2009-05-05

    Application number: US11057100

    Filing date: 2005-02-11

    Abstract: A method and system for identifying information about people is provided. The information system identifies groups of people that have relationships based on their relationships to documents or more generally to objects. The information system initially is provided with an indication of which people have which relationships to which documents. The information system then identifies clusters of people based on having a relationship to the same objects. The information system may also identify clusters of related objects associated with a cluster of people. When a user wants to identify information about a person, the user can provide the name of that person to the information system. The information system then can retrieve and display the names of the other people who are in the same cluster as the person.
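
    Illustrative sketch (not taken from the patent): clustering people by shared objects can be approximated by linking any two people who have a relationship to the same document and taking the connected groups. The Python below assumes that simplification and uses a union-find structure; the patent may use a different clustering method. `cluster_people` and `related_people` are hypothetical names.

```python
from collections import defaultdict


def cluster_people(relationships):
    """Cluster people who are linked through shared documents.

    relationships is an iterable of (person, document) pairs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_person_for_doc = {}
    for person, doc in relationships:
        find(person)  # register the person even if the document is new
        if doc in first_person_for_doc:
            parent[find(person)] = find(first_person_for_doc[doc])
        else:
            first_person_for_doc[doc] = person

    clusters = defaultdict(set)
    for person in parent:
        clusters[find(person)].add(person)
    return list(clusters.values())


def related_people(clusters, name):
    """Return the other people in the same cluster as `name`, sorted."""
    for cluster in clusters:
        if name in cluster:
            return sorted(cluster - {name})
    return []
```

    A lookup by name then returns everyone else in that person's cluster, which mirrors the query described at the end of the abstract.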

    Method and system for ranking messages of discussion threads
    56.
    Type: Invention grant
    Status: In force

    Publication number: US07437382B2

    Publication date: 2008-10-14

    Application number: US11130803

    Filing date: 2005-05-16

    IPC classification: G06F17/30

    Abstract: A method and system for ranking messages of discussion threads based on relationships between messages and authors is provided. The ranking system defines equations for attributes of a message and an author. The equations define the attribute values and are based on relationships between the attribute and the attributes associated with the same type of object and with different types of objects. The ranking system iteratively calculates the attribute values for the objects using the equations until the attribute values converge on a solution. The ranking system then ranks the messages based on attribute values.
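
    Illustrative sketch (not taken from the patent): the iterative scheme can be pictured as mutually dependent scores that are recomputed until they stop changing. The Python below assumes one attribute per object: a message score that depends on its author's score and on the scores of its replies, and an author score equal to the mean score of that author's messages. These equations, the damping factor, and the name `rank_messages` are all hypothetical.

```python
def rank_messages(messages, damping=0.85, tol=1e-9, max_iter=1000):
    """Rank message ids by an iteratively computed score.

    messages maps message_id -> (author, parent_id or None); every
    parent_id is assumed to be a key of messages.
    """
    if not messages:
        return []

    replies = {m: [] for m in messages}
    by_author = {}
    for m, (author, parent) in messages.items():
        by_author.setdefault(author, []).append(m)
        if parent is not None:
            replies[parent].append(m)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    msg_score = {m: 1.0 for m in messages}
    for _ in range(max_iter):
        # Author attribute: depends on a different object type (messages).
        author_score = {a: mean([msg_score[m] for m in ms])
                        for a, ms in by_author.items()}
        # Message attribute: depends on the author and on other messages.
        new_score = {
            m: (1 - damping) + damping * (
                0.5 * author_score[messages[m][0]]
                + 0.5 * mean([msg_score[r] for r in replies[m]])
            )
            for m in messages
        }
        delta = max(abs(new_score[m] - msg_score[m]) for m in messages)
        msg_score = new_score
        if delta < tol:  # attribute values have converged
            break
    return sorted(messages, key=lambda m: msg_score[m], reverse=True)
```

    Because each update averages its inputs and scales them by a damping factor below 1, the scores in this sketch settle to a fixed point, after which the messages are sorted by score.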

    Content propagation for enhanced document retrieval
    57.
    Type: Invention grant
    Status: Lapsed

    Publication number: US07305389B2

    Publication date: 2007-12-04

    Application number: US10826161

    Filing date: 2004-04-15

    IPC classification: G06F17/30

    Abstract: Systems and methods providing computer-implemented content propagation for enhanced document retrieval are described. In one aspect, reference information directed to one or more documents is identified. The reference information is identified from one or more sources of data that are independent of a data source that includes the one or more documents. Metadata that is proximally located to the reference information is extracted from the one or more sources of data. Relevance of respective features of the metadata to content of associated ones of the one or more documents is calculated. For each document of the one or more documents, associated portions of the metadata are indexed with the relevance of features from the respective portions into original content of the document. The indexing generates one or more enhanced documents.
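
    Illustrative sketch (not taken from the patent): content propagation can be pictured as finding mentions of a document in independent sources, collecting the words near each mention, and adding them to the document's index with a relevance weight. The Python below assumes a fixed word window for "proximally located" metadata and a crude relevance rule (full weight if the term already occurs in the document, a reduced weight otherwise). `metadata_near`, `enhance`, and `unseen_weight` are hypothetical names.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def metadata_near(source_text, doc_id, window=5):
    """Return tokens within `window` words of each mention of doc_id."""
    words = source_text.split()
    nearby = []
    for i, word in enumerate(words):
        if doc_id in word:
            context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            nearby.extend(tokenize(" ".join(context)))
    return nearby


def enhance(doc_id, doc_text, sources, window=5, unseen_weight=0.5):
    """Build a weighted term index for an 'enhanced' document.

    Original terms count 1 per occurrence. Propagated terms add 1.0 if they
    already occur in the document, otherwise `unseen_weight` (an assumed
    stand-in for a computed relevance score).
    """
    index = Counter(tokenize(doc_text))
    original_terms = set(index)
    for source in sources:
        for term in metadata_near(source, doc_id, window):
            index[term] += 1.0 if term in original_terms else unseen_weight
    return index
```

    In this sketch, words that surround a reference to the document in a support log or similar source become searchable terms for that document, weighted below or equal to its own content.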

    Verifying relevance between keywords and web site contents
    59.
    Type: Invention grant
    Status: Lapsed

    Publication number: US07260568B2

    Publication date: 2007-08-21

    Application number: US10826162

    Filing date: 2004-04-15

    IPC classification: G06F17/30

    Abstract: Systems and methods for verifying relevance between terms and Web site contents are described. In one aspect, site contents from a bid URL are retrieved. Expanded term(s) semantically and/or contextually related to bid term(s) are calculated. Content similarity and expanded similarity measurements are calculated from respective combinations of the bid term(s), the site contents, and the expanded terms. Category similarity measurements between the expanded terms and the site contents are determined in view of a trained similarity classifier, which has been trained from mined web site content associated with directory data. A confidence value providing an objective measure of relevance between the bid term(s) and the site contents is determined by evaluating the content, expanded, and category similarity measurements in view of a trained relevance classifier model.
