Method and system for classifying display pages using summaries
    1.
    Granted invention patent
    Status: in force

    Publication No.: US07392474B2

    Publication date: 2008-06-24

    Application No.: US10836319

    Filing date: 2004-04-30

    IPC classification: G06F17/00

    CPC classification: G06F17/30719 G06F17/30864

    Abstract: A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
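
The summarize-then-classify pipeline in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the frequency-based sentence scorer stands in for the unspecified summarization techniques, and the names `summarize`, `NaiveBayes`, `classify_page`, and `STOPWORDS` are all hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not taken from the patent.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "on", "our", "we"}


def tokenize(text):
    """Lowercase the text and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(page_text, k=2):
    """Keep the k sentences whose words are, on average, most frequent on the page."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    freq = Counter(tokenize(page_text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))  # restore page order


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total = sum(self.class_counts.values())
        best, best_logp = None, -math.inf
        for c, n in self.class_counts.items():
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            logp = math.log(n / total)
            for w in tokenize(doc):
                logp += math.log((self.word_counts[c][w] + 1) / denom)
            if logp > best_logp:
                best, best_logp = c, logp
        return best


def classify_page(page_text, model, k=2):
    """Classify the summary of the page rather than the full page text."""
    return model.predict(summarize(page_text, k))
```

Classifying the summary instead of the whole page is meant to keep navigation text, advertisements, and similar noise out of the classifier's input.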

    Method and system for summarizing a document
    2.
    Granted invention patent
    Status: in force

    Publication No.: US07698339B2

    Publication date: 2010-04-13

    Application No.: US10918242

    Filing date: 2004-08-13

    IPC classification: G06F17/30

    CPC classification: G06F17/30705 G06F17/30719

    Abstract: A method and system for calculating the significance of a sentence within a document is provided. The summarization system calculates the significance of the sentences of a document and selects the most significant sentences as the summary of the document. The summarization system calculates the significance of a sentence based on the “important” words of the document that are contained within the sentence. The summarization system calculates the importance of words of the document using various scoring techniques and then combines the scores to classify a word as important or not important. The summarization system can then be used to identify significant sentences of the document based on the important words that a sentence contains and select significant sentences as a summary of the document.
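
One way to picture the two-stage scoring described here (score words, then score sentences by the important words they contain) is the sketch below. The abstract does not name the scoring techniques, so the two scorers (normalized term frequency, and presence in the opening sentence), the weights, and the threshold are assumptions chosen for illustration; all function names are hypothetical.

```python
import re
from collections import Counter


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(document):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def important_words(document, weights=(0.5, 0.5), threshold=0.5, stopwords=frozenset()):
    """Combine two word scores and classify each word as important or not."""
    sentences = split_sentences(document)
    tf = Counter(w for w in words(document) if w not in stopwords)
    max_tf = max(tf.values(), default=1)
    lead = set(words(sentences[0])) if sentences else set()
    important = set()
    for w in tf:
        tf_score = tf[w] / max_tf                 # scorer 1: normalized term frequency
        lead_score = 1.0 if w in lead else 0.0    # scorer 2: occurs in the opening sentence
        if weights[0] * tf_score + weights[1] * lead_score >= threshold:
            important.add(w)
    return important


def sentence_significance(sentence, important):
    """Fraction of the sentence's words that are important words."""
    toks = words(sentence)
    return sum(1 for t in toks if t in important) / len(toks) if toks else 0.0


def summarize(document, k=1, **scoring_options):
    """Return the k most significant sentences, in document order."""
    sentences = split_sentences(document)
    important = important_words(document, **scoring_options)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_significance(sentences[i], important),
        reverse=True,
    )[:k]
    return [sentences[i] for i in sorted(ranked)]
```

Other word scorers could be added as extra terms in the weighted sum without changing the sentence-level step.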

    Method and system for summarizing a document
    3.
    Invention patent application
    Status: in force

    Publication No.: US20060036596A1

    Publication date: 2006-02-16

    Application No.: US10918242

    Filing date: 2004-08-13

    IPC classification: G06F17/30

    CPC classification: G06F17/30705 G06F17/30719

    Abstract: A method and system for calculating the significance of a sentence within a document is provided. The summarization system calculates the significance of the sentences of a document and selects the most significant sentences as the summary of the document. The summarization system calculates the significance of a sentence based on the “important” words of the document that are contained within the sentence. The summarization system calculates the importance of words of the document using various scoring techniques and then combines the scores to classify a word as important or not important. The summarization system can then be used to identify significant sentences of the document based on the important words that a sentence contains and select significant sentences as a summary of the document.

    Method and system for classifying display pages using summaries
    4.
    Invention patent application
    Status: in force

    Publication No.: US20050246410A1

    Publication date: 2005-11-03

    Application No.: US10836319

    Filing date: 2004-04-30

    IPC classification: G06F17/30 G06F15/16

    CPC classification: G06F17/30719 G06F17/30864

    Abstract: A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.

    METHOD AND SYSTEM FOR CLASSIFYING DISPLAY PAGES USING SUMMARIES
    5.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20090119284A1

    Publication date: 2009-05-07

    Application No.: US12145222

    Filing date: 2008-06-24

    IPC classification: G06F7/06 G06F17/30

    CPC classification: G06F16/345 G06F16/951

    Abstract: A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.

    Method and system for adapting search results to personal information needs
    6.
    Granted invention patent
    Status: in force

    Publication No.: US07849089B2

    Publication date: 2010-12-07

    Application No.: US12616739

    Filing date: 2009-11-11

    IPC classification: G06F7/00 G10L15/00

    Abstract: A method and system for adapting search results of a query to the information needs of the user submitting the query is provided. A search system analyzes click-through triplets indicating that a user submitted a query and that the user selected a document from the results of the query. To overcome the large size and sparseness of the click-through data, the search system when presented with an input triplet comprising a user, a query, and a document determines a probability that the user will find the input document important by smoothing the click-through triplets. The search system then orders documents of the result based on the probability of their importance to the input user.
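
The abstract does not say which smoothing technique is applied to the click-through triplets. As one possible illustration only, the sketch below uses linear interpolation (back-off) between a per-user estimate, a per-query estimate, and a global document prior, so that sparse or unseen (user, query, document) triplets still receive a nonzero probability. The class name `ClickModel` and the interpolation weights are assumptions.

```python
from collections import Counter


class ClickModel:
    """Interpolated estimate of P(doc | user, query) from click-through triplets."""

    def __init__(self, triplets, lambdas=(0.6, 0.3, 0.1)):
        triplets = list(triplets)
        self.lambdas = lambdas                                  # assumed weights, sum to 1
        self.uqd = Counter(triplets)                            # (user, query, doc) counts
        self.uq = Counter((u, q) for u, q, _ in triplets)       # (user, query) counts
        self.qd = Counter((q, d) for _, q, d in triplets)       # (query, doc) counts
        self.q = Counter(q for _, q, _ in triplets)             # query counts
        self.d = Counter(d for _, _, d in triplets)             # doc counts
        self.n = len(triplets)

    @staticmethod
    def _ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    def probability(self, user, query, doc):
        l_user, l_query, l_prior = self.lambdas
        p_user = self._ratio(self.uqd[(user, query, doc)], self.uq[(user, query)])
        p_query = self._ratio(self.qd[(query, doc)], self.q[query])
        p_prior = self._ratio(self.d[doc], self.n)
        return l_user * p_user + l_query * p_query + l_prior * p_prior

    def rank(self, user, query, docs):
        """Order the result documents by estimated importance to this user."""
        return sorted(docs, key=lambda d: self.probability(user, query, d), reverse=True)
```

A user with click history for a query gets results ordered mostly by their own clicks, while a new user falls back to what other users selected for the same query.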

    METHOD AND SYSTEM FOR ADAPTING SEARCH RESULTS TO PERSONAL INFORMATION NEEDS
    7.
    Invention patent application
    Status: in force

    Publication No.: US20100057798A1

    Publication date: 2010-03-04

    Application No.: US12616739

    Filing date: 2009-11-11

    IPC classification: G06F17/30

    Abstract: A method and system for adapting search results of a query to the information needs of the user submitting the query is provided. A search system analyzes click-through triplets indicating that a user submitted a query and that the user selected a document from the results of the query. To overcome the large size and sparseness of the click-through data, the search system when presented with an input triplet comprising a user, a query, and a document determines a probability that the user will find the input document important by smoothing the click-through triplets. The search system then orders documents of the result based on the probability of their importance to the input user.

    Method and system for detecting when an outgoing communication contains certain content
    8.
    Granted invention patent
    Status: lapsed

    Publication No.: US07594277B2

    Publication date: 2009-09-22

    Application No.: US10881867

    Filing date: 2004-06-30

    Abstract: A method and system for detecting whether an outgoing communication contains confidential information or other target information is provided. The detection system is provided with a collection of documents that contain confidential information, referred to as “confidential documents.” When the detection system is provided with an outgoing communication, it compares the content of the outgoing communication to the content of the confidential documents. If the outgoing communication contains confidential information, then the detection system may prevent the outgoing communication from being sent outside the organization. The detection system detects confidential information based on the similarity between the content of an outgoing communication and the content of confidential documents that are known to contain confidential information.
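
The similarity check can be illustrated with word shingles and a containment score, that is, the share of the outgoing message's word n-grams that also occur in a known confidential document. The abstract does not specify the similarity measure, so the shingle length, the threshold, and the function names below are assumptions.

```python
import re


def shingles(text, n=3):
    """Set of overlapping word n-grams in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(message, document, n=3):
    """Fraction of the message's shingles that also appear in the document."""
    msg = shingles(message, n)
    if not msg:
        return 0.0
    return len(msg & shingles(document, n)) / len(msg)


def should_block(message, confidential_docs, threshold=0.3, n=3):
    """Flag the outgoing message if it is too similar to any confidential document."""
    return any(containment(message, doc, n) >= threshold for doc in confidential_docs)
```

Containment is used rather than a symmetric measure because a short message can leak a small part of a long document; a symmetric score would be diluted by the document's length.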

    Method and system for classifying and identifying messages as question or not a question within a discussion thread
    9.
    Granted invention patent
    Status: lapsed

    Publication No.: US07590603B2

    Publication date: 2009-09-15

    Application No.: US10957329

    Filing date: 2004-10-01

    IPC classification: G06F15/18

    CPC classification: G06F17/30707

    Abstract: A method and system for classifying messages of a discussion thread as questions is provided. A classification system generates a classifier to classify messages of discussion threads as question messages or non-question messages. The system trains the classifier using the feature vectors and input classifications derived from a training set of discussion threads. After the classifier is trained, the classification system uses the classifier to classify messages within a corpus of discussion threads as question or non-question messages. To classify a message, the classification system generates a feature vector for the messages and submits that feature vector to the classifier. The classifier generates a score for the message indicating a likelihood that the message is a question message.
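
A rough sketch of the train-then-score flow follows. The abstract does not list the features or name the classifier, so the four hand-picked features, the word list `QUESTION_STARTERS`, and the use of logistic regression trained by batch gradient ascent are all assumptions made for illustration.

```python
import math

# Assumption: words that often open a question; not taken from the patent.
QUESTION_STARTERS = {"how", "what", "why", "where", "when", "who", "which",
                     "does", "do", "is", "can", "could", "did"}


def features(message, position_in_thread):
    """Feature vector: bias, has '?', starts like a question, opens the thread."""
    text = message.strip().lower()
    first_word = text.split()[0] if text else ""
    return [
        1.0,
        1.0 if "?" in text else 0.0,
        1.0 if first_word in QUESTION_STARTERS else 0.0,
        1.0 if position_in_thread == 0 else 0.0,
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=1000, learning_rate=0.1):
    """Fit logistic-regression weights; examples are (feature_vector, label) pairs."""
    weights = [0.0] * 4
    for _ in range(epochs):
        gradient = [0.0] * 4
        for x, y in examples:
            error = y - sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for j, xj in enumerate(x):
                gradient[j] += error * xj
        weights = [w + learning_rate * g for w, g in zip(weights, gradient)]
    return weights


def score(weights, message, position_in_thread):
    """Likelihood-style score in (0, 1) that the message is a question message."""
    x = features(message, position_in_thread)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))
```

Including thread position as a feature lets the classifier tell an opening question apart from a clarifying question asked inside a reply.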

    Clustering based text classification
    10.
    Invention patent application
    Status: in force

    Publication No.: US20050234955A1

    Publication date: 2005-10-20

    Application No.: US10921477

    Filing date: 2004-08-16

    IPC classification: G06F17/30

    CPC classification: G06F17/3071

    Abstract: Systems and methods for clustering-based text classification are described. In one aspect text is clustered as a function of labeled data to generate cluster(s). The text includes the labeled data and unlabeled data. Expanded labeled data is then generated as a function of the cluster(s). The expanded label data includes the labeled data and at least a portion of unlabeled data. Discriminative classifier(s) are then trained based on the expanded labeled data and remaining ones of the unlabeled data.
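
The first two steps of this abstract (clustering guided by the labeled data, then expanding the labeled set) might look like the sketch below. It is an illustration under assumptions: clusters are seeded with one centroid per class, similarity is cosine over bag-of-words counts, and an unlabeled text is adopted only if its similarity to the nearest centroid reaches `min_similarity`. The final step, training a discriminative classifier on the expanded set and the remaining unlabeled data, is left out.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Summed term counts; scale does not matter for cosine similarity."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def expand_labels(labeled, unlabeled, iterations=3, min_similarity=0.2):
    """labeled: list of (text, label); unlabeled: list of text.

    Returns (expanded_labeled, remaining_unlabeled).
    """
    seeds = defaultdict(list)
    for text, label in labeled:
        seeds[label].append(vectorize(text))
    centroids = {label: centroid(vs) for label, vs in seeds.items()}
    vectors = [vectorize(text) for text in unlabeled]
    assignment = [None] * len(vectors)
    for _ in range(iterations):
        # Assign each unlabeled text to its nearest cluster, if it is close enough.
        for i, v in enumerate(vectors):
            best = max(centroids, key=lambda label: cosine(v, centroids[label]))
            assignment[i] = best if cosine(v, centroids[best]) >= min_similarity else None
        # Recompute centroids; labeled texts always stay in their own class.
        for label in centroids:
            joined = [v for v, a in zip(vectors, assignment) if a == label]
            centroids[label] = centroid(seeds[label] + joined)
    expanded = list(labeled) + [(t, a) for t, a in zip(unlabeled, assignment) if a is not None]
    remaining = [t for t, a in zip(unlabeled, assignment) if a is None]
    return expanded, remaining
```

The `remaining` list corresponds to the "remaining ones of the unlabeled data" that the abstract says are also used when training the classifier.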
