Evaluating and generating summaries using normalized probabilities
    1.
    Type: Patent application
    Status: In force

    Publication number: US20070061356A1

    Publication date: 2007-03-15

    Application number: US11225861

    Filing date: 2005-09-13

    IPC classification: G06F7/00

    Abstract: A summary system for evaluating summaries of documents and for generating summaries of documents based on normalized probabilities of portions of the documents is provided. A summarization system generates a summary by selecting sentences for the summary based on their normalized probabilities as derived from a document model. An evaluation system evaluates the effectiveness of a summary based on a normalized probability for the summary that is derived from a document model.

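
    Illustrative sketch: the abstract above does not say which document model or which normalization is used. The Python below is only one possible reading, assuming an add-one-smoothed unigram model and length normalization by average per-word log probability. All function names (tokenize, build_document_model, normalized_log_prob, summarize, evaluate_summary) are hypothetical and do not come from the patent.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_document_model(sentences):
    """Return an add-one-smoothed unigram probability function (an assumed model)."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values())
    vocab = len(counts)

    def prob(word):
        # The extra +1 in the denominator reserves mass for unseen words.
        return (counts[word] + 1) / (total + vocab + 1)

    return prob


def normalized_log_prob(text, prob):
    """Average per-word log probability, so long and short texts are comparable."""
    words = tokenize(text)
    if not words:
        return float("-inf")
    return sum(math.log(prob(w)) for w in words) / len(words)


def summarize(sentences, k=1):
    """Pick the k sentences with the highest normalized score, in document order."""
    prob = build_document_model(sentences)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: normalized_log_prob(sentences[i], prob),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]


def evaluate_summary(summary, sentences):
    """Score a candidate summary against the document model (higher is better)."""
    return normalized_log_prob(summary, build_document_model(sentences))
```

    Without the length normalization, shorter sentences would tend to win simply because they multiply fewer probabilities; dividing by the word count is one common way to remove that bias.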

    Evaluating and generating summaries using normalized probabilities
    2.
    Type: Granted patent
    Status: In force

    Publication number: US07565372B2

    Publication date: 2009-07-21

    Application number: US11225861

    Filing date: 2005-09-13

    IPC classification: G06F17/00

    Abstract: A summary system for evaluating summaries of documents and for generating summaries of documents based on normalized probabilities of portions of the document. A summarization system generates a summary by selecting sentences for the summary based on their normalized probabilities as derived from a document model. An evaluation system evaluates the effectiveness of a summary based on a normalized probability for the summary that is derived from a document model.


    Method and system for adapting search results to personal information needs
    3.
    Type: Granted patent
    Status: In force

    Publication number: US07849089B2

    Publication date: 2010-12-07

    Application number: US12616739

    Filing date: 2009-11-11

    IPC classification: G06F7/00 G10L15/00

    Abstract: A method and system for adapting search results of a query to the information needs of the user submitting the query is provided. A search system analyzes click-through triplets indicating that a user submitted a query and that the user selected a document from the results of the query. To overcome the large size and sparseness of the click-through data, the search system when presented with an input triplet comprising a user, a query, and a document determines a probability that the user will find the input document important by smoothing the click-through triplets. The search system then orders documents of the result based on the probability of their importance to the input user.

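
    Illustrative sketch: the abstract above does not name the smoothing technique. The Python below assumes simple linear interpolation between a per-user estimate and a query-level back-off estimate, which is one common way to handle sparse counts and may differ from the patented method. The names build_counts, click_probability, rerank and the weight lam are hypothetical.

```python
from collections import Counter


def build_counts(triplets):
    """Count (user, query, doc) click-through triplets and their marginals."""
    uqd = Counter(triplets)
    uq = Counter((u, q) for u, q, _ in triplets)
    qd = Counter((q, d) for _, q, d in triplets)
    q_only = Counter(q for _, q, _ in triplets)
    return uqd, uq, qd, q_only


def click_probability(user, query, doc, counts, lam=0.5):
    """Smoothed estimate of how likely this user is to select doc for query."""
    uqd, uq, qd, q_only = counts
    # Back-off estimate shared by all users who issued the query.
    general = qd[(query, doc)] / q_only[query] if q_only[query] else 0.0
    if not uq[(user, query)]:
        # No history for this user and query: rely on the back-off alone.
        return general
    personal = uqd[(user, query, doc)] / uq[(user, query)]
    return lam * personal + (1 - lam) * general


def rerank(user, query, docs, counts, lam=0.5):
    """Order candidate documents by smoothed probability, highest first."""
    return sorted(
        docs,
        key=lambda d: click_probability(user, query, d, counts, lam),
        reverse=True,
    )
```

    With this interpolation, a user with click history for a query sees their own preferences weighted in, while a user with no history gets the ordering implied by everyone else's clicks.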

    METHOD AND SYSTEM FOR ADAPTING SEARCH RESULTS TO PERSONAL INFORMATION NEEDS
    4.
    Type: Patent application
    Status: In force

    Publication number: US20100057798A1

    Publication date: 2010-03-04

    Application number: US12616739

    Filing date: 2009-11-11

    IPC classification: G06F17/30

    Abstract: A method and system for adapting search results of a query to the information needs of the user submitting the query is provided. A search system analyzes click-through triplets indicating that a user submitted a query and that the user selected a document from the results of the query. To overcome the large size and sparseness of the click-through data, the search system when presented with an input triplet comprising a user, a query, and a document determines a probability that the user will find the input document important by smoothing the click-through triplets. The search system then orders documents of the result based on the probability of their importance to the input user.


    Method and system for detecting when an outgoing communication contains certain content
    5.
    Type: Granted patent
    Status: Lapsed

    Publication number: US07594277B2

    Publication date: 2009-09-22

    Application number: US10881867

    Filing date: 2004-06-30

    Abstract: A method and system for detecting whether an outgoing communication contains confidential information or other target information is provided. The detection system is provided with a collection of documents that contain confidential information, referred to as “confidential documents.” When the detection system is provided with an outgoing communication, it compares the content of the outgoing communication to the content of the confidential documents. If the outgoing communication contains confidential information, then the detection system may prevent the outgoing communication from being sent outside the organization. The detection system detects confidential information based on the similarity between the content of an outgoing communication and the content of confidential documents that are known to contain confidential information.

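
    Illustrative sketch: the abstract above says detection rests on content similarity but does not specify the measure. The Python below assumes bag-of-words cosine similarity and a fixed threshold, both chosen only for illustration. The names vectorize, cosine, contains_confidential and the threshold value 0.5 are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def contains_confidential(message, confidential_docs, threshold=0.5):
    """Flag the message if it is similar enough to any known confidential document."""
    msg_vec = vectorize(message)
    return any(cosine(msg_vec, vectorize(doc)) >= threshold for doc in confidential_docs)
```

    A caller could run contains_confidential on each outgoing message and hold back those that return True; the threshold trades missed leaks against false alarms and would need tuning on real data.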

    Method and system for classifying and identifying messages as question or not a question within a discussion thread
    6.
    Type: Granted patent
    Status: Lapsed

    Publication number: US07590603B2

    Publication date: 2009-09-15

    Application number: US10957329

    Filing date: 2004-10-01

    IPC classification: G06F15/18

    CPC classification: G06F17/30707

    Abstract: A method and system for classifying messages of a discussion thread as questions is provided. A classification system generates a classifier to classify messages of discussion threads as question messages or non-question messages. The system trains the classifier using the feature vectors and input classifications derived from a training set of discussion threads. After the classifier is trained, the classification system uses the classifier to classify messages within a corpus of discussion threads as question or non-question messages. To classify a message, the classification system generates a feature vector for the messages and submits that feature vector to the classifier. The classifier generates a score for the message indicating a likelihood that the message is a question message.


    Clustering based text classification
    7.
    Type: Patent application
    Status: In force

    Publication number: US20050234955A1

    Publication date: 2005-10-20

    Application number: US10921477

    Filing date: 2004-08-16

    IPC classification: G06F17/30

    CPC classification: G06F17/3071

    Abstract: Systems and methods for clustering-based text classification are described. In one aspect text is clustered as a function of labeled data to generate cluster(s). The text includes the labeled data and unlabeled data. Expanded labeled data is then generated as a function of the cluster(s). The expanded label data includes the labeled data and at least a portion of unlabeled data. Discriminative classifier(s) are then trained based on the expanded labeled data and remaining ones of the unlabeled data.

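
    Illustrative sketch: the abstract above does not specify the clustering algorithm or how unlabeled items are promoted. The Python below assumes a single pass of seeded nearest-centroid clustering with cosine similarity, then promotes the most confident unlabeled texts per label. It covers only the label-expansion step; the final discriminative classifier is omitted. The names vec, cosine, expand_labels and the parameter top_n are hypothetical.

```python
import math
import re
from collections import Counter


def vec(text):
    """Bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_labels(labeled, unlabeled, top_n=1):
    """Return (expanded labeled data, remaining unlabeled texts)."""
    # Seed one cluster centroid per label from the labeled texts.
    centroids = {}
    for text, label in labeled:
        centroids.setdefault(label, Counter()).update(vec(text))

    # Assign each unlabeled text to its most similar centroid.
    assigned = []
    for text in unlabeled:
        v = vec(text)
        label, sim = max(
            ((lab, cosine(v, c)) for lab, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        assigned.append((sim, text, label))

    # Promote the top_n most confident texts per label; keep the rest unlabeled.
    expanded, remaining, promoted = list(labeled), [], Counter()
    for sim, text, label in sorted(assigned, key=lambda t: t[0], reverse=True):
        if sim > 0 and promoted[label] < top_n:
            expanded.append((text, label))
            promoted[label] += 1
        else:
            remaining.append(text)
    return expanded, remaining
```

    The two returned collections correspond to the "expanded labeled data" and the "remaining ones of the unlabeled data" that the abstract feeds into classifier training.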

    Content propagation for enhanced document retrieval
    8.
    Type: Patent application
    Status: Lapsed

    Publication number: US20050234952A1

    Publication date: 2005-10-20

    Application number: US10826161

    Filing date: 2004-04-15

    IPC classification: G06F19/00 G06F17/30

    Abstract: Systems and methods providing computer-implemented content propagation for enhanced document retrieval are described. In one aspect, reference information directed to one or more documents is identified. The reference information is identified from one or more sources of data that are independent of a data source that includes the one or more documents. Metadata that is proximally located to the reference information is extracted from the one or more sources of data. Relevance between respective features of the metadata to content of associated ones of the one or more documents is calculated. For each document of the one or more documents, associated portions of the metadata is indexed with the relevance of features from the respective portions into original content of the document. The indexing generates one or more enhanced documents.


    Method and system for prioritizing communications based on sentence classifications
    10.
    Type: Granted patent
    Status: In force

    Publication number: US08112268B2

    Publication date: 2012-02-07

    Application number: US12254796

    Filing date: 2008-10-20

    IPC classification: G06F17/28

    CPC classification: G06F17/30

    Abstract: A method and system for prioritizing communications based on classifications of sentences within the communications is provided. A sentence classification system may classify sentences of communications according to various classifications such as “sentence mode.” The sentence classification system trains a sentence classifier using training data and then classifies sentences using the trained sentence classifier. After the sentences of a communication are classified, a document ranking system may generate a rank for the communication based on the classifications of the sentences within the communication. The document ranking system trains a document rank classifier using training data and then calculates the rank of communications using the trained document rank classifier.
