Clustering based text classification
    31.
    Granted patent
    Status: in force

    Publication No.: US07366705B2

    Publication date: 2008-04-29

    Application No.: US10921477

    Filing date: 2004-08-16

    CPC classification: G06F17/3071

    Abstract: Systems and methods for clustering-based text classification are described. In one aspect, text is clustered as a function of labeled data to generate cluster(s). The text includes the labeled data and unlabeled data. Expanded labeled data is then generated as a function of the cluster(s). The expanded labeled data includes the labeled data and at least a portion of the unlabeled data. Discriminative classifier(s) are then trained based on the expanded labeled data and the remaining unlabeled data.
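
The clustering and label-expansion steps described above can be sketched as follows. This is a minimal illustration, not the patented method: the bag-of-words vectors, cosine similarity, one-cluster-per-class seeding, and the names `expand_labels` and `per_class` are all assumptions, and the final classifier-training step is not shown.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Summed vector; cosine similarity ignores scale, so no division is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def expand_labels(labeled, unlabeled, per_class=1):
    """labeled: list of (text, label); unlabeled: list of text.
    Returns (expanded_labeled, remaining_unlabeled)."""
    # Assumption: one cluster per class, seeded by that class's labeled documents.
    by_label = {}
    for text, label in labeled:
        by_label.setdefault(label, []).append(vectorize(text))
    centroids = {label: centroid(vs) for label, vs in by_label.items()}

    # Assign each unlabeled document to its most similar cluster.
    assigned = {label: [] for label in centroids}
    for text in unlabeled:
        v = vectorize(text)
        best = max(centroids, key=lambda lab: cosine(v, centroids[lab]))
        assigned[best].append((cosine(v, centroids[best]), text))

    # Promote the documents closest to each centroid to labeled examples.
    expanded = list(labeled)
    promoted = set()
    for label, scored in assigned.items():
        for sim, text in sorted(scored, reverse=True)[:per_class]:
            if sim > 0:  # never promote a document with no overlap at all
                expanded.append((text, label))
                promoted.add(text)
    remaining = [t for t in unlabeled if t not in promoted]
    return expanded, remaining
```

A discriminative classifier would then be trained on `expanded`, with `remaining` still available as unlabeled data.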

    Evaluating and generating summaries using normalized probabilities
    32.
    Patent application
    Status: in force

    Publication No.: US20070061356A1

    Publication date: 2007-03-15

    Application No.: US11225861

    Filing date: 2005-09-13

    IPC classification: G06F7/00

    Abstract: A summary system for evaluating summaries of documents and for generating summaries of documents based on normalized probabilities of portions of the documents is provided. A summarization system generates a summary by selecting sentences for the summary based on their normalized probabilities as derived from a document model. An evaluation system evaluates the effectiveness of a summary based on a normalized probability for the summary that is derived from a document model.
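
One way to read "normalized probability" is as a length-normalized likelihood under a simple language model of the document. The sketch below assumes a unigram document model and per-word averaging of log-probabilities; neither choice is stated in the abstract, and the function names and the `floor` parameter are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def unigram_model(document):
    """Assumed document model: relative frequency of each word in the document."""
    counts = Counter(tokenize(document))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def normalized_log_prob(text, model, floor=1e-6):
    """Average per-word log-probability, i.e. the log of the geometric mean.
    Words unseen in the document get the small probability `floor`."""
    words = tokenize(text)
    if not words:
        return float("-inf")
    return sum(math.log(model.get(w, floor)) for w in words) / len(words)

def summarize(document, k=1):
    """Pick the k sentences with the highest normalized probability,
    returned in their original document order."""
    model = unigram_model(document)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    ranked = sorted(sentences, key=lambda s: normalized_log_prob(s, model), reverse=True)
    chosen = set(ranked[:k])
    return [s for s in sentences if s in chosen]
```

The same `normalized_log_prob` function can score a candidate summary as a whole, which mirrors the evaluation use described in the abstract.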

    Augmenting user, query, and document triplets using singular value decomposition
    33.
    Patent application
    Status: lapsed

    Publication No.: US20070055646A1

    Publication date: 2007-03-08

    Application No.: US11222243

    Filing date: 2005-09-08

    IPC classification: G06F17/30

    CPC classification: G06F17/30675 G06F17/30864

    Abstract: A system for augmenting click-through data with latent information present in the click-through data for use in generating search results that are better tailored to the information needs of a user submitting a query is provided. The augmentation system creates a three-dimensional matrix with the dimensions of users, queries, and documents. The augmentation system then performs a three-order singular value decomposition of the three-dimensional matrix to generate a three-dimensional core singular value matrix and a left singular matrix for each dimension. The augmentation system finally multiplies the three-dimensional core singular value matrix by the left singular matrices to generate an augmented three-dimensional matrix that explicitly contains the information that was latent in the un-augmented three-dimensional matrix.
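
In tensor notation, the decomposition and re-multiplication described above might be written as below. The symbols are assumptions introduced here: $\mathcal{A}$ is the users × queries × documents array, $U^{(n)}$ is the left singular matrix of its mode-$n$ unfolding, and $\times_n$ is the mode-$n$ product. Multiplying the full core by the full factor matrices reproduces $\mathcal{A}$ exactly, and the abstract does not say how the augmented matrix comes to differ from the original; the truncation in the second pair of equations (keeping only the leading singular vectors, marked with hats) is an assumption borrowed from common SVD-based smoothing.

```latex
\mathcal{S} = \mathcal{A} \times_1 U^{(1)\top} \times_2 U^{(2)\top} \times_3 U^{(3)\top},
\qquad
\mathcal{A} = \mathcal{S} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}

\hat{\mathcal{S}} = \mathcal{A} \times_1 \hat{U}^{(1)\top} \times_2 \hat{U}^{(2)\top} \times_3 \hat{U}^{(3)\top},
\qquad
\hat{\mathcal{A}} = \hat{\mathcal{S}} \times_1 \hat{U}^{(1)} \times_2 \hat{U}^{(2)} \times_3 \hat{U}^{(3)}
```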

    Method and system for identifying questions within a discussion thread
    35.
    Patent application
    Status: lapsed

    Publication No.: US20060112036A1

    Publication date: 2006-05-25

    Application No.: US10957329

    Filing date: 2004-10-01

    IPC classification: G06F15/18

    CPC classification: G06F17/30707

    Abstract: A method and system for classifying messages of a discussion thread as questions is provided. A classification system generates a classifier to classify messages of discussion threads as question messages or non-question messages. The system trains the classifier using feature vectors and input classifications derived from a training set of discussion threads. After the classifier is trained, the classification system uses the classifier to classify messages within a corpus of discussion threads as question or non-question messages. To classify a message, the classification system generates a feature vector for the message and submits that feature vector to the classifier. The classifier generates a score for the message indicating a likelihood that the message is a question message.

    Method and system for prioritizing communications based on sentence classifications

    Publication No.: US20060047497A1

    Publication date: 2006-03-02

    Application No.: US10930687

    Filing date: 2004-08-31

    IPC classification: G06F17/20

    CPC classification: G06F17/30

    Abstract: A method and system for prioritizing communications based on classifications of sentences within the communications is provided. A sentence classification system may classify sentences of communications according to various classifications such as “sentence mode.” The sentence classification system trains a sentence classifier using training data and then classifies sentences using the trained sentence classifier. After the sentences of a communication are classified, a document ranking system may generate a rank for the communication based on the classifications of the sentences within the communication. The document ranking system trains a document rank classifier using training data and then calculates the rank of communications using the trained document rank classifier.

    Method and system for prioritizing communications based on interpersonal relationships
    37.
    Patent application
    Status: lapsed

    Publication No.: US20060026298A1

    Publication date: 2006-02-02

    Application No.: US10903709

    Filing date: 2004-07-30

    IPC classification: G06F15/173

    CPC classification: G06Q10/107

    Abstract: A method and system for calculating the importance of persons based on interpersonal relationships and prioritizing communications based on importance of participants in the communications is provided. A prioritization system identifies relationships between persons and identifies the importance of a person to other persons based on these relationships. After the prioritization system identifies the importance of persons, the prioritization system can prioritize communications based on the importance of the senders or recipients.
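
A rough sketch of the idea follows, using a PageRank-style power iteration over a who-writes-to-whom graph. The abstract does not name an algorithm, so PageRank is a stand-in chosen here, and `importance`, `prioritize`, and the `damping` parameter are hypothetical.

```python
from collections import defaultdict

def importance(messages, damping=0.85, iterations=50):
    """messages: list of (sender, recipient) pairs.
    Returns a dict mapping each person to an importance score (scores sum to 1)."""
    out_links = defaultdict(lambda: defaultdict(int))
    people = set()
    for sender, recipient in messages:
        out_links[sender][recipient] += 1  # edge weight = number of messages
        people.update((sender, recipient))
    n = len(people)
    score = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) / n for p in people}
        for p in people:
            targets = out_links.get(p)
            if targets:
                total = sum(targets.values())
                for q, w in targets.items():
                    nxt[q] += damping * score[p] * w / total
            else:
                # A person who never sends anything spreads their score evenly.
                for q in people:
                    nxt[q] += damping * score[p] / n
        score = nxt
    return score

def prioritize(inbox, score):
    """inbox: list of (sender, subject). Most important senders come first."""
    return sorted(inbox, key=lambda m: score.get(m[0], 0.0), reverse=True)
```

In this sketch a person becomes important when many people, or important people, write to them; ranking by recipient importance instead would only change the key used in `prioritize`.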

    Method and system for ranking documents of a search result to improve diversity and information richness
    38.
    Patent application
    Status: lapsed

    Publication No.: US20050246328A1

    Publication date: 2005-11-03

    Application No.: US10837540

    Filing date: 2004-04-30

    IPC classification: G06F17/30 G06F7/00

    Abstract: A method and system for ranking documents of search results based on information richness and diversity of topics is provided. A ranking system determines the information richness of each document within a search result. The ranking system groups documents of a search result based on their relatedness, meaning that they are directed to similar topics. The ranking system ranks the documents so that the highest-ranking documents include at least one document covering each topic, that is, one document from each of the groups. The ranking system selects from each group the document that has the highest information richness within the group. When the documents are presented to a user in rank order, the user will likely find on the first page of the search result documents that cover a variety of topics, rather than just a single popular topic.
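
The selection rule above (the richest document of each group goes first) can be sketched as follows. The topic groups and richness scores are taken as given inputs, since the abstract does not say how they are computed; the dictionary keys and the round-robin ordering of the lower-ranked documents are assumptions.

```python
def rank_for_diversity(docs):
    """docs: list of dicts with hypothetical keys 'id', 'topic', 'richness'.
    Returns document ids ordered so that every topic is represented
    before any topic appears a second time."""
    groups = {}
    for d in docs:
        groups.setdefault(d["topic"], []).append(d)
    for members in groups.values():
        members.sort(key=lambda d: d["richness"], reverse=True)

    ranked = []
    depth = 0
    while len(ranked) < len(docs):
        # Layer 0 holds the richest document of each group, layer 1 the runners-up, etc.
        layer = [m[depth] for m in groups.values() if len(m) > depth]
        layer.sort(key=lambda d: d["richness"], reverse=True)
        ranked.extend(layer)
        depth += 1
    return [d["id"] for d in ranked]
```

With two documents about one topic and one each about two others, the second document of the popular topic is pushed behind the top document of every other topic, even if its own richness score is higher.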

    Reinforced clustering of multi-type data objects for search term suggestion
    39.
    Patent application
    Status: lapsed

    Publication No.: US20050234972A1

    Publication date: 2005-10-20

    Application No.: US10826159

    Filing date: 2004-04-15

    IPC classification: G06F17/30

    Abstract: Systems and methods for related term suggestion are described. In one aspect, relationships among respective ones of two or more multi-type data objects are identified. The respective ones of the multi-type data objects include at least one object of a first type and at least one object of a second type that is different from the first type. The multi-type data objects are iteratively clustered in view of respective ones of the relationships to generate reinforced clusters.

    Method and system for clustering using generalized sentence patterns
    40.
    Granted patent
    Status: in force

    Publication No.: US07584100B2

    Publication date: 2009-09-01

    Application No.: US10880662

    Filing date: 2004-06-30

    Abstract: A method and system for clustering documents based on generalized sentence patterns of the topics of the documents is provided. A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes the topic of each document. The generalized sentence is an abstraction of the words of the sentence. The GSP system identifies clusters of documents based on the patterns of their generalized sentences. The GSP system clusters documents when the generalized sentence representations of their topics have a similar pattern.
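
As a toy illustration of the idea, the sketch below abstracts each topic sentence (for example, an email subject line) and groups documents whose abstractions match. The abstraction rules (numbers become `<NUM>`, capitalized words after the first become `<NAME>`) and the exact-match grouping are assumptions made here; the abstract only requires that the patterns be similar and does not define the generalization.

```python
from collections import defaultdict

def generalize(sentence):
    """Replace specific words with coarse placeholders (assumed rules)."""
    out = []
    for i, tok in enumerate(sentence.split()):
        word = tok.strip(".,:;!?")
        if not word:
            continue
        if word.isdigit():
            out.append("<NUM>")
        elif i > 0 and word[0].isupper():
            # Capitalized word that does not start the sentence: treat as a name.
            out.append("<NAME>")
        else:
            out.append(word.lower())
    return " ".join(out)

def cluster_by_pattern(sentences):
    """Group topic sentences whose generalized forms are identical."""
    clusters = defaultdict(list)
    for s in sentences:
        clusters[generalize(s)].append(s)
    return dict(clusters)
```

For instance, "Meeting with Alice at 3" and "Meeting with Bob at 10" both generalize to the same pattern and therefore land in one cluster.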
