Text mining method
    1.
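    Record type: Patent application
    Legal status: Under examination (published)

    Publication number: US20050283357A1

    Publication date: 2005-12-22

    Application number: US10970586

    Filing date: 2004-10-21

    IPC classification: G06F17/28 G06F17/30

    CPC classification: G06F16/313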

    Abstract: A method for performing data mining is provided. The method includes selecting at least one data source of unstructured text. Additionally, a transformation is selected to identify a list of terms in the unstructured text. A run-time path is established to connect the data source to the transformation to load the list of terms identified into a destination database.
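The source, transformation, and destination described in this abstract can be pictured as a small pipeline. The following is only an illustrative sketch, not the patented method: the function names, the regex tokenizer standing in for term identification, and the `terms` table in an in-memory SQLite database are all assumptions made for the example.

```python
import re
import sqlite3
from collections import Counter
from typing import Iterable, Iterator


def text_source(documents: Iterable[str]) -> Iterator[str]:
    """Data source: yields unstructured text documents one at a time."""
    yield from documents


def term_transformation(docs: Iterable[str]) -> Counter:
    """Transformation: identify terms (assumed here to be lowercase word tokens)."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts


def run_path(documents: Iterable[str], conn: sqlite3.Connection) -> None:
    """Run-time path: connect source -> transformation -> destination table."""
    terms = term_transformation(text_source(documents))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS terms (term TEXT PRIMARY KEY, freq INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO terms VALUES (?, ?)", terms.items())
    conn.commit()


# Hypothetical input documents and destination database.
conn = sqlite3.connect(":memory:")
run_path(["Printer driver fails", "Driver update fixed the printer"], conn)
rows = conn.execute(
    "SELECT term, freq FROM terms ORDER BY freq DESC, term"
).fetchall()
print(rows[:2])
```

Keeping the three stages as separate functions mirrors the abstract's wording: a different source or transformation can be swapped in without touching the loading step.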

    Electronic mail data cleaning
    2.
    Record type: Patent application
    Legal status: Lapsed

    Publication number: US20070130263A1

    Publication date: 2007-06-07

    Application number: US11293469

    Filing date: 2005-12-02

    IPC classification: G06F15/16

    CPC classification: G06Q10/107

    Abstract: A cascaded processing approach is used to clean noisy electronic mail or other text messaging data. Non-text filtering is first performed on the noisy data to filter out non-text items in the data. Text normalization is then performed on the filtered data to provide cleaned data. The cleaned data can be used in one or more of a wide variety of other applications or processing systems.
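A two-stage cascade of this kind might look like the sketch below. The abstract does not say which items count as non-text or how normalization is done, so the rules here (dropping header lines, quoted replies, and everything after a `--` signature delimiter, then collapsing whitespace and repeated punctuation) are assumptions chosen only to illustrate the ordering of the stages.

```python
import re


def filter_non_text(message: str) -> str:
    """Stage 1: drop lines that look like headers, quoted replies, or a signature."""
    kept = []
    for line in message.splitlines():
        if line.strip() == "--":  # conventional signature delimiter: stop here
            break
        if re.match(r"^(From|To|Subject|Date):", line):
            continue  # header line
        if line.lstrip().startswith(">"):
            continue  # quoted reply text
        kept.append(line)
    return "\n".join(kept)


def normalize_text(text: str) -> str:
    """Stage 2: squeeze repeated punctuation and collapse whitespace."""
    text = re.sub(r"([!?.])\1+", r"\1", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def clean_email(message: str) -> str:
    """Cascade: non-text filtering first, then text normalization."""
    return normalize_text(filter_non_text(message))


raw = (
    "From: a@example.com\nSubject: hi\n\n"
    "Thanks   so much!!!\n> old quoted text\nSee you   soon.\n--\nAlice"
)
print(clean_email(raw))
```

Running the filter before the normalizer means the normalizer never has to handle header or signature noise, which is the point of cascading the two stages.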

    Electronic mail data cleaning
    3.
    Record type: Granted patent
    Legal status: Lapsed

    Publication number: US07590608B2

    Publication date: 2009-09-15

    Application number: US11293469

    Filing date: 2005-12-02

    IPC classification: G06N5/00 G06F17/00

    CPC classification: G06Q10/107

    Abstract: A cascaded processing approach is used to clean noisy electronic mail or other text messaging data. Non-text filtering is first performed on the noisy data to filter out non-text items in the data. Text normalization is then performed on the filtered data to provide cleaned data. The cleaned data can be used in one or more of a wide variety of other applications or processing systems.

    Training a ranking component
    4.
    Record type: Granted patent
    Legal status: In force

    Publication number: US07783629B2

    Publication date: 2010-08-24

    Application number: US11326283

    Filing date: 2006-01-05

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.
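The retrieval flow in this abstract (select a factoid type, fetch matching passages, score, return in rank order) can be sketched as follows. The `FactoidIndex` class, the factoid type names, and the word-overlap score are hypothetical stand-ins; the abstract does not specify how the score is calculated or how the ranking component is trained.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set:
    """Lowercase word tokens as a set (an assumed, very simple tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


class FactoidIndex:
    """Toy passage index keyed by factoid type (e.g. 'date', 'person')."""

    def __init__(self) -> None:
        self._by_type = defaultdict(list)

    def add(self, factoid_type: str, passage: str) -> None:
        self._by_type[factoid_type].append(passage)

    def search(self, query: str, factoid_type: str) -> list:
        """Return (score, passage) pairs of the selected type, best first."""
        q = tokenize(query)
        scored = []
        for passage in self._by_type.get(factoid_type, []):
            score = len(q & tokenize(passage))  # assumed score: shared words
            if score > 0:
                scored.append((score, passage))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored


index = FactoidIndex()
index.add("date", "The bridge opened in 1937")
index.add("date", "The tower was finished in 1889")
index.add("person", "The bridge was designed by Joseph Strauss")

results = index.search("when did the bridge open", "date")
for score, passage in results:
    print(score, passage)
```

Restricting the search to one factoid type before scoring keeps passages of other types, such as the 'person' passage here, out of the results even when they share words with the query.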

    Uncertainty reduction in collaborative bootstrapping
    5.
    Record type: Patent application
    Legal status: Lapsed

    Publication number: US20050131850A1

    Publication date: 2005-06-16

    Application number: US10732741

    Filing date: 2003-12-10

    Applicants: Yunbo Cao, Hang Li

    Inventors: Yunbo Cao, Hang Li

    CPC classification: G06N7/02

    Abstract: Collaborative bootstrapping with uncertainty reduction for increased classifier performance. One classifier selects a portion of data that is uncertain with respect to the classifier and a second classifier labels the portion. Uncertainty reduction includes parallel processing where the second classifier also selects an uncertain portion for the first classifier to label. Uncertainty reduction can be incorporated into existing or new co-training or bootstrapping, including bilingual bootstrapping.
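The exchange between the two classifiers can be illustrated with a toy example. This is a loose sketch under several assumptions: each example has two numeric views, each classifier is a one-dimensional nearest-centroid model over its own view, uncertainty is measured as the gap between the distances to the two centroids, and the two classifiers take turns rather than running in parallel as the abstract describes.

```python
from statistics import mean


class CentroidClassifier:
    """Toy 1-D nearest-centroid classifier over one view of each example."""

    def __init__(self, view: int) -> None:
        self.view = view
        self.centroids = {}

    def fit(self, labeled) -> None:
        for label in (0, 1):
            self.centroids[label] = mean(
                x[self.view] for x, y in labeled if y == label
            )

    def margin(self, x) -> float:
        """Small margin = example is uncertain for this classifier."""
        d0 = abs(x[self.view] - self.centroids[0])
        d1 = abs(x[self.view] - self.centroids[1])
        return abs(d0 - d1)

    def predict(self, x) -> int:
        return min((0, 1), key=lambda lab: abs(x[self.view] - self.centroids[lab]))


def collaborative_bootstrap(labeled, unlabeled, rounds=3, batch=1):
    a, b = CentroidClassifier(0), CentroidClassifier(1)
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for chooser, labeler in ((a, b), (b, a)):
            if not unlabeled:
                break
            a.fit(labeled)
            b.fit(labeled)
            # The chooser picks what it is least sure about...
            unlabeled.sort(key=chooser.margin)
            picked, unlabeled = unlabeled[:batch], unlabeled[batch:]
            # ...and the other classifier supplies the labels.
            labeled += [(x, labeler.predict(x)) for x in picked]
    a.fit(labeled)
    b.fit(labeled)
    return a, b, labeled


# Hypothetical seed labels and unlabeled pool; each example is (view0, view1).
seed = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]
pool = [(0.45, 0.1), (0.9, 0.55), (0.2, 0.2), (0.8, 0.8)]
clf_a, clf_b, final = collaborative_bootstrap(seed, pool)
print(dict(final))
```

In this data the first pick, `(0.45, 0.1)`, is ambiguous in view 0 but clear in view 1, so the second classifier can label it with confidence. That is the intuition behind having each classifier ask the other about its own uncertain examples.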

    Uncertainty reduction in collaborative bootstrapping
    6.
    Record type: Granted patent
    Legal status: Lapsed

    Publication number: US07512582B2

    Publication date: 2009-03-31

    Application number: US10732741

    Filing date: 2003-12-10

    Applicants: Yunbo Cao, Hang Li

    Inventors: Yunbo Cao, Hang Li

    CPC classification: G06N7/02

    Abstract: Collaborative bootstrapping with uncertainty reduction for increased classifier performance. One classifier selects a portion of data that is uncertain with respect to the classifier and a second classifier labels the portion. Uncertainty reduction includes parallel processing where the second classifier also selects an uncertain portion for the first classifier to label. Uncertainty reduction can be incorporated into existing or new co-training or bootstrapping, including bilingual bootstrapping.

    LEARNING A DOCUMENT RANKING USING A LOSS FUNCTION WITH A RANK PAIR OR A QUERY PARAMETER
    7.
    Record type: Patent application
    Legal status: In force

    Publication number: US20080027925A1

    Publication date: 2008-01-31

    Application number: US11460838

    Filing date: 2006-07-28

    IPC classification: G06F17/30

    Abstract: A method and system for generating a ranking function to rank the relevance of documents to a query is provided. The ranking system learns a ranking function from training data that includes queries, resultant documents, and relevance of each document to its query. The ranking system learns a ranking function using the training data by weighting incorrect rankings of relevant documents more heavily than the incorrect rankings of not relevant documents so that more emphasis is placed on correctly ranking relevant documents. The ranking system may also learn a ranking function using the training data by normalizing the contribution of each query to the ranking function so that it is independent of the number of relevant documents of each query.
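The two ideas in this abstract, a per-rank-pair weight and a per-query normalization, can be shown in a small loss function. This sketch assumes a linear scoring function, a pairwise hinge loss, and a hypothetical `pair_weight` table keyed by (higher relevance, lower relevance); the abstract does not fix any of these choices.

```python
def score(w, x):
    """Linear scoring function: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def ranking_loss(w, queries, pair_weight):
    """Pairwise hinge loss with rank-pair weights and per-query normalization.

    queries: list of queries, each a list of (features, relevance) documents.
    pair_weight: maps (higher_relevance, lower_relevance) to a penalty weight,
                 so mistakes involving highly relevant documents can cost more.
    """
    total = 0.0
    for docs in queries:
        pairs = [
            (xi, ri, xj, rj)
            for xi, ri in docs
            for xj, rj in docs
            if ri > rj
        ]
        if not pairs:
            continue
        query_loss = 0.0
        for xi, ri, xj, rj in pairs:
            margin = score(w, xi) - score(w, xj)
            query_loss += pair_weight[(ri, rj)] * max(0.0, 1.0 - margin)
        # Query normalization: each query contributes its average pair loss,
        # so queries with many documents do not dominate training.
        total += query_loss / len(pairs)
    return total


# Hypothetical data: relevance grades 2 (relevant), 1 (partial), 0 (not relevant).
weights = {(2, 1): 3.0, (2, 0): 2.0, (1, 0): 1.0}
queries = [
    [([2.0], 2), ([1.5], 1), ([0.0], 0)],
    [([0.0], 1), ([0.5], 0)],
]
loss = ranking_loss([1.0], queries, weights)
print(loss)
```

Here the first query has three document pairs and the second has one, yet each contributes only its average pair loss, which is the effect of the normalization.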

    Two stage search
    8.
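    Record type: Patent application
    Legal status: In force

    Publication number: US20070112720A1

    Publication date: 2007-05-17

    Application number: US11273314

    Filing date: 2005-11-14

    Applicants: Yunbo Cao, Hang Li

    Inventors: Yunbo Cao, Hang Li

    IPC classification: G06F17/30

    CPC classification: G06F17/30684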

    Abstract: A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank ordered list.
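A rough sketch of the two stages is given below. The specific scoring choices are assumptions for illustration: relevance is the fraction of query words found in a document, co-occurrence is the fraction of sentences in which a person's name appears alongside a query word, and the two are combined by multiplying and summing over documents. The abstract only says that the scores are combined.

```python
import re


def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def relevance(query: str, doc: str) -> float:
    """Stage 1 (relevance model): fraction of query words present in the doc."""
    q = tokens(query)
    return len(q & tokens(doc)) / len(q) if q else 0.0


def cooccurrence(person: str, query: str, doc: str) -> float:
    """Stage 2 (co-occurrence model): share of sentences naming the person
    together with at least one query word."""
    sentences = [s for s in doc.split(".") if s.strip()]
    if not sentences:
        return 0.0
    q = tokens(query)
    hits = sum(
        1 for s in sentences if person.lower() in s.lower() and q & tokens(s)
    )
    return hits / len(sentences)


def find_experts(query: str, docs: list, people: list) -> list:
    """Combine both scores per person and return a rank-ordered list."""
    scores = {p: 0.0 for p in people}
    for doc in docs:
        rel = relevance(query, doc)
        if rel == 0.0:
            continue  # stage 1 did not retrieve this document
        for p in people:
            scores[p] += rel * cooccurrence(p, query, doc)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical documents and candidate names.
docs = [
    "Alice Chen tuned the ranking model. The ranking model improved search.",
    "Bob Diaz fixed the printer. Alice Chen ordered lunch.",
]
ranked = find_experts("ranking model", docs, ["Alice Chen", "Bob Diaz"])
print(ranked)
```

Because the second document has no overlap with the query, stage 1 filters it out and the mention of either name there contributes nothing to the final scores.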

    Text mining apparatus and associated methods
    9.
    Record type: Patent application
    Legal status: In force

    Publication number: US20060206306A1

    Publication date: 2006-09-14

    Application number: US11054113

    Filing date: 2005-02-09

    IPC classification: G06F17/28

    Abstract: A method for extracting key terms and associated key terms for use in text mining is provided. The method includes receiving unstructured text documents, such as emails over a customer service system. Term candidates are extracted based on identifying consecutive word strings satisfying a context independency threshold. Term candidates are weighted using mutual information to generate a list of weighted terms. The weighted terms are then recounted. Terms are associated based on Chi-square values. Associated terms can then be used for information retrieval. A user interface can be personalized with individual user profiles.
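Two of the statistics named in this abstract can be computed in a few lines. The sketch below assumes pointwise mutual information over adjacent word pairs as the weighting and the standard 2x2 chi-square statistic over document co-occurrence as the association measure; the patent's exact formulas, the context independency threshold, and the recounting step are not reproduced here.

```python
import math
from collections import Counter


def pmi(tokens: list, x: str, y: str) -> float:
    """Pointwise mutual information of the adjacent word pair (x, y), in bits."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    if bigrams[(x, y)] == 0:
        return float("-inf")
    p_xy = bigrams[(x, y)] / (n - 1)
    return math.log2(p_xy / ((unigrams[x] / n) * (unigrams[y] / n)))


def chi_square(docs: list, t1: str, t2: str) -> float:
    """Chi-square statistic for the 2x2 table of t1/t2 presence across docs."""
    a = sum(1 for doc in docs if t1 in doc and t2 in doc)
    b = sum(1 for doc in docs if t1 in doc and t2 not in doc)
    c = sum(1 for doc in docs if t1 not in doc and t2 in doc)
    d = sum(1 for doc in docs if t1 not in doc and t2 not in doc)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical token stream and per-document term sets.
stream = "reset password please reset password now".split()
term_docs = [
    {"printer", "driver"}, {"printer", "driver"}, {"printer", "driver"},
    {"invoice"}, {"invoice", "refund"}, {"refund"},
]
print(pmi(stream, "reset", "password"))
print(chi_square(term_docs, "printer", "driver"))
```

Note that chi-square measures dependence in either direction, so a pair of terms that systematically avoid each other also gets a nonzero value. A practical system would also check that the observed co-occurrence is higher than expected.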
