Patent search: ap:("C. MacLennan" OR "Hang Li" OR "Ming Zhou" OR "Yunbo Cao" OR "ZhaoHui Tang") AND inv:"Hang Li" (results page 1)

1.

Patent application
Text mining method (Pending, published)

Publication No.: US20050283357A1

Publication date: 2005-12-22

Application No.: US10970586

Filing date: 2004-10-21

Applicants: C. MacLennan, Hang Li, Ming Zhou, Yunbo Cao, ZhaoHui Tang

Inventors: C. MacLennan, Hang Li, Ming Zhou, Yunbo Cao, ZhaoHui Tang

IPC: G06F17/28, G06F17/30

CPC: G06F16/313

Abstract: A method for performing data mining is provided. The method includes selecting at least one data source of unstructured text. Additionally, a transformation is selected to identify a list of terms in the unstructured text. A run-time path is established to connect the data source to the transformation to load the list of terms identified into a destination database.

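The abstract wires three pieces together: an unstructured-text source, a term-identifying transformation, and a run-time path that loads the terms into a destination database. A minimal Python sketch of that wiring follows, assuming a plain list of strings as the source, lowercase word tokens as the "terms", and an in-memory SQLite table as the destination. `extract_terms`, `run_pipeline`, and the `terms` table are hypothetical names, not taken from the patent.

```python
import re
import sqlite3
from typing import Callable, Iterable


def extract_terms(text: str) -> list[str]:
    """Hypothetical transformation: lowercase word tokens stand in for 'terms'."""
    return re.findall(r"[a-z0-9]+", text.lower())


def run_pipeline(
    source: Iterable[str],
    transform: Callable[[str], list[str]],
    dest: sqlite3.Connection,
) -> None:
    """Run-time path: read each document, transform it, load its terms into the destination."""
    dest.execute("CREATE TABLE IF NOT EXISTS terms (doc_id INTEGER, term TEXT)")
    for doc_id, text in enumerate(source):
        dest.executemany(
            "INSERT INTO terms (doc_id, term) VALUES (?, ?)",
            [(doc_id, term) for term in transform(text)],
        )
    dest.commit()


# Usage: an in-memory list plays the unstructured-text source, SQLite the destination.
conn = sqlite3.connect(":memory:")
run_pipeline(["Printer jams daily", "Printer driver crash"], extract_terms, conn)
rows = conn.execute(
    "SELECT term, COUNT(*) FROM terms GROUP BY term ORDER BY term"
).fetchall()
```

Because the transformation is passed in as a function, a real term extractor could replace `extract_terms` without touching the loading path.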

2.

Patent application
Electronic mail data cleaning (Lapsed)

Publication No.: US20070130263A1

Publication date: 2007-06-07

Application No.: US11293469

Filing date: 2005-12-02

Applicants: Hang Li, Yunbo Cao, ZhaoHui Tang

Inventors: Hang Li, Yunbo Cao, ZhaoHui Tang

IPC: G06F15/16

CPC: G06Q10/107

Abstract: A cascaded processing approach is used to clean noisy electronic mail or other text messaging data. Non-text filtering is first performed on the noisy data to filter out non-text items in the data. Text normalization is then performed on the filtered data to provide cleaned data. The cleaned data can be used in one or more of a wide variety of other applications or processing systems.

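The cascade in this abstract can be sketched as two functions applied in sequence, so the normalizer only sees what the non-text filter let through. The rules below (header fields, `>`-quoted lines, a `--` signature separator, whitespace and repeated-punctuation cleanup) are illustrative assumptions standing in for whatever detectors the patent actually uses. `filter_non_text`, `normalize_text`, and `clean_email` are hypothetical names.

```python
import re

# Hypothetical stand-in for the non-text filter: common header fields.
HEADER = re.compile(r"^(From|To|Cc|Subject|Date):", re.IGNORECASE)


def filter_non_text(lines: list[str]) -> list[str]:
    """Stage 1: drop lines that are not natural-language body text."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped == "--":  # conventional signature separator; discard the rest
            break
        if HEADER.match(stripped) or stripped.startswith(">"):
            continue  # header field or quoted text from an earlier message
        kept.append(stripped)
    return kept


def normalize_text(lines: list[str]) -> str:
    """Stage 2: normalize the text that survived filtering."""
    text = " ".join(line for line in lines if line)  # rejoin, skipping blank lines
    text = re.sub(r"\s+", " ", text)                # collapse runs of whitespace
    text = re.sub(r"([!?.])\1+", r"\1", text)       # "!!!" -> "!"
    return text.strip()


def clean_email(raw: str) -> str:
    # Cascade: normalization runs only on the output of the non-text filter.
    return normalize_text(filter_non_text(raw.splitlines()))


raw = """From: alice@example.com
Subject: printer

The printer   is broken!!!
> previous message
Please help.
--
Alice"""
cleaned = clean_email(raw)  # "The printer is broken! Please help."
```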

3.

Granted patent
Electronic mail data cleaning (Lapsed)

Publication No.: US07590608B2

Publication date: 2009-09-15

Application No.: US11293469

Filing date: 2005-12-02

Applicants: Hang Li, Yunbo Cao, ZhaoHui Tang

Inventors: Hang Li, Yunbo Cao, ZhaoHui Tang

IPC: G06N5/00, G06F17/00

CPC: G06Q10/107

Abstract: A cascaded processing approach is used to clean noisy electronic mail or other text messaging data. Non-text filtering is first performed on the noisy data to filter out non-text items in the data. Text normalization is then performed on the filtered data to provide cleaned data. The cleaned data can be used in one or more of a wide variety of other applications or processing systems.


4.

Granted patent
Training a ranking component (In force)

Publication No.: US07783629B2

Publication date: 2010-08-24

Application No.: US11326283

Filing date: 2006-01-05

Applicants: Hang Li, Jianfeng Gao, Yunbo Cao

Inventors: Hang Li, Jianfeng Gao, Yunbo Cao

IPC: G06F17/30

CPC: G06F17/30616

Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.

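A minimal sketch of the retrieval flow in the abstract, assuming passages were tagged with factoid types (such as `"date"`) when the index was built and using simple query-word overlap as the score. The patent's title concerns training the ranking component, which this sketch does not attempt. `Passage`, `build_index`, `score`, and `search` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    factoid_types: set[str]  # assumed to be tagged at indexing time, e.g. {"date"}


def build_index(passages: list[Passage]) -> dict[str, list[Passage]]:
    """Index passages by the factoid types they contain."""
    index: dict[str, list[Passage]] = {}
    for passage in passages:
        for ftype in passage.factoid_types:
            index.setdefault(ftype, []).append(passage)
    return index


def score(query: str, passage: Passage) -> float:
    """Hypothetical score: fraction of query words that occur in the passage."""
    q = set(query.lower().split())
    words = set(passage.text.lower().split())
    return len(q & words) / len(q) if q else 0.0


def search(index: dict[str, list[Passage]], query: str, factoid_type: str) -> list[tuple[float, str]]:
    """Retrieve passages of the selected factoid type that relate to the query, best first."""
    scored = [(score(query, p), p.text) for p in index.get(factoid_type, [])]
    return sorted((s for s in scored if s[0] > 0), reverse=True)


passages = [
    Passage("Windows XP was released in 2001", {"date"}),
    Passage("Windows XP was developed by Microsoft", {"organization"}),
    Passage("Office 2003 was released in October 2003", {"date"}),
]
results = search(build_index(passages), "when was windows xp released", "date")
```

Only the two `"date"` passages are candidates; the one sharing more query words ranks first.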

5.

Patent application
Uncertainty reduction in collaborative bootstrapping (Lapsed)

Publication No.: US20050131850A1

Publication date: 2005-06-16

Application No.: US10732741

Filing date: 2003-12-10

Applicants: Yunbo Cao, Hang Li

Inventors: Yunbo Cao, Hang Li

IPC: G06F9/44, G06F17/00, G06N5/00, G06N7/00, G06N7/02, G06N7/06, G06N7/08

CPC: G06N7/02

Abstract: Collaborative bootstrapping with uncertainty reduction for increased classifier performance. One classifier selects a portion of data that is uncertain with respect to the classifier and a second classifier labels the portion. Uncertainty reduction includes parallel processing where the second classifier also selects an uncertain portion for the first classifier to label. Uncertainty reduction can be incorporated into existing or new co-training or bootstrapping, including bilingual bootstrapping.

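The selection-and-labeling loop in this abstract can be sketched with two toy classifiers, each seeing one view of every example, as in co-training. Each classifier in turn picks the unlabeled example it is least certain about, and the other classifier labels it. The nearest-centroid classifiers, the distance-gap confidence measure, and the names `train`, `predict`, and `bootstrap` are assumptions for illustration, not the patented method.

```python
# An example is a pair of views (view0, view1); a labeled example is (views, label).
Views = tuple[float, float]


def train(labeled: list[tuple[Views, int]], view: int) -> dict[int, float]:
    """Nearest-centroid classifier on one view (assumes classes 0 and 1 are both present)."""
    centroids = {}
    for label in (0, 1):
        values = [x[view] for x, y in labeled if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def predict(centroids: dict[int, float], x: Views, view: int) -> tuple[int, float]:
    """Return (label, confidence); confidence is the gap between the two centroid distances."""
    d0 = abs(x[view] - centroids[0])
    d1 = abs(x[view] - centroids[1])
    return (0 if d0 <= d1 else 1), abs(d0 - d1)


def bootstrap(labeled, unlabeled, rounds=1, k=1):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        # Each classifier takes a turn as the selector; the other one labels.
        for selector, labeler in ((0, 1), (1, 0)):
            if not unlabeled:
                return labeled
            sel_model = train(labeled, selector)
            lab_model = train(labeled, labeler)
            # The selector picks the k examples it is LEAST certain about...
            unlabeled.sort(key=lambda x: predict(sel_model, x, selector)[1])
            picked, unlabeled = unlabeled[:k], unlabeled[k:]
            # ...and the other classifier supplies their labels.
            labeled += [(x, predict(lab_model, x, labeler)[0]) for x in picked]
    return labeled


seed = [((0.0, 0.0), 0), ((10.0, 10.0), 1)]
pool = [(5.5, 1.0), (9.0, 5.2), (1.0, 1.0), (9.0, 9.0)]
result = bootstrap(seed, pool, rounds=1)
```

View 0 alone would put `(5.5, 1.0)` in class 1, since it is closer to that centroid; because the example is uncertain for view 0, view 1 labels it instead and assigns class 0.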

6.

Granted patent
Uncertainty reduction in collaborative bootstrapping (Lapsed)

Publication No.: US07512582B2

Publication date: 2009-03-31

Application No.: US10732741

Filing date: 2003-12-10

Applicants: Yunbo Cao, Hang Li

Inventors: Yunbo Cao, Hang Li

IPC: G06F9/44, G06F17/00, G06N7/02, G06N7/06

CPC: G06N7/02

Abstract: Collaborative bootstrapping with uncertainty reduction for increased classifier performance. One classifier selects a portion of data that is uncertain with respect to the classifier and a second classifier labels the portion. Uncertainty reduction includes parallel processing where the second classifier also selects an uncertain portion for the first classifier to label. Uncertainty reduction can be incorporated into existing or new co-training or bootstrapping, including bilingual bootstrapping.


7.

Patent application
LEARNING A DOCUMENT RANKING USING A LOSS FUNCTION WITH A RANK PAIR OR A QUERY PARAMETER (In force)

Publication No.: US20080027925A1

Publication date: 2008-01-31

Application No.: US11460838

Filing date: 2006-07-28

Applicants: Hang Li, Jun Xu, Yunbo Cao, Tie-Yan Liu

Inventors: Hang Li, Jun Xu, Yunbo Cao, Tie-Yan Liu

IPC: G06F17/30

CPC: G06F17/30864, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99938, Y10S707/99939

Abstract: A method and system for generating a ranking function to rank the relevance of documents to a query is provided. The ranking system learns a ranking function from training data that includes queries, resultant documents, and relevance of each document to its query. The ranking system learns a ranking function using the training data by weighting incorrect rankings of relevant documents more heavily than the incorrect rankings of not relevant documents so that more emphasis is placed on correctly ranking relevant documents. The ranking system may also learn a ranking function using the training data by normalizing the contribution of each query to the ranking function so that it is independent of the number of relevant documents of each query.

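A sketch of the two ideas in this abstract, written as a pairwise hinge loss over a linear scoring function. Each misordered (more relevant, less relevant) pair is weighted by the higher relevance grade, and each query's loss is averaged over its document pairs so that queries with many judged documents do not dominate. The weighting rule, the choice to divide by the pair count, and all function names are assumptions; the abstract does not give the exact formulas.

```python
from itertools import product

# A document is (feature_vector, relevance_grade); a query is a list of documents.
Doc = tuple[list[float], int]


def pair_weight(higher: int, lower: int) -> float:
    """Hypothetical rank-pair weight: mistakes on more relevant documents cost more.

    Only the higher grade is used here; `lower` is kept for other weighting rules.
    """
    return float(higher)


def score(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def query_loss(w: list[float], docs: list[Doc]) -> float:
    """Weighted pairwise hinge loss for one query, averaged over its document pairs."""
    pairs = [(a, b) for a, b in product(docs, docs) if a[1] > b[1]]
    if not pairs:
        return 0.0
    total = sum(
        pair_weight(a[1], b[1]) * max(0.0, 1.0 - (score(w, a[0]) - score(w, b[0])))
        for a, b in pairs
    )
    # Per-query normalization: divide by the number of pairs in this query.
    return total / len(pairs)


def ranking_loss(w: list[float], queries: list[list[Doc]]) -> float:
    return sum(query_loss(w, docs) for docs in queries)


w = [1.0, 0.0]
q1 = [([0.5, 0.0], 2), ([1.0, 0.0], 0)]  # relevant doc scored below an irrelevant one
q2 = [([1.0, 0.0], 1), ([0.0, 0.0], 0), ([0.0, 0.0], 0)]  # correctly ordered, margin 1
loss = ranking_loss(w, [q1, q2])  # 3.0
```

Appending a second irrelevant document that is misranked the same way doubles q1's summed loss and its pair count, so its normalized loss stays at 3.0.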

8.

Patent application
Two stage search (In force)

Publication No.: US20070112720A1

Publication date: 2007-05-17

Application No.: US11273314

Filing date: 2005-11-14

Applicants: Yunbo Cao, Hang Li

Inventors: Yunbo Cao, Hang Li

IPC: G06F17/30

CPC: G06F17/30684

Abstract: A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank ordered list.

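The two-stage model can be sketched as a document-relevance score followed by a person-document co-occurrence score, combined by summing their products over the top-ranked documents. Word-overlap relevance, mention-share co-occurrence, the product-sum combination, and the `top_docs` cutoff are all illustrative assumptions, as are the function names.

```python
from collections import Counter


def relevance(query: str, doc: str) -> float:
    """Stage 1 (hypothetical relevance model): fraction of query words in the document."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q) if q else 0.0


def cooccurrence(person: str, doc: str, people: list[str]) -> float:
    """Stage 2 (hypothetical co-occurrence model): this person's share of name mentions."""
    counts = {p: doc.count(p) for p in people}
    total = sum(counts.values())
    return counts[person] / total if total else 0.0


def find_experts(query: str, docs: list[str], people: list[str], top_docs: int = 2) -> list[tuple[str, float]]:
    # Stage 1: keep the documents most relevant to the query.
    ranked = sorted(docs, key=lambda d: relevance(query, d), reverse=True)[:top_docs]
    scores: Counter = Counter()
    for doc in ranked:
        r = relevance(query, doc)
        for person in people:
            # Combine the two models' scores; summing products is one simple choice.
            scores[person] += r * cooccurrence(person, doc, people)
    return [(p, s) for p, s in scores.most_common() if s > 0]


docs = [
    "Alice Wong wrote the spam filter design. Alice Wong and Bob Lee reviewed it.",
    "Bob Lee presented the quarterly budget.",
    "Carol Diaz tuned the spam filter thresholds.",
]
people = ["Alice Wong", "Bob Lee", "Carol Diaz"]
experts = find_experts("spam filter", docs, people)
```

Carol Diaz ranks first because she is the only person named in one of the two relevant documents, while Alice Wong and Bob Lee split the mentions in the other.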

9.

Patent application
Text mining apparatus and associated methods (In force)

Publication No.: US20060206306A1

Publication date: 2006-09-14

Application No.: US11054113

Filing date: 2005-02-09

Applicants: Yunbo Cao, Hang Li, Olivier Ribet, Benjamin Martin

Inventors: Yunbo Cao, Hang Li, Olivier Ribet, Benjamin Martin

IPC: G06F17/28

CPC: G06F17/2775, G06F17/30616, G06F17/30672, Y10S707/99934

Abstract: A method for extracting key terms and associated key terms for use in text mining is provided. The method includes receiving unstructured text documents, such as emails over a customer service system. Term candidates are extracted based on identifying consecutive word strings satisfying a context independency threshold. Term candidates are weighted using mutual information to generate a list of weighted terms. The weighted terms are then recounted. Terms are associated based on Chi-square values. Associated terms can then be used for information retrieval. A user interface can be personalized with individual user profiles.

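Two of the statistics named in the abstract can be shown in isolation: pointwise mutual information for weighting a two-word term candidate, and a chi-square statistic over document co-occurrence counts for associating terms. The sketch below computes both from raw counts. It does not cover how the patent thresholds or combines them, nor the context-independency test for candidates, and scoring negatively associated pairs as 0.0 is an assumption.

```python
import math


def pmi(pair_count: int, count1: int, count2: int, total: int) -> float:
    """Pointwise mutual information of a two-word term candidate (natural log)."""
    return math.log(pair_count * total / (count1 * count2))


def chi_square(docs: list[set[str]], t1: str, t2: str) -> float:
    """Chi-square statistic of the 2x2 document co-occurrence table for t1 and t2.

    Negatively associated pairs are scored 0.0 (an assumption: only terms that
    appear together more often than chance are treated as associated).
    """
    n = len(docs)
    a = sum(1 for d in docs if t1 in d and t2 in d)      # both terms
    b = sum(1 for d in docs if t1 in d and t2 not in d)  # t1 only
    c = sum(1 for d in docs if t1 not in d and t2 in d)  # t2 only
    d_ = n - a - b - c                                   # neither term
    denom = (a + b) * (c + d_) * (a + c) * (b + d_)
    if denom == 0 or a * d_ <= b * c:
        return 0.0
    return n * (a * d_ - b * c) ** 2 / denom


docs = [
    {"printer", "driver"},
    {"printer", "driver"},
    {"printer", "paper"},
    {"email", "password"},
]
assoc = chi_square(docs, "printer", "driver")  # 4 * 2**2 / (3 * 1 * 2 * 2) = 4/3
```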

10.

Patent application
System and method for managing information by answering a predetermined number of predefined questions (Pending, published)

Publication No.: US20060047637A1

Publication date: 2006-03-02

Application No.: US10932547

Filing date: 2004-09-02

Applicants: Dmitriy Meyerzon, Hang Li, Joseph Sherman, Yunbo Cao, Zheng Chen

Inventors: Dmitriy Meyerzon, Hang Li, Joseph Sherman, Yunbo Cao, Zheng Chen

IPC: G06F17/30

CPC: G06F16/3329, G06F16/316, G06F16/951

Abstract: The present invention is a system for answering questions. The present invention uses a data mining module to mine data, such as enterprise data, and to configure the data to answer a predetermined number of questions each having a predefined form. The present invention also provides a user interface component for receiving user queries and responding to those queries.

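A minimal sketch of answering only a fixed set of question forms: documents are mined ahead of time into one answer table per form, and a user query is matched against the forms at run time. The two forms ("What is X?" and "Who knows about X?"), the definition pattern, and the author-as-expert rule are hypothetical; the abstract does not say which questions the system supports.

```python
import re

# Hypothetical predefined question forms; the abstract does not list the actual ones.
QUESTION_FORMS = {
    "definition": re.compile(r"^what is (?:an? |the )?(?P<x>.+?)\??$", re.IGNORECASE),
    "expert": re.compile(r"^who knows about (?P<x>.+?)\??$", re.IGNORECASE),
}
DEFINITION = re.compile(r"^(?P<x>[A-Z][\w ]*?) is (?:an? |the )(?P<y>.+?)\.$")


def mine(docs: list[tuple[str, str]]) -> dict[str, dict]:
    """Mine (author, text) documents ahead of time into one answer table per question form."""
    answers: dict[str, dict] = {"definition": {}, "expert": {}}
    for author, text in docs:
        for sentence in re.split(r"(?<=\.)\s+", text):
            m = DEFINITION.match(sentence)
            if m:
                term = m["x"].lower()
                answers["definition"][term] = sentence
                # Crude assumption: whoever defines a term knows about it.
                answers["expert"].setdefault(term, set()).add(author)
    return answers


def answer(query: str, answers: dict[str, dict]):
    """Match the query against the predefined forms and look up the mined answer."""
    for form, pattern in QUESTION_FORMS.items():
        m = pattern.match(query.strip())
        if m:
            return answers[form].get(m["x"].lower())
    return None  # not one of the predefined questions


docs = [
    ("alice", "Contoso is a supplier of paper. It ships weekly."),
    ("bob", "Project Atlas is the new billing system."),
]
answers = mine(docs)
```

Queries outside the predefined forms return `None`, which is where a fallback such as ordinary search would plug in.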
