Ranking and accessing definitions of terms
    31.
    Granted patent
    Status: Lapsed

    Publication No.: US07877383B2

    Publication date: 2011-01-25

    Application No.: US11115500

    Filing date: 2005-04-27

    Applicants: Yunbo Cao, Hang Li, Jun Xu

    Inventors: Yunbo Cao, Hang Li, Jun Xu

    IPC classification: G06F7/00

    CPC classification: G06F17/30654 G06F2216/03

    Abstract: A method of processing information is provided. The method includes collecting text strings of definition candidates from a data source. The definition candidates are ranked based on the text strings.


    QUESTION AND ANSWER SEARCH
    32.
    Patent application
    Status: Pending (published)

    Publication No.: US20100235311A1

    Publication date: 2010-09-16

    Application No.: US12403560

    Filing date: 2009-03-13

    IPC classification: G06N5/02 G06F17/30

    CPC classification: G06F16/9535

    Abstract: Exemplary methods, computer-readable media, and systems are presented for leveraging question-answering knowledge from community sites by complementing product search services with a search of questions, answers, reviews, and other Internet-accessible content, including user-generated content. Product or service information is obtained by crawling Internet-accessible Web sites, including community sites. An integrated index of such information is generated. A user is able to browse questions by product or service feature, by topic, by identified comparative questions, and by question ranking (for example, interestingness or popularity).


    CLUSTERING QUESTION SEARCH RESULTS BASED ON TOPIC AND FOCUS
    33.
    Patent application
    Status: In force

    Publication No.: US20100030769A1

    Publication date: 2010-02-04

    Application No.: US12185702

    Filing date: 2008-08-04

    IPC classification: G06F7/10 G06F17/30

    CPC classification: G06F17/30696

    Abstract: A method and system for presenting questions that are relevant to a queried question based on clusters of topics and clusters of focuses of the questions is provided. A question search system provides a collection of questions. Each question of the collection has an associated topic and focus. Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating the relevance of the identified questions. The question search system clusters the identified questions into topic clusters of questions with similar topics. The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses.
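
    To illustrate the two-level grouping described above, the following sketch assumes each retrieved question already carries a topic label, a focus label, and a relevance score (the `Question` tuple and `cluster_by_topic_and_focus` function are hypothetical names). It groups by identical labels instead of running a real similarity-based clustering algorithm, so it only shows the shape of the result: topic clusters containing focus clusters, each ordered by relevance. It is not the patented method.

```python
from collections import defaultdict
from typing import NamedTuple


class Question(NamedTuple):
    text: str
    topic: str    # assumed precomputed topic label (hypothetical)
    focus: str    # assumed precomputed focus label (hypothetical)
    score: float  # relevance to the queried question


def cluster_by_topic_and_focus(results):
    """Group questions into topic clusters, then focus clusters within each topic.

    Exact label matching stands in for a real similarity-based clustering step.
    Returns {topic: {focus: [questions, highest score first]}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for q in results:
        grouped[q.topic][q.focus].append(q)
    return {
        topic: {
            focus: sorted(qs, key=lambda q: q.score, reverse=True)
            for focus, qs in by_focus.items()
        }
        for topic, by_focus in grouped.items()
    }


results = [
    Question("How do I reset my phone?", "phone", "reset", 0.7),
    Question("Phone factory reset steps?", "phone", "reset", 0.9),
    Question("Is the phone battery replaceable?", "phone", "battery", 0.8),
    Question("How do I reset my router?", "router", "reset", 0.6),
]
clusters = cluster_by_topic_and_focus(results)
```

    Here the two "reset" questions about phones land in one focus cluster under the "phone" topic, separate from the "battery" question and from the router question.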


    Learning a document ranking using a loss function with a rank pair or a query parameter
    34.
    Granted patent
    Status: In force

    Publication No.: US07593934B2

    Publication date: 2009-09-22

    Application No.: US11460838

    Filing date: 2006-07-28

    IPC classification: G06F7/00 G06F17/30 G06F15/00

    Abstract: A method and system for generating a ranking function to rank the relevance of documents to a query is provided. The ranking system learns a ranking function from training data that includes queries, resultant documents, and the relevance of each document to its query. The ranking system learns a ranking function using the training data by weighting incorrect rankings of relevant documents more heavily than incorrect rankings of non-relevant documents, so that more emphasis is placed on correctly ranking relevant documents. The ranking system may also learn a ranking function using the training data by normalizing the contribution of each query to the ranking function so that it is independent of the number of relevant documents of each query.
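
    The two ideas in this abstract can be sketched as a pairwise loss: mistakes on pairs involving more relevant documents are weighted more heavily, and each query's contribution is normalized. This is a minimal sketch under assumptions not stated in the abstract: a linear scoring function, a hinge loss on score differences, weights looked up from a hypothetical `pair_weight` table keyed by (higher label, lower label), and normalization by the number of document pairs per query. It is not the claimed formulation.

```python
def linear_score(w, x):
    """Linear scoring function f(x) = w . x (an assumed model family)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def weighted_pairwise_loss(w, queries, pair_weight):
    """Pairwise hinge loss with rank-pair weights and per-query normalization.

    queries: list of (docs, labels); docs are feature vectors and labels are
             graded relevance judgments for those documents.
    pair_weight: hypothetical map (higher_label, lower_label) -> weight, so that
                 mis-ordering highly relevant documents costs more.
    """
    total = 0.0
    for docs, labels in queries:
        scores = [linear_score(w, x) for x in docs]
        pair_losses = []
        for i, li in enumerate(labels):
            for j, lj in enumerate(labels):
                if li > lj:  # document i should be ranked above document j
                    hinge = max(0.0, 1.0 - (scores[i] - scores[j]))
                    pair_losses.append(pair_weight.get((li, lj), 1.0) * hinge)
        if pair_losses:
            # Average within the query so that queries with many documents
            # do not dominate the total (assumed normalization).
            total += sum(pair_losses) / len(pair_losses)
    return total


queries = [
    ([[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]], [2, 1, 0]),
    ([[1.0, 1.0], [0.0, 1.0]], [1, 0]),
]
pair_weight = {(2, 1): 2.0, (2, 0): 2.0, (1, 0): 1.0}
```

    With `w = [1.0, 0.0]` every pair is ordered with a margin of at least 1, so the loss is 0; with `w = [0.0, 0.0]` the first query contributes (2 + 2 + 1) / 3 and the second contributes 1. Minimizing this loss over `w` (for example by subgradient descent) would yield a ranking function; that training loop is omitted.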


    Text mining apparatus and associated methods
    35.
    Granted patent
    Status: In force

    Publication No.: US07461056B2

    Publication date: 2008-12-02

    Application No.: US11054113

    Filing date: 2005-02-09

    IPC classification: G06F7/00 G06F17/30

    Abstract: A method for extracting key terms and associated key terms for use in text mining is provided. The method includes receiving unstructured text documents, such as emails received over a customer service system. Term candidates are extracted by identifying consecutive word strings that satisfy a context independency threshold. Term candidates are weighted using mutual information to generate a list of weighted terms. The weighted terms are then recounted. Terms are associated with one another based on chi-square values. Associated terms can then be used for information retrieval. A user interface can be personalized with individual user profiles.
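
    The abstract names two statistics without giving formulas. The sketch below uses their textbook definitions: pointwise mutual information for scoring a two-word term candidate from corpus counts, and the chi-square statistic on a 2x2 document co-occurrence table for associating two terms. How the patent combines them with the context independency threshold and the recounting step is not shown, and the function names are hypothetical.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information (bits) of a two-word candidate 'x y'.

    Counts are raw frequencies over `total` observations; higher values mean
    the words co-occur more often than chance would predict.
    """
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 table of document counts.

    n11: documents containing both terms; n10: only term A;
    n01: only term B; n00: neither term.
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return numerator / denominator if denominator else 0.0
```

    With one degree of freedom, a chi-square value above 3.84 corresponds to significance at the 5% level, which is one conventional (assumed) cut-off for linking two terms as associated.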


    Method and apparatus for browsing document content
    36.
    Granted patent
    Status: In force

    Publication No.: US07284006B2

    Publication date: 2007-10-16

    Application No.: US10714540

    Filing date: 2003-11-14

    Applicants: Yunbo Cao, Hang Li

    Inventors: Yunbo Cao, Hang Li

    IPC classification: G06F17/00

    Abstract: A computer-implemented method is provided that includes receiving a document and determining a file type for the document. In addition, the document is segmented into blocks of text as a function of the file type, and at least one keyword and a summary are generated for the document.


    Record linkage based on a trained blocking scheme
    37.
    Granted patent
    Status: In force

    Publication No.: US08843492B2

    Publication date: 2014-09-23

    Application No.: US13372360

    Filing date: 2012-02-13

    IPC classification: G06F17/30

    CPC classification: G06F17/30303

    Abstract: Some implementations disclosed herein provide techniques and arrangements to train a blocking scheme using both labeled data and unlabeled data. For example, training the blocking scheme may include iteratively: learning a conjunction, identifying first matches in the labeled data and the unlabeled data that are uncovered by the conjunction, and identifying second matches in the labeled data and the unlabeled data that are covered by the conjunction. The conjunctions learned in the iterations may be combined using a disjunction. A search engine may use the trained blocking scheme when searching for records that match an entity.
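
    A blocking scheme of this kind is a disjunction (OR) of conjunctions (AND) of blocking predicates. The sketch below learns one greedily by sequential covering: each round picks the conjunction that covers the most still-uncovered labeled matches while letting through at most a set fraction of labeled non-matches. It uses labeled pairs only; the patent's use of unlabeled data is not modeled, and the predicates, thresholds, records, and function names are all hypothetical.

```python
from itertools import combinations


def satisfies(conj, predicates, a, b):
    """True if the record pair (a, b) passes every predicate in the conjunction."""
    return all(predicates[name](a, b) for name in conj)


def same_block(scheme, predicates, a, b):
    """Disjunction: a pair is a candidate match if ANY learned conjunction covers it."""
    return any(satisfies(conj, predicates, a, b) for conj in scheme)


def learn_blocking_scheme(predicates, labeled_pairs, max_conj=2,
                          max_fp_rate=0.1, max_rounds=10):
    """Greedy sequential covering over labeled pairs (unlabeled data not used)."""
    matches = [pair for pair, is_match in labeled_pairs if is_match]
    non_matches = [pair for pair, is_match in labeled_pairs if not is_match]
    candidates = [conj for k in range(1, max_conj + 1)
                  for conj in combinations(sorted(predicates), k)]
    scheme, uncovered = [], matches
    for _ in range(max_rounds):
        if not uncovered:
            break
        best, best_gain = None, 0
        for conj in candidates:
            fp = sum(satisfies(conj, predicates, *p) for p in non_matches)
            if non_matches and fp / len(non_matches) > max_fp_rate:
                continue  # too many non-matches would share a block
            gain = sum(satisfies(conj, predicates, *p) for p in uncovered)
            if gain > best_gain:
                best, best_gain = conj, gain
        if best is None:
            break  # no admissible conjunction covers a remaining match
        scheme.append(best)
        uncovered = [p for p in uncovered if not satisfies(best, predicates, *p)]
    return scheme


# Hypothetical predicates over records stored as dicts.
predicates = {
    "same_zip": lambda a, b: a["zip"] == b["zip"],
    "same_first_initial": lambda a, b: a["name"][0] == b["name"][0],
    "same_phone": lambda a, b: a["phone"] == b["phone"],
}
r1 = {"name": "Alice Smith", "zip": "98052", "phone": "555-1"}
r2 = {"name": "A. Smith", "zip": "98052", "phone": "555-9"}
r3 = {"name": "Bob Jones", "zip": "10001", "phone": "555-2"}
r4 = {"name": "Robert Jones", "zip": "10002", "phone": "555-2"}
r5 = {"name": "Carol White", "zip": "98052", "phone": "555-3"}
r6 = {"name": "Bill Gray", "zip": "10003", "phone": "555-4"}
labeled = [((r1, r2), True), ((r3, r4), True),
           ((r1, r5), False), ((r3, r6), False), ((r1, r3), False), ((r2, r4), False)]
scheme = learn_blocking_scheme(predicates, labeled)
```

    On this toy data the learner first picks `("same_phone",)`, which covers the Bob/Robert match, and then `("same_first_initial", "same_zip")` for the Alice match; neither single predicate `same_zip` nor `same_first_initial` is admissible on its own because each would block together a labeled non-match.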


    RECORD LINKAGE BASED ON A TRAINED BLOCKING SCHEME
    38.
    Patent application
    Status: In force

    Publication No.: US20130212103A1

    Publication date: 2013-08-15

    Application No.: US13372360

    Filing date: 2012-02-13

    IPC classification: G06F17/30 G06F15/18

    CPC classification: G06F17/30303

    Abstract: Some implementations disclosed herein provide techniques and arrangements to train a blocking scheme using both labeled data and unlabeled data. For example, training the blocking scheme may include iteratively: learning a conjunction, identifying first matches in the labeled data and the unlabeled data that are uncovered by the conjunction, and identifying second matches in the labeled data and the unlabeled data that are covered by the conjunction. The conjunctions learned in the iterations may be combined using a disjunction. A search engine may use the trained blocking scheme when searching for records that match an entity.


    TWO STAGE SEARCH
    39.
    Patent application
    Status: In force

    Publication No.: US20120109949A1

    Publication date: 2012-05-03

    Application No.: US13343160

    Filing date: 2012-01-04

    Applicants: Yunbo Cao, Hang Li

    Inventors: Yunbo Cao, Hang Li

    IPC classification: G06F17/30

    CPC classification: G06F17/30684

    Abstract: A two-stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals in the retrieved documents who are related to the subject matter of the query. Identified individuals can be scored by combining scores from the relevance model and the co-occurrence model, and output as a rank-ordered list.
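
    One way to combine the two stages, assumed here rather than taken from the abstract, is to score each person as the sum over retrieved documents of the document's normalized relevance times that person's share of the person mentions in the document. The toy relevance model (fraction of query terms present) and the `docs` input format are hypothetical stand-ins.

```python
from collections import Counter, defaultdict


def rank_experts(query_terms, docs, top_k=None):
    """Two-stage sketch: (1) score documents for the query, (2) credit people
    mentioned in those documents in proportion to the document score.

    docs: hypothetical format, doc_id -> (text, list of person names mentioned).
    """
    # Stage 1: toy relevance model, fraction of query terms found in the document.
    rel = {}
    for doc_id, (text, _) in docs.items():
        words = set(text.lower().split())
        hits = sum(1 for t in query_terms if t in words)
        if hits:
            rel[doc_id] = hits / len(query_terms)
    total = sum(rel.values())
    # Stage 2: co-occurrence model, p(person | doc) from mention counts.
    scores = defaultdict(float)
    for doc_id, r in rel.items():
        mentions = Counter(docs[doc_id][1])
        n = sum(mentions.values())
        for person, c in mentions.items():
            scores[person] += (r / total) * (c / n)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k] if top_k else ranked


docs = {
    "d1": ("ranking svm for document retrieval", ["Alice", "Alice", "Bob"]),
    "d2": ("question answering with ranking", ["Bob"]),
    "d3": ("cooking recipes", ["Carol"]),
}
ranked = rank_experts(["ranking", "retrieval"], docs)
```

    Bob outranks Alice because he is credited from both relevant documents, while Carol never appears since her only document does not match the query. Because each stage yields a distribution, the final scores sum to 1.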


    Question type-sensitive answer summarization
    40.
    Granted patent
    Status: In force

    Publication No.: US07966316B2

    Publication date: 2011-06-21

    Application No.: US12102866

    Filing date: 2008-04-15

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30707 G06F17/30719

    Abstract: In a question answering system, the system identifies the type of a question input by a user. The system then generates answer summaries that summarize answers to the input question in a format determined by the type of question asked. The answer summaries are output, in the corresponding format, in answer to the input question.
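
    A minimal sketch of type-dependent formatting, assuming three hypothetical question types and hand-written cue rules in place of a trained question classifier: yes/no questions get a vote tally, list questions get a de-duplicated list of the answers, and other questions get the first sentence of the first answer. The type names, cue words, and output formats are all illustrative assumptions.

```python
import re

# Hypothetical cue words; a real system would use a trained question classifier.
YES_NO_CUES = {"is", "are", "can", "does", "do", "should", "will"}
LIST_PATTERN = re.compile(r"(what|which) are (some|the best)\b")


def question_type(question):
    """Toy rule-based stand-in for question-type identification."""
    text = question.strip().lower()
    words = text.split()
    if words and words[0] in YES_NO_CUES:
        return "yes_no"
    if LIST_PATTERN.match(text):
        return "list"
    return "descriptive"


def summarize_answers(question, answers):
    """Summarize answers in a format chosen by question type (illustrative)."""
    qtype = question_type(question)
    if qtype == "yes_no":
        # Tally answers that open with an explicit "yes" or "no".
        verdicts = [re.match(r"(yes|no)\b", a.strip().lower()) for a in answers]
        yes = sum(1 for m in verdicts if m and m.group(1) == "yes")
        no = sum(1 for m in verdicts if m and m.group(1) == "no")
        return f"Yes: {yes}, No: {no}"
    if qtype == "list":
        # Treat each answer as one item; drop duplicates, keep first-seen order.
        items = dict.fromkeys(a.strip().rstrip(".") for a in answers)
        return "; ".join(items)
    # Descriptive: first sentence of the first answer.
    if not answers:
        return ""
    return re.split(r"(?<=[.!?])\s+", answers[0].strip())[0]
```

    For example, three answers "Yes, it is.", "yes", and "No." to "Is the battery replaceable?" are summarized as `Yes: 2, No: 1`, whereas a "What are some ..." question returns the distinct answers joined by semicolons.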
