Patent search: ap:("Wen-tau Yih" OR "Geoffrey G. Zweig" OR "John C. Platt") AND inv:"Wen-tau Yih" (page 1)

1.

Invention application
DETERMINING SYNONYM-ANTONYM POLARITY IN TERM VECTORS (status: under examination, published)

Publication number: US20140067368A1

Publication date: 2014-03-06

Application number: US13597277

Filing date: 2012-08-29

Applicants: Wen-tau Yih, Geoffrey G. Zweig, John C. Platt

Inventors: Wen-tau Yih, Geoffrey G. Zweig, John C. Platt

IPC classification: G06F17/27

CPC classification: G06F17/2795, G06F16/3338, G06F17/2785

Abstract: A document-term matrix may be generated based on a corpus. A term representation matrix may be generated based on modifying a plurality of elements of the document-term matrix based on antonym information included in the corpus. Similarities may be determined based on a plurality of elements of the term representation matrix.
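
The idea in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that the corpus is a toy thesaurus in which each entry is one "document", and that the antonym-based modification consists of negating the matrix elements of antonyms; the abstract does not state the exact modification. The names `THESAURUS`, `build_term_vectors`, and `cosine` are hypothetical.

```python
import math

# Hypothetical toy thesaurus: each entry plays the role of one "document".
THESAURUS = [
    {"synonyms": ["happy", "glad", "joyful"], "antonyms": ["sad", "unhappy"]},
    {"synonyms": ["sad", "unhappy", "gloomy"], "antonyms": ["happy", "joyful"]},
]


def build_term_vectors(entries):
    """Return {term: [one weight per entry]}.

    Assumption: synonyms get weight +1 and antonyms get weight -1, so that
    antonym information flips the sign of the corresponding matrix elements.
    """
    terms = sorted({t for e in entries for t in e["synonyms"] + e["antonyms"]})
    vectors = {t: [0.0] * len(entries) for t in terms}
    for row, entry in enumerate(entries):
        for term in entry["synonyms"]:
            vectors[term][row] = 1.0
        for term in entry["antonyms"]:
            vectors[term][row] = -1.0
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = build_term_vectors(THESAURUS)
print(cosine(vectors["happy"], vectors["joyful"]))  # positive: same polarity
print(cosine(vectors["happy"], vectors["sad"]))     # negative: opposite polarity
```

With this sign convention, the sign of the similarity separates synonyms from antonyms, which an unsigned document-term matrix cannot do.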

2.

Invention application
Learning Discriminative Projections for Text Similarity Measures (status: under examination, published)

Publication number: US20120323968A1

Publication date: 2012-12-20

Application number: US13160485

Filing date: 2011-06-14

Applicants: Wen-tau Yih, Kristina N. Toutanova, Christopher A. Meek, John C. Platt

Inventors: Wen-tau Yih, Kristina N. Toutanova, Christopher A. Meek, John C. Platt

IPC classification: G06F17/30

CPC classification: G06F16/31

Abstract: A model for mapping the raw text representation of a text object to a vector space is disclosed. A function is defined for computing a similarity score given two output vectors. A loss function is defined for computing an error based on the similarity scores and the labels of pairs of vectors. The parameters of the model are tuned to minimize the loss function. The label of two vectors indicates a degree of similarity of the objects. The label may be a binary number or a real-valued number. The function for computing similarity scores may be a cosine, Jaccard, or differentiable function. The loss function may compare pairs of vectors to their labels. Each element of the output vector is a linear or non-linear function of the terms of an input vector. The text objects may be different types of documents and two different models may be trained concurrently.
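
The pipeline this abstract describes (projection, similarity score, loss over labeled pairs, parameter tuning) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the projection is linear, the similarity is cosine, the loss is mean squared error against real-valued labels, and the gradient is approximated by finite differences for brevity. `INITIAL`, `PAIRS`, and all function names are hypothetical.

```python
import math


def project(matrix, x):
    """Linear model: each output element is a weighted sum of input terms."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def loss(matrix, pairs):
    """Mean squared error between the similarity score and the pair label."""
    total = 0.0
    for x1, x2, label in pairs:
        score = cosine(project(matrix, x1), project(matrix, x2))
        total += (score - label) ** 2
    return total / len(pairs)


def train(matrix, pairs, lr=0.1, steps=200, eps=1e-5):
    """Gradient descent on the loss; gradients are finite-difference
    approximations (an assumption made to keep the sketch short)."""
    m = [row[:] for row in matrix]  # work on a copy
    for _ in range(steps):
        base = loss(m, pairs)
        grads = [[0.0] * len(row) for row in m]
        for i in range(len(m)):
            for j in range(len(m[i])):
                m[i][j] += eps
                grads[i][j] = (loss(m, pairs) - base) / eps
                m[i][j] -= eps
        for i in range(len(m)):
            for j in range(len(m[i])):
                m[i][j] -= lr * grads[i][j]
    return m


# Hypothetical data: three input terms projected to two dimensions.
INITIAL = [[1.0, 0.2, 0.5], [0.3, 1.0, 0.4]]
PAIRS = [
    ([1, 0, 0], [0, 1, 0], 1.0),  # labeled similar
    ([1, 0, 0], [0, 0, 1], 0.0),  # labeled dissimilar
]

trained = train(INITIAL, PAIRS)
print(loss(INITIAL, PAIRS), loss(trained, PAIRS))
```

A practical implementation would use analytic gradients, which is possible because cosine similarity is differentiable in the projection parameters.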

3.

Granted invention patent
Classification using a cascade approach (status: lapsed)

Publication number: US07693806B2

Publication date: 2010-04-06

Application number: US11766434

Filing date: 2007-06-21

Applicants: Wen-tau Yih, Joshua T. Goodman, Geoffrey J. Hulten

Inventors: Wen-tau Yih, Joshua T. Goodman, Geoffrey J. Hulten

IPC classification: G06F15/18, G06N3/08

CPC classification: H04L51/12, G06K9/6256, G06Q10/06, G06Q10/10

Abstract: A system and method that facilitates and effectuates optimizing a classifier for greater performance in a specific region of classification that is of interest, such as a low false positive rate or a low false negative rate. A two-stage classification model can be trained and employed, where the first stage classification is optimized over the entire classification region and the second stage classifier is optimized for the specific region of interest. During training the entire set of training data is employed by a first stage classifier. Only data that is classified by the first stage classifier or by cross validation to fall within a region of interest is used to train the second stage classifier. During classification, data that is classified within the region of interest by the first classification is given the first stage classifier's classification value, otherwise the classification value for the instance of data from the second stage classifier is used.
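
The two-stage training scheme can be sketched as below. Several assumptions are made for illustration: both stages are one-dimensional threshold classifiers ("stumps"), the region of interest is the set of instances that stage 1 predicts positive, and in-region instances are the ones rescored by stage 2 (consistent with stage 2 being trained only on in-region data; the abstract's final sentence words the routing differently, so this is one reading). `DATA`, `fit_stump`, `train_cascade`, and `classify` are hypothetical names.

```python
def fit_stump(xs, ys):
    """Pick the threshold t (from the observed values) that minimises the
    number of training errors for the rule: predict True when x >= t."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x >= t) != y for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def train_cascade(data):
    """data: list of (f1, f2, label).

    Stage 1 is trained on feature f1 using all data. Stage 2 is trained on
    feature f2 using only the instances stage 1 places in the region of
    interest (assumption: the region is "stage 1 predicts positive").
    """
    t1 = fit_stump([d[0] for d in data], [d[2] for d in data])
    region = [d for d in data if d[0] >= t1]
    t2 = fit_stump([d[1] for d in region], [d[2] for d in region])
    return t1, t2


def classify(f1, f2, t1, t2):
    """Assumed routing: in-region instances get the stage 2 decision,
    all others keep the stage 1 decision (negative)."""
    if f1 >= t1:
        return f2 >= t2
    return False


# Hypothetical training set: (stage-1 feature, stage-2 feature, label).
DATA = [
    (0.1, 0.2, False),
    (0.2, 0.9, False),
    (0.6, 0.1, False),
    (0.7, 0.8, True),
    (0.8, 0.3, False),
    (0.9, 0.9, True),
]

t1, t2 = train_cascade(DATA)
predictions = [classify(f1, f2, t1, t2) for f1, f2, _ in DATA]
print(t1, t2, predictions)
```

In this toy set, stage 1 alone misclassifies one instance; stage 2 removes that false positive because it is fitted only on the instances where such errors can occur.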

4.

Granted invention patent
Consistent phrase relevance measures (status: in force)

Publication number: US08290946B2

Publication date: 2012-10-16

Application number: US12144647

Filing date: 2008-06-24

Applicants: Wen-tau Yih, Christopher A. Meek

Inventors: Wen-tau Yih, Christopher A. Meek

IPC classification: G06F7/00, G06F17/30

CPC classification: G06F17/30687, G06Q30/02

Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in-document and out-of-document phrases should be close to each other.

5.

Invention application
Learning Element Weighting for Similarity Measures (status: in force)

Publication number: US20110219012A1

Publication date: 2011-09-08

Application number: US12715417

Filing date: 2010-03-02

Applicants: Wen-tau Yih, Christopher A. Meek, Hannaneh Hajishirzi

Inventors: Wen-tau Yih, Christopher A. Meek, Hannaneh Hajishirzi

IPC classification: G06F17/30, G06F15/18

CPC classification: G06F15/18, G06F17/30

Abstract: Described is a technology for measuring the similarity between two objects (e.g., documents), via a framework that learns the term-weighting function from training data, e.g., labeled pairs of objects, to develop a learned model. A learning procedure tunes the model parameters by minimizing a defined loss function of the similarity score. Also described is using the learning procedure and learned model to detect near duplicate documents.

6.

Invention application
CONSISTENT PHRASE RELEVANCE MEASURES (status: in force)

Publication number: US20090319508A1

Publication date: 2009-12-24

Application number: US12144647

Filing date: 2008-06-24

Applicants: Wen-tau Yih, Christopher A. Meek

Inventors: Wen-tau Yih, Christopher A. Meek

IPC classification: G06F7/10, G06F17/30

CPC classification: G06F17/30687, G06Q30/02

Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in-document and out-of-document phrases should be close to each other.

7.

Invention application
Weighted linear model (status: under examination, published)

Publication number: US20070083357A1

Publication date: 2007-04-12

Application number: US11485015

Filing date: 2006-07-12

Applicants: Robert Moore, Wen-tau Yih, Galen Andrew, Kristina Toutanova

Inventors: Robert Moore, Wen-tau Yih, Galen Andrew, Kristina Toutanova

IPC classification: G06F17/28

CPC classification: G06F17/2827, G06F17/2836

Abstract: A weighted linear word alignment model linearly combines weighted features to score a word alignment for a bilingual, aligned pair of text fragments. The features are each weighted by a feature weight. One of the features is a word association metric, which may be generated from surface statistics.
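
Scoring an alignment as a weighted linear combination of features can be sketched in a few lines. The feature names (`association`, `unaligned_words`, `nonmonotonicity`), the weight values, and the candidate alignments are all hypothetical; only the word association feature is named in the abstract.

```python
def score_alignment(features, weights):
    """Linear model: sum of feature values, each scaled by its feature weight."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical feature weights (in practice these would be learned).
WEIGHTS = {"association": 1.0, "unaligned_words": -0.5, "nonmonotonicity": -0.3}

# Hypothetical feature values for two candidate alignments of one sentence pair.
CANDIDATES = {
    "A": {"association": 2.5, "unaligned_words": 1, "nonmonotonicity": 0},
    "B": {"association": 1.8, "unaligned_words": 0, "nonmonotonicity": 2},
}

scores = {name: score_alignment(f, WEIGHTS) for name, f in CANDIDATES.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The alignment search then reduces to finding the candidate with the highest score, so changing the behaviour of the aligner is a matter of adding features or retuning weights.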

8.

Invention application
CLICKTHROUGH-BASED LATENT SEMANTIC MODEL (status: in force)

Publication number: US20130159320A1

Publication date: 2013-06-20

Application number: US13329345

Filing date: 2011-12-19

Applicants: Jianfeng Gao, Kristina Toutanova, Wen-tau Yih

Inventors: Jianfeng Gao, Kristina Toutanova, Wen-tau Yih

IPC classification: G06F17/30

CPC classification: G06F17/30867

Abstract: There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.

9.

Granted invention patent
Web document keyword and phrase extraction (status: in force)

Publication number: US08135728B2

Publication date: 2012-03-13

Application number: US11619230

Filing date: 2007-01-03

Applicants: Wen-tau Yih, Joshua T. Goodman, Vitor Rocha de Carvalho

Inventors: Wen-tau Yih, Joshua T. Goodman, Vitor Rocha de Carvalho

IPC classification: G06F7/00, G06F17/30, G06F13/14

CPC classification: G06F17/241, G06F17/27, G06F17/30, G06F17/30616

Abstract: Extraction analysis techniques biased, in part, by query frequency information from a query log file and/or search engine cache are employed along with machine learning processes to determine candidate keywords and/or phrases of web documents. Web oriented features associated with the candidate keywords and/or phrases are also utilized to analyze the web documents. A keyword and/or phrase extraction mechanism can be utilized to score keywords and/or phrases in a web document and estimate a likelihood that the keywords and/or phrases are relevant, for example, in an advertising system and the like.

10.

Invention application
Document summarization by maximizing informative content words (status: in force)

Publication number: US20080109425A1

Publication date: 2008-05-08

Application number: US11591937

Filing date: 2006-11-02

Applicants: Wen-tau Yih, Joshua T. Goodman, Lucretia H. Vanderwende, Hisami Suzuki

Inventors: Wen-tau Yih, Joshua T. Goodman, Lucretia H. Vanderwende, Hisami Suzuki

IPC classification: G06F17/30, G06F15/18, G06F9/44

CPC classification: G06F17/30719

Abstract: Document summarization is performed by scoring individual words in sentences in a document or document cluster. Sentences from the document or document cluster are selected to form a summary based on the scores of the words contained in those sentences.
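
Word-level scoring followed by sentence selection can be sketched as follows. The scoring rule is an assumption chosen for simplicity: a word's score is its relative frequency among content words, and a sentence's score is the average score of its content words. The abstract does not specify either rule. `STOPWORDS`, `content_words`, and `summarize` are hypothetical names.

```python
from collections import Counter

# Hypothetical stopword list; everything else counts as a content word.
STOPWORDS = {"the", "a", "an", "it", "on", "of"}


def content_words(sentence):
    """Lowercase, whitespace-tokenise, and drop stopwords."""
    return [w for w in sentence.lower().split() if w not in STOPWORDS]


def summarize(sentences, k):
    """Return the k highest-scoring sentences, in their original order."""
    counts = Counter(w for s in sentences for w in content_words(s))
    total = sum(counts.values())

    def sentence_score(sentence):
        words = content_words(sentence)
        if not words:
            return 0.0
        # Assumed word score: relative frequency among all content words.
        return sum(counts[w] / total for w in words) / len(words)

    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_score(sentences[i]),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]


DOCUMENT = [
    "the cat chased the mouse",
    "the mouse hid",
    "it rained yesterday",
]

print(summarize(DOCUMENT, 2))
```

Restoring the original sentence order after ranking keeps the summary readable; a learned word-scoring model could replace the frequency rule without changing the selection step.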
