-
Publication No.: US20090240498A1
Publication date: 2009-09-24
Application No.: US12051183
Filing date: 2008-03-19
IPC classification: G10L15/08
CPC classification: G06F17/2211, G06F16/35
Abstract: Systems and methods to perform short text segment similarity measures. Illustratively, a short text segment similarity environment comprises a short text engine operative to process data representative of short segments of text and an instruction set comprising at least one instruction to instruct the short text engine to process data representative of short text segment inputs according to a selected short text similarity identification paradigm. Illustratively, the short text engine can receive two or more short text segments as input, together with a request to identify similarities among the two or more short text segments. Responsive to the request and data input, the short text engine executes a selected similarity identification technique in accordance with the short text similarity identification paradigm to process the received data and to identify similarities between the short text segment inputs.
-
Publication No.: US20120323968A1
Publication date: 2012-12-20
Application No.: US13160485
Filing date: 2011-06-14
IPC classification: G06F17/30
CPC classification: G06F16/31
Abstract: A model for mapping the raw text representation of a text object to a vector space is disclosed. A function is defined for computing a similarity score given two output vectors. A loss function is defined for computing an error based on the similarity scores and the labels of pairs of vectors. The parameters of the model are tuned to minimize the loss function. The label of two vectors indicates a degree of similarity of the objects. The label may be a binary number or a real-valued number. The function for computing similarity scores may be a cosine, Jaccard, or differentiable function. The loss function may compare pairs of vectors to their labels. Each element of the output vector is a linear or non-linear function of the terms of an input vector. The text objects may be different types of documents and two different models may be trained concurrently.
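The abstract above can be illustrated with a minimal Python sketch. It is not the patented method: the function names (`project`, `cosine`, `loss`, `tune`), the toy data, the choice of a linear projection, the squared-error loss, and the finite-difference optimizer are all assumptions made for illustration. The sketch maps input term vectors to output vectors, scores pairs with cosine similarity, and adjusts the projection weights so that the scores move toward the pair labels.

```python
import math

def project(weights, x):
    """Linear map: each output element is a weighted sum of the input terms."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def loss(weights, pairs):
    """Mean squared error between similarity scores and pair labels (assumed loss)."""
    total = 0.0
    for x1, x2, label in pairs:
        score = cosine(project(weights, x1), project(weights, x2))
        total += (score - label) ** 2
    return total / len(pairs)

def tune(weights, pairs, lr=0.5, steps=100, eps=1e-6):
    """Gradient descent with finite-difference gradients.

    A step is accepted only if it lowers the loss; otherwise the
    learning rate is halved, so the loss never increases.
    """
    w = [row[:] for row in weights]
    for _ in range(steps):
        base = loss(w, pairs)
        grads = []
        for i in range(len(w)):
            grad_row = []
            for j in range(len(w[i])):
                bumped = [row[:] for row in w]
                bumped[i][j] += eps
                grad_row.append((loss(bumped, pairs) - base) / eps)
            grads.append(grad_row)
        candidate = [
            [wij - lr * g for wij, g in zip(row, grad_row)]
            for row, grad_row in zip(w, grads)
        ]
        if loss(candidate, pairs) < base:
            w = candidate
        else:
            lr /= 2
    return w

# Toy labeled pairs: (term vector 1, term vector 2, similarity label).
pairs = [
    ([1.0, 1.0, 0.0], [1.0, 0.0, 1.0], 1.0),  # labeled similar
    ([1.0, 0.0, 0.0], [0.0, 1.0, 1.0], 0.0),  # labeled dissimilar
]
initial = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.5]]  # hypothetical 2x3 projection
tuned = tune(initial, pairs)
print(loss(initial, pairs), loss(tuned, pairs))
```

A real system would likely use analytic gradients and a non-linear model; the finite-difference loop is used here only to keep the sketch dependency-free.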
-
Publication No.: US08996515B2
Publication date: 2015-03-31
Application No.: US13609257
Filing date: 2012-09-11
Applicant: Wen-tau Yih, Christopher A. Meek
Inventor: Wen-tau Yih, Christopher A. Meek
CPC classification: G06F17/30687, G06Q30/02
Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in-document and out-of-document phrases should be close to each other.
-
Publication No.: US20120330978A1
Publication date: 2012-12-27
Application No.: US13609257
Filing date: 2012-09-11
Applicant: Wen-tau Yih, Christopher A. Meek
Inventor: Wen-tau Yih, Christopher A. Meek
IPC classification: G06F17/30
CPC classification: G06F17/30687, G06Q30/02
Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in-document and out-of-document phrases should be close to each other.
-
Publication No.: US08290946B2
Publication date: 2012-10-16
Application No.: US12144647
Filing date: 2008-06-24
Applicant: Wen-tau Yih, Christopher A. Meek
Inventor: Wen-tau Yih, Christopher A. Meek
CPC classification: G06F17/30687, G06Q30/02
Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in-document and out-of-document phrases should be close to each other.
-
Publication No.: US20110219012A1
Publication date: 2011-09-08
Application No.: US12715417
Filing date: 2010-03-02
Abstract: Described is a technology for measuring the similarity between two objects (e.g., documents), via a framework that learns the term-weighting function from training data, e.g., labeled pairs of objects, to develop a learned model. A learning procedure tunes the model parameters by minimizing a defined loss function of the similarity score. Also described is using the learning procedure and learned model to detect near-duplicate documents.
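To make the idea of a parameterized term-weighting function concrete, here is a small Python sketch of the scoring and near-duplicate detection step only. It assumes the parameters have already been learned and does not show the learning procedure. The feature set (bias, log term frequency, inverse document frequency), the names `term_weights`, `similarity`, and `is_near_duplicate`, the threshold, and all toy values are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter

def term_weights(tokens, params, doc_freq, n_docs):
    """Weight each term as a linear function of simple per-term features."""
    weights = {}
    for term, tf in Counter(tokens).items():
        features = [
            1.0,                                             # bias
            math.log(1 + tf),                                # log term frequency
            math.log(n_docs / (1 + doc_freq.get(term, 0))),  # inverse document frequency
        ]
        weights[term] = sum(p * f for p, f in zip(params, features))
    return weights

def similarity(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_near_duplicate(tokens_a, tokens_b, params, doc_freq, n_docs, threshold=0.9):
    """Flag a pair as near-duplicate when its similarity reaches the threshold."""
    vec_a = term_weights(tokens_a, params, doc_freq, n_docs)
    vec_b = term_weights(tokens_b, params, doc_freq, n_docs)
    return similarity(vec_a, vec_b) >= threshold

# Hypothetical "learned" parameters and corpus statistics.
params = [0.0, 1.0, 0.5]
doc_freq = {"the": 90, "patent": 5, "similarity": 3}
n_docs = 100

doc_a = "the patent describes text similarity".split()
doc_b = "the patent describes text similarity measures".split()
doc_c = "an online image game".split()

print(is_near_duplicate(doc_a, doc_a, params, doc_freq, n_docs))
print(is_near_duplicate(doc_a, doc_b, params, doc_freq, n_docs))
print(is_near_duplicate(doc_a, doc_c, params, doc_freq, n_docs))
```

In a learned model, `params` would come from minimizing a loss over labeled document pairs rather than being set by hand.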
-
Publication No.: US20090319508A1
Publication date: 2009-12-24
Application No.: US12144647
Filing date: 2008-06-24
Applicant: Wen-tau Yih, Christopher A. Meek
Inventor: Wen-tau Yih, Christopher A. Meek
CPC classification: G06F17/30687, G06Q30/02
Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in-document and out-of-document phrases should be close to each other.
-
Publication No.: US09183173B2
Publication date: 2015-11-10
Application No.: US12715417
Filing date: 2010-03-02
Abstract: Described is a technology for measuring the similarity between two objects (e.g., documents), via a framework that learns the term-weighting function from training data, e.g., labeled pairs of objects, to develop a learned model. A learning procedure tunes the model parameters by minimizing a defined loss function of the similarity score. Also described is using the learning procedure and learned model to detect near-duplicate documents.
-
Publication No.: US09283476B2
Publication date: 2016-03-15
Application No.: US11843468
Filing date: 2007-08-22
Applicant: Anton Mityagin, Aparna Lakshmiratan, Asela J. Gunawardana, Christopher A. Meek, David M. Chickering, Paul N. Bennett, Timothy S. Paek
Inventor: Anton Mityagin, Aparna Lakshmiratan, Asela J. Gunawardana, Christopher A. Meek, David M. Chickering, Paul N. Bennett, Timothy S. Paek
CPC classification: A63F13/798, A63F13/12, A63F13/23, A63F13/33, A63F13/46, A63F13/75, A63F13/85, A63F2300/558, A63F2300/5586, G06Q10/10, G06Q30/02, G07F17/3225, G07F17/3244
Abstract: Systems and methods allow an on-line game to extract information relevant to a specific need of a game platform or service platform. The specific need relates to management and use of digital content, and is addressed by designing and playing an on-line collaborative game. The rules of the game are intended to solve a specific task dictated by the specific need. Players' responses to the game generate a wealth of information related to a specific task objective, such as ranking, sorting, and evaluating a set of digital content items. To compel participation in a game, players can be rewarded with monetary value rewards. As a game illustration, an image selection game (ISG) that exploits human contextual inference is described in detail. The information extracted from ISG is a list of key-image associations, relevant for the task of image sorting and ranking.
-
Publication No.: US08688417B2
Publication date: 2014-04-01
Application No.: US13162927
Filing date: 2011-06-17
Applicant: Alex Bocharov, Christopher A. Meek, Bo Thiesson
Inventor: Alex Bocharov, Christopher A. Meek, Bo Thiesson
IPC classification: G06F17/50
CPC classification: G06F17/18
Abstract: In one embodiment, an event impact signature detector may analyze a time series with external events. A data interface 250 may receive a data set 310 representing the time series with external events. A processor 220 may fit the data set 310 into a baseline time series model 330. The processor 220 may iteratively determine each event location 352 for multiple external events 350 affecting the baseline time series model 330. The processor 220 may iteratively solve for each event impact 354 of the multiple external events 350 factoring in interactions between the multiple external events 350.
-