SIMILIARITY MEASURES FOR SHORT SEGMENTS OF TEXT
    1.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20090240498A1

    Publication date: 2009-09-24

    Application No.: US12051183

    Filing date: 2008-03-19

    IPC classification: G10L15/08

    CPC classification: G06F17/2211 G06F16/35

    Abstract: Systems and methods to perform short text segment similarity measures. Illustratively, a short text segment similarity environment comprises a short text engine operative to process data representative of short segments of text and an instruction set comprising at least one instruction to instruct the short text engine to process data representative of short text segment inputs according to a selected short text similarity identification paradigm. Illustratively, the short text engine can receive two or more short text segments as input, together with a request to identify similarities among them. Responsive to the request and data input, the short text engine executes a selected similarity identification technique in accordance with the short text similarity identification paradigm to process the received data and to identify similarities between the short text segment inputs.

    Learning Discriminative Projections for Text Similarity Measures
    2.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20120323968A1

    Publication date: 2012-12-20

    Application No.: US13160485

    Filing date: 2011-06-14

    IPC classification: G06F17/30

    CPC classification: G06F16/31

    Abstract: A model for mapping the raw text representation of a text object to a vector space is disclosed. A function is defined for computing a similarity score given two output vectors. A loss function is defined for computing an error based on the similarity scores and the labels of pairs of vectors. The parameters of the model are tuned to minimize the loss function. The label of two vectors indicates a degree of similarity of the objects. The label may be a binary number or a real-valued number. The function for computing similarity scores may be a cosine, Jaccard, or differentiable function. The loss function may compare pairs of vectors to their labels. Each element of the output vector is a linear or non-linear function of the terms of an input vector. The text objects may be different types of documents and two different models may be trained concurrently.
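
The pipeline this abstract outlines (project, score, compare to labels, tune) can be illustrated with a minimal sketch. It is not the patented method: the linear projection, the squared-error loss, the finite-difference gradients, and every name (`project`, `cosine`, `loss`, `train`) are assumptions chosen to keep the example small and dependency-free.

```python
import math

def project(matrix, x):
    """Map a raw term vector x to a lower-dimensional vector (linear case)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]

def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def loss(matrix, pairs):
    """Mean squared error between similarity scores and pair labels (assumed loss)."""
    total = 0.0
    for a, b, label in pairs:
        score = cosine(project(matrix, a), project(matrix, b))
        total += (score - label) ** 2
    return total / len(pairs)

def train(matrix, pairs, lr=0.5, steps=100, eps=1e-6):
    """Tune the projection by gradient descent with finite-difference gradients.

    A step is accepted only if it lowers the loss; otherwise the step size is
    halved, so the returned matrix is never worse than the starting one.
    """
    matrix = [row[:] for row in matrix]  # do not mutate the caller's matrix
    current = loss(matrix, pairs)
    for _ in range(steps):
        grad = []
        for row in matrix:
            grad_row = []
            for j in range(len(row)):
                saved = row[j]
                row[j] = saved + eps
                grad_row.append((loss(matrix, pairs) - current) / eps)
                row[j] = saved
            grad.append(grad_row)
        candidate = [[w - lr * g for w, g in zip(row, grow)]
                     for row, grow in zip(matrix, grad)]
        candidate_loss = loss(candidate, pairs)
        if candidate_loss < current:
            matrix, current = candidate, candidate_loss
        else:
            lr *= 0.5
    return matrix
```

Each training pair is `(term_vector_a, term_vector_b, label)`, where the label may be binary or real-valued as the abstract allows. A practical system would more likely use analytic gradients and a numerical library; finite differences are used here only so the sketch stays self-contained.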

    Consistent phrase relevance measures
    3.
    Granted invention patent
    Status: in force

    Publication No.: US08996515B2

    Publication date: 2015-03-31

    Application No.: US13609257

    Filing date: 2012-09-11

    IPC classification: G06F7/00 G06F17/30 G06Q30/02

    CPC classification: G06F17/30687 G06Q30/02

    Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then the relevance scores of the in-document and out-of-document phrases should be close to each other.

    CONSISTENT PHRASE RELEVANCE MEASURES
    4.
    Invention patent application
    Status: in force

    Publication No.: US20120330978A1

    Publication date: 2012-12-27

    Application No.: US13609257

    Filing date: 2012-09-11

    IPC classification: G06F17/30

    CPC classification: G06F17/30687 G06Q30/02

    Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then the relevance scores of the in-document and out-of-document phrases should be close to each other.

    Consistent phrase relevance measures
    5.
    Granted invention patent
    Status: in force

    Publication No.: US08290946B2

    Publication date: 2012-10-16

    Application No.: US12144647

    Filing date: 2008-06-24

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30687 G06Q30/02

    Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then the relevance scores of the in-document and out-of-document phrases should be close to each other.

    Learning Element Weighting for Similarity Measures
    6.
    Invention patent application
    Status: in force

    Publication No.: US20110219012A1

    Publication date: 2011-09-08

    Application No.: US12715417

    Filing date: 2010-03-02

    IPC classification: G06F17/30 G06F15/18

    CPC classification: G06F15/18 G06F17/30

    Abstract: Described is a technology for measuring the similarity between two objects (e.g., documents), via a framework that learns the term-weighting function from training data, e.g., labeled pairs of objects, to develop a learned model. A learning procedure tunes the model parameters by minimizing a defined loss function of the similarity score. Also described is using the learning procedure and learned model to detect near-duplicate documents.
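
As a rough illustration of learning a term-weighting function from labeled pairs, the sketch below parameterizes each term weight as `tf ** alpha * idf ** beta` and picks `(alpha, beta)` by grid search over a squared-error loss. The functional form, the grid search, the 0.9 threshold, and all names (`weighted_vector`, `fit`, `is_near_duplicate`) are assumptions made for this example; the abstract does not specify them.

```python
import math
from collections import Counter
from itertools import product

def weighted_vector(tokens, idf, alpha, beta):
    """Sparse term vector with weight tf**alpha * idf**beta (assumed form)."""
    counts = Counter(tokens)
    return {t: (c ** alpha) * (idf.get(t, 1.0) ** beta) for t, c in counts.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity(a, b, params, idf):
    alpha, beta = params
    return cosine(weighted_vector(a, idf, alpha, beta),
                  weighted_vector(b, idf, alpha, beta))

def loss(params, pairs, idf):
    """Mean squared error between similarity scores and labels."""
    return sum((similarity(a, b, params, idf) - y) ** 2
               for a, b, y in pairs) / len(pairs)

def fit(pairs, idf, grid=(0.0, 0.5, 1.0, 2.0)):
    """Choose (alpha, beta) from the grid that minimizes the loss."""
    return min(product(grid, grid), key=lambda p: loss(p, pairs, idf))

def is_near_duplicate(a, b, params, idf, threshold=0.9):
    """Flag a pair as near-duplicate when its learned similarity is high."""
    return similarity(a, b, params, idf) >= threshold
```

Here `pairs` is a list of `(tokens_a, tokens_b, label)` tuples and `idf` maps terms to positive inverse-document-frequency values, with unseen terms defaulting to 1.0. A gradient-based optimizer over a richer feature set would be the more likely choice in practice; grid search is used only to keep the example short.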

    CONSISTENT PHRASE RELEVANCE MEASURES
    7.
    Invention patent application
    Status: in force

    Publication No.: US20090319508A1

    Publication date: 2009-12-24

    Application No.: US12144647

    Filing date: 2008-06-24

    IPC classification: G06F7/10 G06F17/30

    CPC classification: G06F17/30687 G06Q30/02

    Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then the relevance scores of the in-document and out-of-document phrases should be close to each other.

    Learning element weighting for similarity measures
    8.
    Granted invention patent
    Status: in force

    Publication No.: US09183173B2

    Publication date: 2015-11-10

    Application No.: US12715417

    Filing date: 2010-03-02

    IPC classification: G06F15/18 G06F17/30

    CPC classification: G06F15/18 G06F17/30

    Abstract: Described is a technology for measuring the similarity between two objects (e.g., documents), via a framework that learns the term-weighting function from training data, e.g., labeled pairs of objects, to develop a learned model. A learning procedure tunes the model parameters by minimizing a defined loss function of the similarity score. Also described is using the learning procedure and learned model to detect near-duplicate documents.

    Detecting impact of extrinsic events on a time series
    10.
    Granted invention patent
    Status: in force

    Publication No.: US08688417B2

    Publication date: 2014-04-01

    Application No.: US13162927

    Filing date: 2011-06-17

    IPC classification: G06F17/50

    CPC classification: G06F17/18

    Abstract: In one embodiment, an event impact signature detector may analyze a time series with external events. A data interface 250 may receive a data set 310 representing the time series with external events. A processor 220 may fit the data set 310 into a baseline time series model 330. The processor 220 may iteratively determine each event location 352 for multiple external events 350 affecting the baseline time series model 330. The processor 220 may iteratively solve for each event impact 354 of the multiple external events 350 factoring in interactions between the multiple external events 350.
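
One way to picture the iterative location-then-impact loop is the sketch below. It is an illustration under strong assumptions, not the patented detector: the baseline is a constant (the series median), every event has the same fixed decaying shape `KERNEL`, locations are chosen greedily at the largest residual, and impacts are re-solved by backfitting so that overlapping events account for each other. All names are hypothetical.

```python
from statistics import median

KERNEL = (1.0, 0.5, 0.25)  # assumed shape of one event's effect over time

def residuals(series, baseline, events):
    """Series minus the baseline and the effects of the given events."""
    fitted = [baseline] * len(series)
    for location, impact in events:
        for k, w in enumerate(KERNEL):
            if location + k < len(series):
                fitted[location + k] += impact * w
    return [y - f for y, f in zip(series, fitted)]

def detect_events(series, max_events=2, sweeps=20):
    """Greedily locate events, re-solving all impacts after each new one."""
    n = len(series)
    baseline = median(series)  # assumed baseline model: a constant level
    events = []                # each entry is [location, impact]
    for _ in range(max_events):
        res = residuals(series, baseline, events)
        used = {loc for loc, _ in events}
        candidates = [t for t in range(n) if t not in used]
        location = max(candidates, key=lambda t: abs(res[t]))
        events.append([location, 0.0])
        # Backfitting: refit each impact against what the other events leave
        # unexplained, so interactions between overlapping events are included.
        for _ in range(sweeps):
            for event in events:
                others = [e for e in events if e is not event]
                r = residuals(series, baseline, others)
                loc = event[0]
                num = sum(r[loc + k] * w for k, w in enumerate(KERNEL) if loc + k < n)
                den = sum(w * w for k, w in enumerate(KERNEL) if loc + k < n)
                event[1] = num / den
    return baseline, events
```

For a series such as `[10, 10, 10, 18, 18, 14, 11, 10, 10, 10, 10, 10]`, which was built from a level of 10 plus two overlapping events, fitting the first event alone overestimates its impact; the backfitting sweeps correct this once the second event is added.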
