ERROR PREDICTION WITH PARTIAL FEEDBACK
    1.
    Patent application
    ERROR PREDICTION WITH PARTIAL FEEDBACK (legal status: in force)

    Publication number: US20150019912A1

    Publication date: 2015-01-15

    Application number: US13937271

    Filing date: 2013-07-09

    CPC classification number: G06F11/2257 G06F11/079 G06F11/2236 G06F11/3684

    Abstract: A method for performing data processing through a pipeline of components includes receiving a set of training observations, each including partial user feedback relating to error in data output by the pipeline for respective input data. Some pipeline components commit errors for at least some of the input data, contributing to an error in the respective output data. A prediction model models a probability of a pipeline component committing an error, given input data. Model parameters are learned using the training observations. For a new observation which includes input data and, optionally, partial user feedback indicating that an error has occurred in processing the new input data, without specifying which pipeline component(s) contributed to the observed error in the output data, a prediction is made as to which of the pipeline components contributed to the error in the output (if any).
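
The prediction step in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: each component's error probability comes from a logistic model, components err independently, and the output is wrong exactly when at least one component errs. The names `component_error_probs`, `blame_posterior`, and the weights and features are hypothetical.

```python
import math


def component_error_probs(weights, features):
    """Per-component error probability from a logistic model.

    The logistic form and the weight vectors are assumptions made for
    illustration; the abstract only says the model gives a probability
    of each component committing an error given the input data.
    """
    probs = []
    for w in weights:
        score = sum(wi * xi for wi, xi in zip(w, features))
        probs.append(1.0 / (1.0 + math.exp(-score)))
    return probs


def blame_posterior(probs):
    """P(component i erred | an output error was observed).

    Assumes components err independently and the output is wrong
    exactly when at least one component errs.
    """
    p_error = 1.0 - math.prod(1.0 - p for p in probs)
    if p_error == 0.0:
        raise ValueError("an output error is impossible under these probabilities")
    return [p / p_error for p in probs]


# Hypothetical two-component pipeline with made-up weights and features.
weights = [[0.5, -1.0], [2.0, 0.3]]
features = [1.0, 0.5]
posterior = blame_posterior(component_error_probs(weights, features))
```

Under these assumptions the partial feedback ("the output is wrong") rescales each prior error probability by the overall probability of an output error, so the component that was more likely to err beforehand also receives more of the blame.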

    LANGUAGE MODEL WITH STRUCTURED PENALTY
    2.
    Patent application
    LANGUAGE MODEL WITH STRUCTURED PENALTY (legal status: in force)

    Publication number: US20160070697A1

    Publication date: 2016-03-10

    Application number: US14482035

    Filing date: 2014-09-10

    CPC classification number: G06F17/28 G06F17/2775 G10L15/197

    Abstract: A penalized loss is optimized using a corpus of language samples respective to a set of parameters of a language model. The penalized loss includes a function measuring predictive accuracy of the language model respective to the corpus of language samples and a penalty comprising a tree-structured norm. The trained language model with optimized values for the parameters generated by the optimizing is applied to predict a symbol following a sequence of symbols of the language modeled by the language model. In some embodiments the penalty comprises a tree-structured lp-norm, such as a tree-structured l2-norm or a tree-structured l∞-norm. In some embodiments a tree-structured l∞-norm operates on a collapsed suffix trie in which any series of suffixes of increasing lengths which are always observed in the same context are collapsed into a single node. The optimizing may be performed using a proximal step algorithm.
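
A tree-structured norm of the kind the abstract mentions can be sketched as follows. The sketch assumes the common definition in which each node of the suffix tree contributes the lp-norm of all weights in the subtree rooted at that node; the abstract does not spell out the grouping, and the tree, weights, and function names here are hypothetical.

```python
import math


def subtree_weights(children, weights, node):
    """Collect the weight of `node` and of every descendant."""
    collected = [weights[node]]
    for child in children.get(node, []):
        collected.extend(subtree_weights(children, weights, child))
    return collected


def tree_norm_penalty(children, weights, p=2):
    """Sum over all nodes of the lp-norm of that node's subtree weights.

    Assumes one group per node (the node plus its descendants); pass
    p=math.inf for the l-infinity variant.
    """
    total = 0.0
    for node in weights:
        group = subtree_weights(children, weights, node)
        if p == math.inf:
            total += max(abs(w) for w in group)
        else:
            total += sum(abs(w) ** p for w in group) ** (1.0 / p)
    return total


# Hypothetical suffix tree: context "a" extended to "ba" and "ca".
children = {"a": ["ba", "ca"]}
weights = {"a": 3.0, "ba": 4.0, "ca": 0.0}
penalty_l2 = tree_norm_penalty(children, weights, p=2)
penalty_linf = tree_norm_penalty(children, weights, p=math.inf)
```

Because a deeper node's weight appears in the group of every ancestor, longer contexts are penalized more often than shorter ones, which is the usual motivation for this kind of penalty in language models.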

    Language model with structured penalty

    Publication number: US09684650B2

    Publication date: 2017-06-20

    Application number: US14482035

    Filing date: 2014-09-10

    CPC classification number: G06F17/28 G06F17/2775 G10L15/197

    Abstract: A penalized loss is optimized using a corpus of language samples respective to a set of parameters of a language model. The penalized loss includes a function measuring predictive accuracy of the language model respective to the corpus of language samples and a penalty comprising a tree-structured norm. The trained language model with optimized values for the parameters generated by the optimizing is applied to predict a symbol following a sequence of symbols of the language modeled by the language model. In some embodiments the penalty comprises a tree-structured lp-norm, such as a tree-structured l2-norm or a tree-structured l∞-norm. In some embodiments a tree-structured l∞-norm operates on a collapsed suffix trie in which any series of suffixes of increasing lengths which are always observed in the same context are collapsed into a single node. The optimizing may be performed using a proximal step algorithm.

    Error prediction with partial feedback
    4.
    Granted patent
    Error prediction with partial feedback (legal status: in force)

    Publication number: US09069736B2

    Publication date: 2015-06-30

    Application number: US13937271

    Filing date: 2013-07-09

    CPC classification number: G06F11/2257 G06F11/079 G06F11/2236 G06F11/3684

    Abstract: A method for performing data processing through a pipeline of components includes receiving a set of training observations, each including partial user feedback relating to error in data output by the pipeline for respective input data. Some pipeline components commit errors for at least some of the input data, contributing to an error in the respective output data. A prediction model models a probability of a pipeline component committing an error, given input data. Model parameters are learned using the training observations. For a new observation which includes input data and, optionally, partial user feedback indicating that an error has occurred in processing the new input data, without specifying which pipeline component(s) contributed to the observed error in the output data, a prediction is made as to which of the pipeline components contributed to the error in the output (if any).

    PROBABILISTIC RELATIONAL DATA ANALYSIS
    5.
    Patent application
    PROBABILISTIC RELATIONAL DATA ANALYSIS (legal status: under examination, published)

    Publication number: US20140156231A1

    Publication date: 2014-06-05

    Application number: US13690071

    Filing date: 2012-11-30

    CPC classification number: G06F17/18 G06N7/005

    Abstract: A multi-relational data set is represented by a probabilistic multi-relational data model in which each entity of the multi-relational data set is represented by a D-dimensional latent feature vector. The probabilistic multi-relational data model is trained using a collection of observations of relations between entities of the multi-relational data set. The collection of observations includes observations of at least two different relation types. A prediction is generated for an observation of a relation between two or more entities of the multi-relational data set based on a dot product of the optimized D-dimensional latent feature vectors representing the two or more entities. The training may comprise optimizing the D-dimensional latent feature vectors to maximize likelihood of the collection of observations, for example by Bayesian inference performed using Gibbs sampling.
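
The prediction step described in the abstract can be sketched in a few lines. The sketch assumes that, for more than two entities, the "dot product" generalizes to the sum over latent dimensions of the elementwise product of all vectors; the abstract does not define that case. The function name and the latent vectors are hypothetical, and training (for example by Gibbs sampling) is not shown.

```python
import math


def predict_relation(*vectors):
    """Generalized dot product of two or more D-dimensional latent vectors.

    For two vectors this is the ordinary dot product; for more, it sums
    over dimensions the product of the matching entries (an assumption).
    """
    if len(vectors) < 2:
        raise ValueError("need at least two entity vectors")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all latent vectors must share the same dimension D")
    return sum(math.prod(v[d] for v in vectors) for d in range(dim))


# Made-up latent vectors with D = 2 for three hypothetical entities.
user = [1.0, 2.0]
item = [3.0, 4.0]
tag = [1.0, 0.5]
pairwise_score = predict_relation(user, item)
three_way_score = predict_relation(user, item, tag)
```

Sharing one latent vector per entity across all relation types is what lets observations of one relation inform predictions for another.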
