RANKER SELECTION FOR STATISTICAL NATURAL LANGUAGE PROCESSING
    1.
    Patent application
    RANKER SELECTION FOR STATISTICAL NATURAL LANGUAGE PROCESSING (status: active)

    Publication No.: US20090125501A1

    Publication date: 2009-05-14

    Application No.: US11938811

    Filing date: 2007-11-13

    IPC classification: G06F7/10

    CPC classification: G06F17/2715

    Abstract: Systems and methods for selecting a ranker for statistical natural language processing are provided. One disclosed system includes a computer program configured to be executed on a computing device, the computer program comprising a data store including reference performance data for a plurality of candidate rankers, the reference performance data being calculated based on a processing of test data by each of the plurality of candidate rankers. The system may further include a ranker selector configured to receive a statistical natural language processing task and a performance target, and determine a selected ranker from the plurality of candidate rankers based on the statistical natural language processing task, the performance target, and the reference performance data.

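    The selection step described in this abstract can be illustrated with a minimal sketch. It assumes the reference performance data is a list of per-task measurements and that the performance target is an accuracy floor plus a time ceiling; the record fields, the ranker names, the numbers, and the "fastest eligible ranker wins" rule are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReferencePerformance:
    """One candidate ranker's measured results on test data for one task."""
    ranker: str              # hypothetical ranker name
    task: str                # hypothetical task label
    accuracy: float          # higher is better
    seconds_per_item: float  # lower is better


@dataclass(frozen=True)
class PerformanceTarget:
    """Assumed shape of a performance target: accuracy floor, time ceiling."""
    min_accuracy: float
    max_seconds_per_item: float


def select_ranker(task: str,
                  target: PerformanceTarget,
                  data_store: List[ReferencePerformance]) -> Optional[str]:
    """Return the fastest ranker that meets the target for the task, or None."""
    eligible = [
        record for record in data_store
        if record.task == task
        and record.accuracy >= target.min_accuracy
        and record.seconds_per_item <= target.max_seconds_per_item
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda record: record.seconds_per_item).ranker


# Illustrative reference data only; these figures are made up.
DATA_STORE = [
    ReferencePerformance("maxent", "pos_tagging", 0.96, 0.020),
    ReferencePerformance("perceptron", "pos_tagging", 0.95, 0.004),
    ReferencePerformance("boosting", "pos_tagging", 0.97, 0.050),
    ReferencePerformance("maxent", "parse_reranking", 0.90, 0.100),
]

if __name__ == "__main__":
    print(select_ranker("pos_tagging", PerformanceTarget(0.955, 0.03), DATA_STORE))
```

    Returning None when no candidate meets the target keeps the "no suitable ranker" case explicit; a real selector might instead relax the target or fall back to a default.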

    Ranker selection for statistical natural language processing
    2.
    Granted patent
    Ranker selection for statistical natural language processing (status: active)

    Publication No.: US07844555B2

    Publication date: 2010-11-30

    Application No.: US11938811

    Filing date: 2007-11-13

    CPC classification: G06F17/2715

    Abstract: Systems and methods for selecting a ranker for statistical natural language processing are provided. One disclosed system includes a computer program configured to be executed on a computing device, the computer program comprising a data store including reference performance data for a plurality of candidate rankers, the reference performance data being calculated based on a processing of test data by each of the plurality of candidate rankers. The system may further include a ranker selector configured to receive a statistical natural language processing task and a performance target, and determine a selected ranker from the plurality of candidate rankers based on the statistical natural language processing task, the performance target, and the reference performance data.


    Clickthrough-based latent semantic model
    3.
    Granted patent
    Clickthrough-based latent semantic model (status: active)

    Publication No.: US09009148B2

    Publication date: 2015-04-14

    Application No.: US13329345

    Filing date: 2011-12-19

    IPC classification: G06F17/30

    CPC classification: G06F17/30867

    Abstract: There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.


    CLICKTHROUGH-BASED LATENT SEMANTIC MODEL
    4.
    Patent application
    CLICKTHROUGH-BASED LATENT SEMANTIC MODEL (status: active)

    Publication No.: US20130159320A1

    Publication date: 2013-06-20

    Application No.: US13329345

    Filing date: 2011-12-19

    IPC classification: G06F17/30

    CPC classification: G06F17/30867

    Abstract: There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.


    Limited-memory quasi-newton optimization algorithm for L1-regularized objectives
    5.
    Granted patent
    Limited-memory quasi-newton optimization algorithm for L1-regularized objectives (status: active)

    Publication No.: US07933847B2

    Publication date: 2011-04-26

    Application No.: US11874199

    Filing date: 2007-10-17

    CPC classification: G06N99/005

    Abstract: An algorithm that employs modified methods developed for optimizing differential functions but which can also handle the special non-differentiabilities that occur with the L1-regularization. The algorithm is a modification of the L-BFGS (limited-memory Broyden-Fletcher-Goldfarb-Shanno) quasi-Newton algorithm, but which can now handle the discontinuity of the gradient using a procedure that chooses a search direction at each iteration and modifies the line search procedure. The algorithm includes an iterative optimization procedure where each iteration approximately minimizes the objective over a constrained region of the space on which the objective is differentiable (in the case of L1-regularization, a given orthant), models the second-order behavior of the objective by considering the loss component alone, using a “line-search” at each iteration that projects search points back onto the chosen orthant, and determines when to stop the line search.

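    The orthant-wise idea in this abstract can be illustrated with a small sketch. It is not the patented algorithm: to stay short it uses the plain negative pseudo-gradient as the search direction instead of an L-BFGS direction, and a simple "accept any decrease" backtracking rule instead of a proper sufficient-decrease test. The function names and the toy quadratic loss are assumptions made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def pseudo_gradient(x: Vector, loss_grad: Vector, c: float) -> Vector:
    """Pseudo-gradient of loss(x) + c * ||x||_1, defined even where x_i == 0."""
    pg = []
    for xi, gi in zip(x, loss_grad):
        if xi != 0:
            pg.append(gi + c * sign(xi))
        elif gi + c < 0:   # right-hand derivative is negative: increase x_i
            pg.append(gi + c)
        elif gi - c > 0:   # left-hand derivative is positive: decrease x_i
            pg.append(gi - c)
        else:              # zero is locally optimal for this coordinate
            pg.append(0.0)
    return pg


def choose_orthant(x: Vector, pg: Vector) -> List[int]:
    """Keep the sign of nonzero coordinates; zero ones follow the descent direction."""
    return [sign(xi) if xi != 0 else sign(-gi) for xi, gi in zip(x, pg)]


def project(x: Vector, orthant: List[int]) -> Vector:
    """Zero out every coordinate that left the chosen orthant."""
    return [xi if sign(xi) == s else 0.0 for xi, s in zip(x, orthant)]


def orthant_step(loss: Callable[[Vector], float],
                 grad: Callable[[Vector], Vector],
                 x: Vector, c: float,
                 step: float = 1.0, shrink: float = 0.5,
                 max_halvings: int = 30) -> Vector:
    """One iteration: pick an orthant, then backtrack with projected trial points."""
    def objective(v: Vector) -> float:
        return loss(v) + c * sum(abs(vi) for vi in v)

    pg = pseudo_gradient(x, grad(x), c)
    orthant = choose_orthant(x, pg)
    current = objective(x)
    for _ in range(max_halvings):
        trial = project([xi - step * gi for xi, gi in zip(x, pg)], orthant)
        if objective(trial) < current:
            return trial
        step *= shrink
    return x  # no improving point found; treat as converged


def minimize(loss, grad, x0: Vector, c: float, iterations: int = 20) -> Vector:
    x = list(x0)
    for _ in range(iterations):
        x = orthant_step(loss, grad, x, c)
    return x


if __name__ == "__main__":
    # Toy loss 0.5 * ||x - [3.0, 0.2]||^2 with c = 1: the L1 term pulls the
    # first coordinate toward zero and pins the second one at exactly zero.
    loss = lambda x: 0.5 * ((x[0] - 3.0) ** 2 + (x[1] - 0.2) ** 2)
    grad = lambda x: [x[0] - 3.0, x[1] - 0.2]
    print(minimize(loss, grad, [0.0, 0.0], c=1.0))
```

    The projection is what lets a smooth-optimization step be reused: within a single orthant the L1 term is linear, so the objective is differentiable there.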

    LIMITED-MEMORY QUASI-NEWTON OPTIMIZATION ALGORITHM FOR L1-REGULARIZED OBJECTIVES
    6.
    Patent application
    LIMITED-MEMORY QUASI-NEWTON OPTIMIZATION ALGORITHM FOR L1-REGULARIZED OBJECTIVES (status: active)

    Publication No.: US20090106173A1

    Publication date: 2009-04-23

    Application No.: US11874199

    Filing date: 2007-10-17

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: An algorithm that employs modified methods developed for optimizing differential functions but which can also handle the special non-differentiabilities that occur with the L1-regularization. The algorithm is a modification of the L-BFGS (limited-memory Broyden-Fletcher-Goldfarb-Shanno) quasi-Newton algorithm, but which can now handle the discontinuity of the gradient using a procedure that chooses a search direction at each iteration and modifies the line search procedure. The algorithm includes an iterative optimization procedure where each iteration approximately minimizes the objective over a constrained region of the space on which the objective is differentiable (in the case of L1-regularization, a given orthant), models the second-order behavior of the objective by considering the loss component alone, using a “line-search” at each iteration that projects search points back onto the chosen orthant, and determines when to stop the line search.


    Structured cross-lingual relevance feedback for enhancing search results
    7.
    Granted patent
    Structured cross-lingual relevance feedback for enhancing search results (status: active)

    Publication No.: US08645289B2

    Publication date: 2014-02-04

    Application No.: US12970879

    Filing date: 2010-12-16

    IPC classification: G06F15/18

    CPC classification: G06F17/30669 G06F17/30675

    Abstract: A “Cross-Lingual Unified Relevance Model” provides a feedback model that improves a machine-learned ranker for a language with few training resources, using feedback from a more complete ranker for a language that has more training resources. The model focuses on linguistically non-local queries, such as “world cup” (English language/U.S. market) and “copa mundial” (Spanish language/Mexican market), that have similar user intent in different languages and markets or regions, thus allowing the low-resource ranker to receive direct relevance feedback from the high-resource ranker. Among other things, the Cross-Lingual Unified Relevance Model differs from conventional relevancy-based techniques by incorporating both query- and document-level features. More specifically, the Cross-Lingual Unified Relevance Model generalizes existing cross-lingual feedback models, incorporating both query expansion and document re-ranking to further amplify the signal from the high-resource ranker to enable a learning to rank approach based on appropriately labeled training data.


    Enhanced Query Rewriting Through Statistical Machine Translation
    8.
    Patent application
    Enhanced Query Rewriting Through Statistical Machine Translation (status: active)

    Publication No.: US20120254218A1

    Publication date: 2012-10-04

    Application No.: US13078648

    Filing date: 2011-04-01

    IPC classification: G06F17/30

    CPC classification: G06F17/30672

    Abstract: Systems, methods, and computer media for identifying query rewriting replacement terms are provided. A list of related string pairs each comprising a first string and second string is received. The first string of each related string pair is a user search query extracted from user click log data. For one or more of the related string pairs, the string pair is provided as inputs to a statistical machine translation model. The model identifies one or more pairs of corresponding terms, each pair of corresponding terms including a first term from the first string and a second term from the second string. The model also calculates a probability of relatedness for each of the one or more pairs of corresponding terms. Term pairs whose calculated probability of relatedness exceeds a threshold are characterized as query term replacements and incorporated, along with the probability of relatedness, into a query rewriting candidate database.

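    The pipeline in this abstract (string pairs in, thresholded term pairs out) can be sketched as below. The abstract does not say which statistical machine translation model is used; IBM Model 1 trained with EM stands in here as an assumption. The function names, the toy string pairs, and the 0.5 threshold are also illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

StringPair = Tuple[List[str], List[str]]  # (query terms, related-string terms)


def train_model1(pairs: List[StringPair],
                 iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate t(related_term | query_term) with IBM Model 1 EM (no NULL word)."""
    t: Dict[Tuple[str, str], float] = {}
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        for query, related in pairs:
            for r in related:
                # E-step: share one unit of evidence for r among the query terms.
                norm = sum(t.get((r, q), 1.0) for q in query)
                for q in query:
                    frac = t.get((r, q), 1.0) / norm
                    count[(r, q)] += frac
                    total[q] += frac
        # M-step: renormalize so probabilities sum to 1 for each query term.
        t = {(r, q): c / total[q] for (r, q), c in count.items()}
    return t


def build_rewrite_db(pairs: List[StringPair],
                     threshold: float = 0.5) -> Dict[Tuple[str, str], float]:
    """Keep (query_term, replacement) pairs whose probability exceeds the threshold."""
    t = train_model1(pairs)
    return {(q, r): p for (r, q), p in t.items() if r != q and p > threshold}


if __name__ == "__main__":
    # Made-up string pairs standing in for click-log-derived data.
    toy_pairs = [
        (["cheap", "flights"], ["inexpensive", "flights"]),
        (["cheap", "hotels"], ["inexpensive", "hotels"]),
        (["flights"], ["flights"]),
        (["hotels"], ["hotels"]),
    ]
    for (term, replacement), prob in build_rewrite_db(toy_pairs).items():
        print(term, "->", replacement, round(prob, 3))
```

    Identical terms are filtered out because a term is not a useful replacement for itself; the stored probability can later be used to weight or order candidate rewrites.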

    DEPENDENCY-BASED QUERY EXPANSION ALTERATION CANDIDATE SCORING
    9.
    Patent application
    DEPENDENCY-BASED QUERY EXPANSION ALTERATION CANDIDATE SCORING (status: active)

    Publication No.: US20120131031A1

    Publication date: 2012-05-24

    Application No.: US12951068

    Filing date: 2010-11-22

    IPC classification: G06F17/30

    CPC classification: G06F17/30967 G06F17/30672

    Abstract: An alteration candidate for a query can be scored. The scoring may include computing one or more query-dependent feature scores and/or one or more intra-candidate dependent feature scores. The computation of the query-dependent feature score(s) can be based on dependencies to multiple query terms from each of one or more alteration terms (i.e., for each of the one or more alteration terms, there can be dependencies to multiple query terms that form at least a portion of the basis for the query-dependent feature score(s)). The computation of the intra-candidate dependent feature score(s) can be based on dependencies between different terms in the alteration candidate. A candidate score can be computed using the query dependent feature score(s) and/or the intra-candidate dependent feature score(s). Additionally, the candidate score can be used in determining whether to select the candidate to expand the query. If selected, the candidate can be used to expand the query.


    HMM alignment for combining translation systems
    10.
    Granted patent
    HMM alignment for combining translation systems (status: active)

    Publication No.: US08060358B2

    Publication date: 2011-11-15

    Application No.: US12147807

    Filing date: 2008-06-27

    IPC classification: G06F17/28

    CPC classification: G06F17/2827 G06F17/2818

    Abstract: A computing system configured to produce an optimized translation hypothesis of text input into the computing system. The computing system includes a plurality of translation machines. Each of the translation machines is configured to produce their own translation hypothesis from the same text. An optimization machine is connected to the plurality of translation machines. The optimization machine is configured to receive the translation hypotheses from the translation machines. The optimization machine is further configured to align, word-to-word, the hypotheses in the plurality of hypotheses by using a hidden Markov model.

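    The word-to-word alignment step in this abstract can be illustrated with a toy hidden Markov model decoded by the Viterbi algorithm. In this sketch the hidden states are word positions in one hypothesis (called the backbone here) and the observations are the words of another hypothesis. The emission score (a shared-prefix similarity), the distortion-style transition weights, and all constants are assumptions chosen for illustration; the abstract does not specify them.

```python
import math
from typing import List


def emission(backbone_word: str, hyp_word: str) -> float:
    """Toy similarity: 0.9 for identical words, else a shared-prefix ratio (floor 0.05)."""
    if backbone_word == hyp_word:
        return 0.9
    shared = 0
    for a, b in zip(backbone_word, hyp_word):
        if a != b:
            break
        shared += 1
    longest = max(len(backbone_word), len(hyp_word))
    return max(0.05, 0.8 * shared / longest)


def transition_row(prev_state: int, n_states: int, kappa: float = 2.0) -> List[float]:
    """P(next | prev): peaks at prev + 1 and decays with the size of the jump."""
    weights = [(1.0 + abs(j - prev_state - 1)) ** -kappa for j in range(n_states)]
    z = sum(weights)
    return [w / z for w in weights]


def viterbi_align(backbone: List[str], hypothesis: List[str]) -> List[int]:
    """Return, for each hypothesis word, the index of its aligned backbone word."""
    n = len(backbone)
    rows = [transition_row(i, n) for i in range(n)]
    # Uniform prior over the starting backbone position.
    score = [math.log(1.0 / n) + math.log(emission(b, hypothesis[0]))
             for b in backbone]
    back_pointers: List[List[int]] = []
    for word in hypothesis[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + math.log(rows[i][j]))
            new_score.append(score[best_i] + math.log(rows[best_i][j])
                             + math.log(emission(backbone[j], word)))
            pointers.append(best_i)
        back_pointers.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    backbone = ["he", "bought", "a", "car"]
    print(viterbi_align(backbone, ["he", "buys", "a", "car"]))
    print(viterbi_align(backbone, ["he", "bought", "car"]))
```

    Once every hypothesis is aligned to a common backbone, the aligned words at each position can be compared or voted on; that combination step is outside this sketch.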