QUERY CORRECTION PROBABILITY BASED ON QUERY-CORRECTION PAIRS
    41.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20110295897A1

    Publication date: 2011-12-01

    Application No.: US12790996

    Application date: 2010-06-01

    IPC classification: G06F17/30

    CPC classification: G06F16/3322 G06F16/951

    Abstract: Query-correction pairs can be extracted from search log data. Each query-correction pair can include an original query and a follow-up query, where the follow-up query meets one or more criteria for being identified as a correction of the original query, such as an indication of user input indicating the follow-up query is a correction for the original query. The query-correction pairs can be segmented to identify bi-phrases in the query-correction pairs. Probabilities of corrections between the bi-phrases can be estimated based on frequencies of matches in the query-correction pairs. Identifications of the bi-phrases and representations of the probabilities of those bi-phrases can be stored in a probabilistic model data structure.
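    The frequency-based estimate described in this abstract can be illustrated with a minimal sketch. It assumes the query-correction pairs have already been segmented into aligned bi-phrases, and it uses a plain relative-frequency estimate, which is an assumption here; the abstract only says the probabilities are based on match frequencies. The function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_biphrase_probs(segmented_pairs):
    """Estimate P(corrected phrase | original phrase) by relative frequency.

    segmented_pairs: iterable of lists of (original_phrase, corrected_phrase)
    tuples, one list per query-correction pair (segmentation is assumed done).
    """
    pair_counts = Counter()
    source_counts = Counter()
    for biphrases in segmented_pairs:
        for original, corrected in biphrases:
            pair_counts[(original, corrected)] += 1
            source_counts[original] += 1

    # Nested dict standing in for the "probabilistic model data structure".
    model = defaultdict(dict)
    for (original, corrected), count in pair_counts.items():
        model[original][corrected] = count / source_counts[original]
    return dict(model)


# Hypothetical, hand-made segmented pairs (not taken from any real log).
pairs = [
    [("britny", "britney"), ("spears", "spears")],
    [("britny", "britney"), ("speers", "spears")],
    [("britny", "brittany"), ("murphy", "murphy")],
]
model = estimate_biphrase_probs(pairs)
print(model["britny"])
```

    In this toy data, "britny" is corrected to "britney" in two of its three occurrences, so the stored probability is 2/3.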


    Mining bilingual dictionaries from monolingual web pages
    42.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07983903B2

    Publication date: 2011-07-19

    Application No.: US11851402

    Application date: 2007-09-07

    Applicant: Jianfeng Gao

    Inventor: Jianfeng Gao

    IPC classification: G06F17/21 G06F17/28 G06F17/20

    CPC classification: G06F17/2827

    Abstract: Systems and methods for identifying translation pairs from web pages are provided. One disclosed method includes receiving monolingual web page data of a source language, and processing the web page data by detecting the occurrence of a predefined pattern in the web page data, and extracting a plurality of translation pair candidates. Each of the translation pair candidates may include a source language string and target language string. The method may further include determining whether each translation pair candidate is a valid transliteration. The method may also include, for each translation pair that is determined not to be a valid transliteration, determining whether each translation pair candidate is a valid translation. The method may further include adding each translation pair that is determined to be a valid translation or transliteration to a dictionary.
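    The pattern-detection step can be sketched as follows. The abstract does not say what the predefined pattern is; this sketch assumes one plausible pattern, a run of Chinese characters followed by an English gloss in ASCII or full-width parentheses. The regular expression, function name, and sample text are all hypothetical, and the later validation steps (transliteration and translation checks) are not shown.

```python
import re

# Assumed pattern: Chinese run, optional spaces, then "(English gloss)".
# \uff08 and \uff09 are the full-width parentheses common in Chinese text.
PATTERN = re.compile(
    r"([\u4e00-\u9fff]+)\s*[(\uff08]([A-Za-z][A-Za-z .\-]*)[)\uff09]"
)


def extract_candidates(text):
    """Return (source_string, target_string) translation pair candidates."""
    return [(m.group(1), m.group(2).strip()) for m in PATTERN.finditer(text)]


# Hypothetical snippet of monolingual (Chinese) page text.
text = "互联网 (Internet)，人工智能（artificial intelligence）。"
candidates = extract_candidates(text)
print(candidates)
```

    Each extracted tuple is only a candidate; in the method described above it would still need to pass the transliteration or translation check before being added to the dictionary.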


    SMOOTHING CLICKTHROUGH DATA FOR WEB SEARCH RANKING
    43.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20100318531A1

    Publication date: 2010-12-16

    Application No.: US12481593

    Application date: 2009-06-10

    IPC classification: G06F17/30 G06F15/18

    Abstract: Described is a technology for using clickthrough data (e.g., based on data of a query log) in learning a ranking model that may be used in online ranking of search results. Clickthrough data, which is typically sparse (because many documents are often not clicked or rarely clicked), is processed/smoothed into smoothed clickthrough streams. The processing includes determining similar queries for a document with incomplete (insufficient) clickthrough data to provide expanded clickthrough data for that document, and/or by estimating at least one clickthrough feature for a document when that document has missing (e.g., no) clickthrough data. Similar queries may be determined by random walk clustering and/or session-based query analysis. Features extracted from the clickthrough streams may be used to provide a ranking model which may then be used in online ranking of documents that are located with respect to a query.


    Query speller
    44.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07818332B2

    Publication date: 2010-10-19

    Application No.: US11465023

    Application date: 2006-08-16

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/3064

    Abstract: Candidate suggestions for correcting misspelled query terms input into a search application are automatically generated. A score for each candidate suggestion can be generated using a first decoding pass and paths through the suggestions can be ranked in a second decoding pass. Candidate suggestions can be generated based on typographical errors, phonetic mistakes and/or compounding mistakes. Furthermore, a ranking model can be developed to rank candidate suggestions to be presented to a user.


    TRAINING A SEARCH RESULT RANKER WITH AUTOMATICALLY-GENERATED SAMPLES
    47.
    Type: Invention application
    Status: In force

    Publication No.: US20100082510A1

    Publication date: 2010-04-01

    Application No.: US12243359

    Application date: 2008-10-01

    IPC classification: G06F15/18 G06F7/06 G06F17/30

    CPC classification: G06N99/005 G06F17/3053

    Abstract: A search result ranker may be trained with automatically-generated samples. In an example embodiment, user interests are inferred from user interactions with search results for a particular query so as to determine respective relevance scores associated with respective query-identifier pairs of the search results. Query-identifier-relevance score triplets are formulated from the respective relevance scores associated with the respective query-identifier pairs. The query-identifier-relevance score triplets are submitted as training samples to a search result ranker. The search result ranker is trained as a learning machine with multiple training samples of the query-identifier-relevance score triplets.
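    A minimal sketch of how query-identifier-relevance score triplets might be formed from logged interactions. The abstract does not specify how relevance is inferred; this sketch assumes the simplest possible proxy, the fraction of impressions that were clicked. The function name, the log format, and the sample data are hypothetical.

```python
from collections import defaultdict


def build_triplets(interactions):
    """Turn (query, doc_id, was_clicked) records into training triplets.

    Relevance is approximated by the click rate per (query, doc_id) pair,
    which is an assumption made for this illustration only.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, doc_id, was_clicked in interactions:
        shown[(query, doc_id)] += 1
        if was_clicked:
            clicked[(query, doc_id)] += 1

    # One (query, identifier, relevance score) triplet per observed pair.
    return [
        (query, doc_id, clicked[(query, doc_id)] / count)
        for (query, doc_id), count in sorted(shown.items())
    ]


# Hypothetical interaction log.
log = [
    ("jaguar", "url_a", True),
    ("jaguar", "url_a", False),
    ("jaguar", "url_b", False),
    ("jaguar", "url_a", True),
]
triplets = build_triplets(log)
print(triplets)
```

    The resulting triplets would then be passed to whatever learning machine serves as the ranker; that training step is outside the scope of this sketch.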


    RANKER SELECTION FOR STATISTICAL NATURAL LANGUAGE PROCESSING
    48.
    Type: Invention application
    Status: In force

    Publication No.: US20090125501A1

    Publication date: 2009-05-14

    Application No.: US11938811

    Application date: 2007-11-13

    IPC classification: G06F7/10

    CPC classification: G06F17/2715

    Abstract: Systems and methods for selecting a ranker for statistical natural language processing are provided. One disclosed system includes a computer program configured to be executed on a computing device, the computer program comprising a data store including reference performance data for a plurality of candidate rankers, the reference performance data being calculated based on a processing of test data by each of the plurality of candidate rankers. The system may further include a ranker selector configured to receive a statistical natural language processing task and a performance target, and determine a selected ranker from the plurality of candidate rankers based on the statistical natural language processing task, the performance target, and the reference performance data.
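    The selection step can be sketched as a lookup over stored reference performance data. The abstract does not define the performance target or the selection rule; this sketch assumes the target is a minimum accuracy and that the fastest ranker meeting it is chosen. The record fields, ranker names, and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferencePerformance:
    """One row of reference performance data (fields are assumptions)."""
    ranker: str
    task: str
    accuracy: float
    latency_ms: float


def select_ranker(records, task, min_accuracy):
    """Pick the lowest-latency ranker that meets the accuracy target."""
    candidates = [
        r for r in records if r.task == task and r.accuracy >= min_accuracy
    ]
    if not candidates:
        return None  # No candidate ranker satisfies the target.
    return min(candidates, key=lambda r: r.latency_ms).ranker


# Hypothetical reference data, as if measured on shared test data.
records = [
    ReferencePerformance("perceptron", "segmentation", 0.91, 2.0),
    ReferencePerformance("boosting", "segmentation", 0.94, 5.0),
    ReferencePerformance("maxent", "segmentation", 0.95, 9.0),
    ReferencePerformance("perceptron", "tagging", 0.97, 2.0),
]
selected = select_ranker(records, "segmentation", 0.93)
print(selected)  # boosting
```

    Other selection rules (for example, maximizing accuracy under a latency or memory budget) would fit the same structure by changing the filter and the key function.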


    LIMITED-MEMORY QUASI-NEWTON OPTIMIZATION ALGORITHM FOR L1-REGULARIZED OBJECTIVES
    49.
    Type: Invention application
    Status: In force

    Publication No.: US20090106173A1

    Publication date: 2009-04-23

    Application No.: US11874199

    Application date: 2007-10-17

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: An algorithm that employs modified methods developed for optimizing differential functions but which can also handle the special non-differentiabilities that occur with the L1-regularization. The algorithm is a modification of the L-BFGS (limited-memory Broyden-Fletcher-Goldfarb-Shanno) quasi-Newton algorithm, but which can now handle the discontinuity of the gradient using a procedure that chooses a search direction at each iteration and modifies the line search procedure. The algorithm includes an iterative optimization procedure where each iteration approximately minimizes the objective over a constrained region of the space on which the objective is differentiable (in the case of L1-regularization, a given orthant), models the second-order behavior of the objective by considering the loss component alone, using a “line-search” at each iteration that projects search points back onto the chosen orthant, and determines when to stop the line search.
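    The projection used in the modified line search can be illustrated on its own. This sketch assumes the chosen orthant is given as a vector of signs (+1, -1, or 0) and that projection zeroes every coordinate whose sign disagrees with it; how the orthant and search direction are chosen, and the L-BFGS update itself, are not shown. The function names are hypothetical.

```python
def sign(value):
    """Return +1, -1, or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def project_onto_orthant(point, orthant):
    """Zero each coordinate of point whose sign differs from the orthant's.

    point: list of floats (a trial point produced by the line search).
    orthant: list of ints in {-1, 0, +1} describing the chosen orthant.
    """
    return [x if sign(x) == s else 0.0 for x, s in zip(point, orthant)]


# The second and third coordinates leave the chosen orthant and are clipped.
trial_point = [0.5, -0.2, 0.3]
chosen_orthant = [1, 1, -1]
projected = project_onto_orthant(trial_point, chosen_orthant)
print(projected)  # [0.5, 0.0, 0.0]
```

    Clipping to zero rather than reflecting keeps the trial point inside the region where the L1 term is differentiable, which is consistent with the sparse solutions L1-regularization is used to obtain.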


    Finite-state model for processing web queries
    50.
    Type: Invention application
    Status: Lapsed

    Publication No.: US20080183673A1

    Publication date: 2008-07-31

    Application No.: US11698011

    Application date: 2007-01-25

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: A method of creating an index of web queries is discussed. The method includes receiving a first query representative of one or more symbolic characters and assigning the first query to a first data structure. A first text string representative of the first query is created and assigned to a second data structure. The first and second data structures are stored on a tangible computer readable medium.
