Enhanced query rewriting through click log analysis
    1.
    Granted invention patent
    Enhanced query rewriting through click log analysis (in force)

    Publication No.: US09507861B2

    Publication date: 2016-11-29

    Application No.: US13078553

    Filing date: 2011-04-01

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06F17/30672

    Abstract: Systems, methods, and computer media for identifying related strings for search query rewriting are provided. Session data for a user search query session in accessed click log data is identified. It is determined whether a first additional search query in the session data is related to a first user search query based on at least one of: dwell time; a number of search result links clicked on; and similarity between web page titles or uniform resource locators (URLs). When related, the first additional search query is incorporated into a list of strings related to the first user search query. One or more supplemental strings that are related to the first user search query are also identified and included in the same list.

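The relatedness test in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the `QueryEvent` record layout, the threshold values, and the use of Jaccard token overlap as the title similarity measure are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class QueryEvent:
    """One query in a session (hypothetical log record layout)."""
    query: str
    dwell_seconds: float                      # time spent on clicked results
    clicked_titles: list = field(default_factory=list)


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def is_related(first: QueryEvent, other: QueryEvent,
               min_dwell: float = 30.0, min_clicks: int = 2,
               min_title_sim: float = 0.5) -> bool:
    """Apply an "at least one of" test; all thresholds are assumed values."""
    if other.dwell_seconds >= min_dwell:
        return True
    if len(other.clicked_titles) >= min_clicks:
        return True
    # Compare clicked page titles across the two queries.
    return any(jaccard(t1, t2) >= min_title_sim
               for t1 in first.clicked_titles
               for t2 in other.clicked_titles)


def related_strings(session, supplemental=()):
    """List later queries related to the first one, then add supplemental strings."""
    first, rest = session[0], session[1:]
    related = [q.query for q in rest if is_related(first, q)]
    related.extend(s for s in supplemental if s not in related)
    return related
```

Here `supplemental` stands in for the supplemental strings the abstract mentions; how those are obtained is not stated in the abstract, so the sketch simply accepts them as input.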

    Generic framework for large-margin MCE training in speech recognition
    2.
    Granted invention patent
    Generic framework for large-margin MCE training in speech recognition (in force)

    Publication No.: US08423364B2

    Publication date: 2013-04-16

    Application No.: US11708440

    Filing date: 2007-02-20

    IPC classification: G10L15/14 G10L15/00 G10L15/06

    CPC classification: G10L15/063 G10L2015/0631

    Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. For each token, scores are calculated for the correct class and for the competing classes given the initial acoustic model. A sample-adaptive window bandwidth is also calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves the decision boundary such that token-to-boundary distances for correct tokens near the decision boundary are maximized. The margin can either be fixed or vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until empirical convergence is reached.

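One common way to write a margin-shifted MCE loss is a sigmoid over a misclassification measure. The sketch below assumes that form; the abstract does not give the exact loss, so the log-mean-exp competitor term, the linear margin schedule, and all parameter names are illustrative assumptions. The per-token `bandwidth` stands in for the sample-adaptive window bandwidth, whose computation is not described in the abstract.

```python
import math


def misclassification_measure(correct_score, competitor_scores):
    """d = log-mean-exp(competitor scores) - correct score; d < 0 means correct."""
    m = max(competitor_scores)
    lme = m + math.log(sum(math.exp(s - m) for s in competitor_scores)
                       / len(competitor_scores))
    return lme - correct_score


def large_margin_loss(d, margin, bandwidth):
    """Sigmoid loss shifted by `margin`; `bandwidth` is the per-token slope.

    With margin > 0, correctly classified tokens that lie within `margin`
    of the decision boundary (-margin < d < 0) still incur a loss above 0.5.
    """
    return 1.0 / (1.0 + math.exp(-bandwidth * (d + margin)))


def margin_schedule(iteration, start=0.0, step=0.1, max_margin=1.0):
    """Assumed schedule: margin grows monotonically with the iteration count."""
    return min(start + step * iteration, max_margin)
```

A token with `d = -0.5` contributes less than 0.5 to the loss when the margin is 0 but more than 0.5 when the margin is 1, which is what pushes near-boundary tokens further from the boundary.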

    Minimum classification error training with growth transformation optimization
    3.
    Granted invention patent
    Minimum classification error training with growth transformation optimization (in force)

    Publication No.: US08301449B2

    Publication date: 2012-10-30

    Application No.: US11581673

    Filing date: 2006-10-16

    Applicant: Xiaodong He, Li Deng

    Inventor: Xiaodong He, Li Deng

    IPC classification: G10L15/00

    CPC classification: G10L15/063 G10L15/144

    Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure uses a weight for each competitor word sequence, and each weight can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.

    Enhanced Query Rewriting Through Click Log Analysis
    4.
    Invention patent application
    Enhanced Query Rewriting Through Click Log Analysis (in force)

    Publication No.: US20120254217A1

    Publication date: 2012-10-04

    Application No.: US13078553

    Filing date: 2011-04-01

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06F17/30672

    Abstract: Systems, methods, and computer media for identifying related strings for search query rewriting are provided. Session data for a user search query session in accessed click log data is identified. It is determined whether a first additional search query in the session data is related to a first user search query based on at least one of: dwell time; a number of search result links clicked on; and similarity between web page titles or uniform resource locators (URLs). When related, the first additional search query is incorporated into a list of strings related to the first user search query. One or more supplemental strings that are related to the first user search query are also identified and included in the same list.

    SELECTION OF DOMAIN-ADAPTED TRANSLATION SUBCORPORA
    5.
    Invention patent application
    SELECTION OF DOMAIN-ADAPTED TRANSLATION SUBCORPORA (in force)

    Publication No.: US20120203539A1

    Publication date: 2012-08-09

    Application No.: US13022633

    Filing date: 2011-02-08

    IPC classification: G06F17/28

    CPC classification: G06F17/2809

    Abstract: An architecture that provides the capability to subselect the most relevant data from an out-domain corpus for use either in isolation or in combination with in-domain data. The architecture is a domain adaptation for machine translation that selects the most relevant sentences from a larger general-domain corpus of parallel translated sentences. The methods for selecting the data include monolingual cross-entropy, monolingual cross-entropy difference, bilingual cross-entropy, and bilingual cross-entropy difference. A translation model is trained on both the in-domain data and an out-domain subset, and the models can be interpolated together to boost performance on in-domain translation tasks.

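As a rough illustration of the monolingual cross-entropy difference criterion, the sketch below scores each general-domain sentence by its cross-entropy under an in-domain model minus its cross-entropy under a general-domain model, and keeps the lowest-scoring sentences. The add-one-smoothed unigram models and all function names are assumptions chosen to keep the example short; the abstract does not specify the language models used.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return (word counts, total token count) for a list of sentences."""
    counts = Counter(w for s in sentences for w in s.split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-word cross-entropy in bits under an add-one-smoothed unigram model.

    Assumes the sentence contains at least one token.
    """
    counts, total = model
    words = sentence.split()
    logp = sum(math.log2((counts[w] + 1) / (total + vocab_size)) for w in words)
    return -logp / len(words)


def select_subcorpus(in_domain, general, k):
    """Keep the k general-domain sentences with the lowest H_in - H_out."""
    vocab = {w for s in in_domain + general for w in s.split()}
    lm_in, lm_out = train_unigram(in_domain), train_unigram(general)

    def score(s):
        return (cross_entropy(s, lm_in, len(vocab))
                - cross_entropy(s, lm_out, len(vocab)))

    return sorted(general, key=score)[:k]
```

A low score means the sentence looks like in-domain text and unlike the general corpus as a whole. The bilingual variant named in the abstract would sum this difference over the source and target sides of each sentence pair.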

    TRAINING PARSERS TO APPROXIMATELY OPTIMIZE NDCG
    6.
    Invention patent application
    TRAINING PARSERS TO APPROXIMATELY OPTIMIZE NDCG (in force)

    Publication No.: US20120150836A1

    Publication date: 2012-06-14

    Application No.: US12962751

    Filing date: 2010-12-08

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: A supervised technique uses relevance judgments to train a dependency parser such that it approximately optimizes Normalized Discounted Cumulative Gain (NDCG) in information retrieval. A weighted tree edit distance between the parse tree for a query and the parse tree for a document is added to a ranking function, where the edit distance weights are parameters from the parser. Using parser parameters in the ranking function enables approximate optimization of the parser's parameters for NDCG by adding some constraints to the objective function.

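For reference, the metric being optimized can be computed as below. This sketch covers only NDCG itself, using the common exponential-gain, log2-discount definition (an assumption, since the abstract does not state which variant is used); it does not attempt the parser training or the tree edit distance.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with gain 2^rel - 1 and log2(rank + 1) discount."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """NDCG of a ranked list of graded relevance labels, optionally cut off at k."""
    top = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)
    ideal = ideal if k is None else ideal[:k]
    ideal_dcg = dcg(ideal)
    return dcg(top) / ideal_dcg if ideal_dcg > 0 else 0.0
```

`relevances` lists the judged relevance of each document in the order the ranking function returned them, so a perfectly ordered list scores 1.0.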

    SPEECH MODELS GENERATED USING COMPETITIVE TRAINING, ASYMMETRIC TRAINING, AND DATA BOOSTING
    7.
    Invention patent application
    SPEECH MODELS GENERATED USING COMPETITIVE TRAINING, ASYMMETRIC TRAINING, AND DATA BOOSTING (in force)

    Publication No.: US20100161330A1

    Publication date: 2010-06-24

    Application No.: US12720968

    Filing date: 2010-03-10

    Applicant: Xiaodong He, Jian Wu

    Inventor: Xiaodong He, Jian Wu

    IPC classification: G10L15/06 G10L15/00

    CPC classification: G10L15/063

    Abstract: Speech models are trained using one or more of three different training systems. They include competitive training, which reduces a distance between a recognized result and a true result; data boosting, which divides and weights training data; and asymmetric training, which trains different model components differently.

    Wavy-Shaped Electric Straightening Comb
    9.
    Invention patent application
    Wavy-Shaped Electric Straightening Comb (under examination, published)

    Publication No.: US20160360847A1

    Publication date: 2016-12-15

    Application No.: US15173658

    Filing date: 2016-06-05

    Applicant: XiaoDong He

    Inventor: XiaoDong He

    IPC classification: A45D2/00 A45D24/16

    CPC classification: A45D2/002 A45D20/48 A45D24/16

    Abstract: A wavy-shaped electric straightening comb includes a comb part and a handle. The comb part has a first comb and a second comb. The first comb has a plurality of first comb teeth, and the second comb has a plurality of second comb teeth and a plurality of through holes, each formed between two adjacent second comb teeth. The first comb teeth of the first comb pass through the respective through holes of the second comb, so that each first comb tooth is disposed between two corresponding adjacent comb teeth and the first comb and the second comb are assembled together. Each of the first comb teeth and the second comb teeth defines a wavy-shaped cross-section, and adjacent first and second comb teeth keep an interval of 0.25 mm to 1.5 mm and define a wavy-shaped hair accommodating space.
