BOOTSTRAPPING NAMED ENTITY CANONICALIZERS FROM ENGLISH USING ALIGNMENT MODELS
    11.
    Type: Patent application
    Legal status: In force

    Publication number: US20140200876A1

    Publication date: 2014-07-17

    Application number: US13830969

    Filing date: 2013-03-14

    Applicant: Google Inc.

    CPC classification number: G06F17/289 G06F17/278 G06F17/28

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training recognition canonical representations corresponding to named-entity phrases in a second natural language based on translating a set of allowable expressions with canonical representations from a first natural language, which may be generated by expanding a context-free grammar for the allowable expressions for the first natural language.
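
As an illustration of the grammar-expansion step mentioned in this abstract, the following minimal sketch enumerates every phrase a small context-free grammar allows, paired with a canonical representation. The grammar, the `expand` function, and the canonical format are hypothetical and are not taken from the patent; the sketch assumes a non-recursive grammar so that the expansion is finite, and it does not cover the translation or alignment steps.

```python
from itertools import product

# Hypothetical toy grammar. Each nonterminal maps to a list of alternatives;
# each alternative is a sequence of symbols. A terminal is a
# (surface text, canonical fragment) tuple.
GRAMMAR = {
    "TIME": [["HOUR", "PERIOD"]],
    "HOUR": [[("seven", "7")], [("eight", "8")]],
    "PERIOD": [[("a.m.", "AM")], [("p.m.", "PM")]],
}


def expand(symbol):
    """Return every (surface phrase, canonical form) pair derivable from symbol."""
    if isinstance(symbol, tuple):  # terminal: already a (surface, canonical) pair
        return [symbol]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Expand each symbol of the alternative, then combine all choices.
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):
            surface = " ".join(p[0] for p in combo)
            canonical = "".join(p[1] for p in combo)
            results.append((surface, canonical))
    return results


pairs = expand("TIME")
for surface, canonical in pairs:
    print(surface, "->", canonical)
```

Each resulting pair could then be translated into the second language while keeping its canonical form, which is the kind of training data the abstract describes.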

    Determining advertisements based on verbal inputs to applications on a computing device
    12.
    Type: Granted patent
    Legal status: In force

    Publication number: US08612226B1

    Publication date: 2013-12-17

    Application number: US13751700

    Filing date: 2013-01-28

    Applicant: Google Inc.

    CPC classification number: G06Q30/0241 G10L15/22

    Abstract: The present disclosure provides methods operable by a computing device having one or more applications configured to perform functions based on a received verbal input. The method may comprise receiving a verbal input, obtaining one or more textual phrases corresponding to the received verbal input, and providing the one or more textual phrases to an appropriate application on the computing device. The method may further comprise accumulating data on the one or more textual phrases. The data comprises at least a count of a number of times a particular textual phrase is obtained based on a given received verbal input. Based on the count exceeding a threshold, the method may further comprise providing a query corresponding to the textual phrase, where the query is usable to search an advertisement database for one or more advertisements relating to the textual phrase.
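
The counting-and-threshold logic in this abstract can be sketched roughly as follows. The class name `PhraseCounter`, the `record` method, and the dictionary-shaped query are assumptions made for illustration only; speech recognition and the advertisement database lookup are out of scope here.

```python
from collections import Counter


class PhraseCounter:
    """Counts how often a textual phrase is obtained for a given verbal input."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, verbal_input_id, phrase):
        """Record one recognition result.

        Returns a query for the phrase once its count exceeds the
        threshold, otherwise None.
        """
        key = (verbal_input_id, phrase)
        self.counts[key] += 1
        if self.counts[key] > self.threshold:
            # Hypothetical query shape; a real system would define its own.
            return {"query": phrase}
        return None


counter = PhraseCounter(threshold=2)
results = [counter.record("utterance-1", "pizza near me") for _ in range(3)]
```

With a threshold of 2, the first two recordings return `None` and the third returns a query, since the count exceeds the threshold only on the third occurrence.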

    Clustering Classes in Language Modeling
    13.
    Type: Patent application
    Legal status: In force

    Publication number: US20160062985A1

    Publication date: 2016-03-03

    Application number: US14656027

    Filing date: 2015-03-12

    Applicant: Google Inc.

    CPC classification number: G06F17/30707 G06F17/2715 G06F17/2775

    Abstract: This document describes, among other things, a computer-implemented method. The method can include obtaining a plurality of text samples that each include one or more terms belonging to a first class of terms. The plurality of text samples can be classified into a plurality of groups of text samples. Each group of text samples can correspond to a different sub-class of terms. For each of the groups of text samples, a sub-class context model can be generated based on the text samples in the respective group of text samples. Particular ones of the sub-class context models that are determined to be similar can be merged to generate a hierarchical set of context models. Further, the method can include selecting particular ones of the context models and generating a class-based language model based on the selected context models.

    Generating Language Models
    14.
    Type: Patent application
    Legal status: In force

    Publication number: US20150348541A1

    Publication date: 2015-12-03

    Application number: US14290090

    Filing date: 2014-05-29

    Applicant: Google Inc.

    CPC classification number: G10L15/183 G06F8/10 G10L15/063 G10L15/1815

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating language models. In some implementations, data is accessed that indicates a set of classes corresponding to a concept. A first language model is generated in which a first class represents the concept. A second language model is generated in which second classes represent the concept. Output of the first language model and the second language model is obtained, and the outputs are evaluated. A class from the set of classes is selected based on evaluating the output of the first language model and the output of the second language model. In some implementations, the first class and the second class are selected from a parse tree or other data that indicates relationships among the classes in the set of classes.

    Classification of Offensive Words
    15.
    Type: Patent application
    Legal status: Under examination (published)

    Publication number: US20150309987A1

    Publication date: 2015-10-29

    Application number: US14264617

    Filing date: 2014-04-29

    Applicant: Google Inc.

    CPC classification number: G06F17/2765

    Abstract: A computer-implemented method can include identifying a first set of text samples that include a particular potentially offensive term. Labels can be obtained for the first set of text samples that indicate whether the particular potentially offensive term is used in an offensive manner. A classifier can be trained based at least on the first set of text samples and the labels, the classifier being configured to use one or more signals associated with a text sample to generate a label that indicates whether a potentially offensive term in the text sample is used in an offensive manner in the text sample. The method can further include providing, to the classifier, a first text sample that includes the particular potentially offensive term, and in response, obtaining, from the classifier, a label that indicates whether the particular potentially offensive term is used in an offensive manner in the first text sample.

    Increasing semantic coverage with semantically irrelevant insertions
    16.
    Type: Granted patent
    Legal status: In force

    Publication number: US09129598B1

    Publication date: 2015-09-08

    Application number: US14671353

    Filing date: 2015-03-27

    Applicant: Google Inc.

    CPC classification number: G10L15/063 G06F17/2785 G10L15/19 G10L2015/0631

    Abstract: A method includes accessing data specifying a set of actions, each action defining a user device operation and for each action: accessing a corresponding set of command sentences for the action, determining first n-grams in the set of command sentences that are semantically relevant for the action, determining second n-grams in the set of command sentences that are semantically irrelevant for the action, generating a training set of command sentences from the corresponding set of command sentences, the generating the training set of command sentences including removing each second n-gram from each sentence in the corresponding set of command sentences for the action, and generating a command model from the training set of command sentences configured to generate an action score for the action for an input sentence based on: first n-grams for the action, and second n-grams for the action that are also second n-grams for all other actions.
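
The training-set generation step in this abstract, removing semantically irrelevant n-grams from each command sentence, might look like the sketch below. It assumes the irrelevant n-grams have already been determined and are given as a set of lowercase, whitespace-tokenized strings; the function names are hypothetical, and the command model itself is not shown.

```python
def strip_ngrams(sentence, irrelevant):
    """Remove every occurrence of each irrelevant n-gram from the sentence."""
    tokens = sentence.lower().split()
    # Process longer n-grams first so "could you please" is removed
    # as a unit before the shorter "please" is considered.
    for ngram in sorted(irrelevant, key=lambda g: -len(g.split())):
        ngram_tokens = ngram.split()
        n = len(ngram_tokens)
        if n == 0:
            continue  # ignore empty entries
        kept, i = [], 0
        while i < len(tokens):
            if tokens[i:i + n] == ngram_tokens:
                i += n  # skip the matched n-gram
            else:
                kept.append(tokens[i])
                i += 1
        tokens = kept
    return " ".join(tokens)


def build_training_set(command_sentences, irrelevant):
    """Apply strip_ngrams to every command sentence for one action."""
    return [strip_ngrams(s, irrelevant) for s in command_sentences]


training = build_training_set(
    ["Please call mom", "could you please call dad now"],
    {"please", "could you please", "now"},
)
```

Here both sentences reduce to their semantically relevant core ("call mom" and "call dad"), which is the form a command model would then be trained on.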

    Mining data for natural language system
    17.
    Type: Granted patent
    Legal status: In force

    Publication number: US09047271B1

    Publication date: 2015-06-02

    Application number: US13780757

    Filing date: 2013-02-28

    Applicant: Google Inc.

    CPC classification number: G06F17/2765 G10L15/1815 G10L15/197 G10L2015/223

    Abstract: A method iteratively processes data for a set of actions, including: for each action: accessing a corresponding set of command sentences for the action, determining first n-grams that are semantically relevant for the action and second n-grams that are semantically irrelevant for the action, and identifying, from a log of command sentences that includes command sentences not included in the corresponding set of command sentences, candidate command sentences that include one first n-gram and a third n-gram that has not yet been determined to be a first n-gram or a second n-gram; for each candidate command sentence, determining each third n-gram that is semantically relevant for an action to be a first n-gram, and determining each third n-gram that is semantically irrelevant for an action to be a second n-gram, and adjusting the corresponding set of command sentences for each action based on the first n-grams and the second n-grams.
