System and method for collaborative language translation
    12.
    Granted invention patent (status: in force)

    Publication No.: US09323746B2

    Publication date: 2016-04-26

    Application No.: US13311836

    Filing date: 2011-12-06

    IPC classification: G06F17/28

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for presenting a machine translation and alternative translations to a user, where a selection of any particular alternative translation results in the re-ranking of the remaining alternatives. The system then presents these re-ranked alternatives to the user, who can continue proofing the machine translation using the re-ranked alternatives or by typing an improved translation. This process continues until the user indicates that the current portion of the translation is complete, at which point the system moves to the next portion.

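    The select-then-re-rank step described in this abstract can be sketched as follows. The abstract does not say how the remaining alternatives are re-scored, so the word-overlap (Jaccard) rule, the function name `rerank`, and the sample sentences are all assumptions made for illustration.

```python
def rerank(alternatives, chosen):
    """Re-rank the remaining alternatives after the user picks one.

    Assumption: alternatives that share more words with the chosen
    translation are more likely to be useful and are ranked higher.
    """
    chosen_words = set(chosen.lower().split())
    remaining = [a for a in alternatives if a != chosen]

    def overlap(alt):
        words = set(alt.lower().split())
        union = words | chosen_words
        # Jaccard similarity; 0.0 when both word sets are empty
        return len(words & chosen_words) / len(union) if union else 0.0

    # sorted() is stable, so tied alternatives keep their original order
    return sorted(remaining, key=overlap, reverse=True)


alternatives = [
    "the cat sits on the mat",
    "a cat is sitting on a rug",
    "the cat sat on the mat",
    "one feline rests upon the carpet",
]
reranked = rerank(alternatives, "the cat sat on the mat")
print(reranked)
```

    In an interactive tool this function would be called after every selection, and the loop would end when the user marks the current portion as complete.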

    System and method for feature-rich continuous space language models
    13.
    Granted invention patent (status: in force)

    Publication No.: US09092425B2

    Publication date: 2015-07-28

    Application No.: US12963161

    Filing date: 2010-12-08

    IPC classification: G06F17/27 G06F17/28

    CPC classification: G06F17/28

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for predicting probabilities of words for a language model. An exemplary system configured to practice the method receives a sequence of words and external data associated with the sequence of words and maps the sequence of words to an X-dimensional vector, corresponding to a vocabulary size. Then the system processes each X-dimensional vector, based on the external data, to generate respective Y-dimensional vectors, wherein each Y-dimensional vector represents a dense continuous space, and outputs at least one next word predicted to follow the sequence of words based on the respective Y-dimensional vectors. The X-dimensional vector, which is a binary sparse representation, can be higher dimensional than the Y-dimensional vector, which is a dense continuous space. The external data can include part-of-speech tags, topic information, word similarity, word relationships, a particular topic, and succeeding parts of speech in a given history.

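    A minimal, untrained sketch of the mapping described in this abstract: a sparse X-dimensional one-hot word vector is projected, together with external data, into a dense Y-dimensional vector, and the dense vectors feed a softmax over the next word. The toy vocabulary, the choice of part-of-speech tags as the external data, the random weights, and the fixed two-word history are all assumptions; the patent does not specify them here.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "mat"]   # X = vocabulary size
POS_TAGS = ["DET", "NOUN", "VERB"]     # external data (assumed: POS tags)
X, Y = len(VOCAB), 2                   # the dense space is lower dimensional
HISTORY_LEN = 2                        # assumed fixed history length

rng = random.Random(0)
# Hypothetical, untrained parameters; a real model would learn these.
W_word = [[rng.uniform(-1, 1) for _ in range(Y)] for _ in range(X)]
W_tag = [[rng.uniform(-1, 1) for _ in range(Y)] for _ in POS_TAGS]
W_out = [[rng.uniform(-1, 1) for _ in range(X)] for _ in range(HISTORY_LEN * Y)]


def one_hot(index, size):
    """Binary sparse representation with a single 1."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def project(word, tag):
    """Map an X-dimensional one-hot vector plus its tag to Y dimensions."""
    x = one_hot(VOCAB.index(word), X)
    t = one_hot(POS_TAGS.index(tag), len(POS_TAGS))
    return [
        sum(x[i] * W_word[i][j] for i in range(X))
        + sum(t[k] * W_tag[k][j] for k in range(len(POS_TAGS)))
        for j in range(Y)
    ]


def next_word_probs(history):
    """Softmax over the vocabulary given HISTORY_LEN (word, tag) pairs."""
    hidden = [v for word, tag in history for v in project(word, tag)]
    logits = [
        sum(hidden[i] * W_out[i][j] for i in range(len(hidden)))
        for j in range(X)
    ]
    peak = max(logits)                          # subtract max for stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {word: e / total for word, e in zip(VOCAB, exps)}


probs = next_word_probs([("the", "DET"), ("cat", "NOUN")])
predicted = max(probs, key=probs.get)
print(predicted, probs)
```

    Because the weights are random, the predicted word is meaningless; the sketch only shows the shapes involved and where the external data enters the projection.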

    System and method for building diverse language models
    14.
    Granted invention patent (status: in force)

    Publication No.: US09081760B2

    Publication date: 2015-07-14

    Application No.: US13042890

    Filing date: 2011-03-08

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls a set of documents in a network of interconnected devices, such as via a crawler operating on a computing device, according to a visitation policy. The visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles, by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.

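    One way to picture the perplexity-driven visitation policy is the sketch below, which scores candidate documents against the current language model and visits the highest-perplexity ones first. The add-one-smoothed unigram model, the function names, and the document identifiers are assumptions chosen to keep the example small; the abstract does not specify the model type.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Build the 'current language model' from previously crawled text."""
    counts = Counter(tokens)
    return counts, sum(counts.values())


def perplexity(tokens, counts, total, vocab_size):
    """Perplexity of a non-empty token list under an add-one unigram model."""
    log_prob = sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return math.exp(-log_prob / len(tokens))


def prioritize(candidates, counts, total):
    """Order candidate documents so the most novel ones are crawled first."""
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    scores = {
        doc_id: perplexity(text.split(), counts, total, vocab_size)
        for doc_id, text in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores


previous_cycle = "the cat sat on the mat the dog sat on the rug".split()
counts, total = train_unigram(previous_cycle)

# Hypothetical candidate documents discovered in the current cycle.
candidates = {
    "doc_familiar": "the cat sat on the mat",
    "doc_novel": "quantum entanglement violates locality",
}
order, scores = prioritize(candidates, counts, total)
print(order)
```

    After each cycle the newly crawled text would be added to the training data, so that the next cycle's model and priorities reflect what has already been covered.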

    System and method of providing machine translation from a source language to a target language
    15.
    Granted invention patent (status: in force)

    Publication No.: US08849665B2

    Publication date: 2014-09-30

    Application No.: US12022819

    Filing date: 2008-01-30

    IPC classification: G10L15/00 G10L15/18 G06F17/28

    CPC classification: G06F17/2827

    Abstract: A machine translation method, a system for using the method, and computer-readable media are disclosed. The method includes the steps of receiving a source language sentence and selecting a set of target language n-grams using a lexical classifier and based on the source language sentence. For at least one n-gram in the selected set, n is greater than 1. The method continues by combining the selected set of target language n-grams as a finite state acceptor (FSA), weighting the FSA with data from the lexical classifier, and generating an n-best list of target sentences from the FSA. As an alternative to using the FSA, N strings may be generated from the n-grams and ranked using a language model. The N strings may be represented by an FSA for efficiency, but this is not necessary.

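    The alternative path in this abstract (generate strings from the selected n-grams, then rank them with a language model) can be sketched as below. The lexical classifier is not implemented: its output is hard-coded in `selected_ngrams`. The bigram log-probabilities and the penalty for unseen bigrams are invented values for illustration only.

```python
from itertools import permutations

# Assumed output of the lexical classifier for one source sentence.
# At least one n-gram has n > 1, as the abstract requires.
selected_ngrams = [("the", "house"), ("is",), ("red",)]

# Hypothetical bigram log-probabilities for the target language.
BIGRAM_LOGP = {
    ("<s>", "the"): -0.5,
    ("the", "house"): -0.7,
    ("house", "is"): -0.9,
    ("is", "red"): -1.0,
    ("red", "</s>"): -0.6,
}
UNSEEN_LOGP = -5.0  # assumed flat penalty for bigrams not in the table


def lm_score(words):
    """Sum of bigram log-probabilities over the padded word sequence."""
    padded = ["<s>", *words, "</s>"]
    return sum(
        BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(padded, padded[1:])
    )


def n_best(ngrams, n):
    """Concatenate the n-grams in every order and keep the n best strings."""
    candidates = {
        tuple(word for gram in order for word in gram)
        for order in permutations(ngrams)
    }
    ranked = sorted(candidates, key=lm_score, reverse=True)
    return [" ".join(candidate) for candidate in ranked[:n]]


best = n_best(selected_ngrams, 3)
print(best)
```

    Enumerating every permutation grows factorially with the number of n-grams, which is one reason a compact representation such as an FSA can be preferable for longer sentences.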

    System and method of generating responses to text-based messages
    17.
    Granted invention patent (status: in force)

    Publication No.: US08296140B2

    Publication date: 2012-10-23

    Application No.: US13300752

    Filing date: 2011-11-21

    IPC classification: G10L15/00

    CPC classification: G06F17/2785

    Abstract: In accordance with one aspect of the present invention, an automated method of and system for generating a response to a text-based natural language message are disclosed. The method includes identifying a first selected input clause in a sentence in the text-based natural language message, assigning a semantic tag to the first selected input clause, and matching the semantic tag to a historical input tag. The historical input tag is associated with a first previously generated response clause. The method further includes generating an output response message based on the historical response clause, the output response message being derived from the historical input tag and a second previously generated response clause. The system includes means for performing the method steps.

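    The clause-tag-match-respond flow in this abstract can be sketched as follows. The keyword-based tagger, the tag names, the stored response clauses, and the clause splitter are all hypothetical stand-ins; the abstract does not describe how tags are assigned or how history is stored.

```python
import re

# Hypothetical keyword rules standing in for a trained semantic tagger.
TAG_KEYWORDS = {
    "billing_issue": {"charged", "bill", "refund"},
    "login_issue": {"password", "login", "locked"},
}

# Hypothetical history: input tag -> previously generated response clause.
HISTORICAL_RESPONSES = {
    "billing_issue": "we have reviewed the charge on your account",
    "login_issue": "we have sent you a password reset link",
}


def split_clauses(message):
    """Crude clause splitter: break on punctuation and the word 'and'."""
    parts = re.split(r"[.;!?]|\band\b", message.lower())
    return [part.strip() for part in parts if part.strip()]


def tag_clause(clause):
    """Return the first tag whose keywords appear in the clause, else None."""
    words = set(re.findall(r"[a-z']+", clause))
    for tag, keywords in TAG_KEYWORDS.items():
        if words & keywords:
            return tag
    return None


def generate_response(message):
    """Assemble a reply from the response clauses matched to each input clause."""
    matched = []
    for clause in split_clauses(message):
        response = HISTORICAL_RESPONSES.get(tag_clause(clause))
        if response and response not in matched:
            matched.append(response)
    if not matched:
        return ""
    text = ", and ".join(matched)
    return text[0].upper() + text[1:] + "."


reply = generate_response("I was charged twice and I forgot my password.")
print(reply)
```

    Here the output message combines two previously generated response clauses, one per tagged input clause, which mirrors the first and second response clauses mentioned in the abstract.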

    SYSTEM AND METHOD OF SPOKEN LANGUAGE UNDERSTANDING IN HUMAN COMPUTER DIALOGS
    18.
    Invention patent application (status: in force)

    Publication No.: US20120239383A1

    Publication date: 2012-09-20

    Application No.: US13481031

    Filing date: 2012-05-25

    IPC classification: G06F17/27 G10L15/00

    Abstract: A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the other domain action and/or other domain object as the current domain action and/or current domain object, and then recursively qualifying the new current domain action and/or current domain object. This process continues until nothing is left to qualify.

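    The recursive qualification loop in this abstract can be sketched as below. The travel ontology, the verb list used to guess a dialog act, and the clause splitting on the word "and" are assumptions made for illustration; a real system would work on speech recognizer output with trained components.

```python
# Hypothetical domain ontology: each object lists the more specific
# objects that can further qualify it.
ONTOLOGY = {
    "flight": ["ticket", "seat"],
    "ticket": ["price"],
    "hotel": ["room"],
    "seat": [],
    "price": [],
    "room": [],
}
# Top-level objects are those that never appear as a qualifier of another.
TOP_LEVEL = set(ONTOLOGY) - {q for quals in ONTOLOGY.values() for q in quals}
REQUEST_VERBS = {"show", "book", "tell"}  # assumed cue words


def qualify(current, clause_words, path=None):
    """Recursively qualify the current domain object until nothing is left."""
    path = (path or []) + [current]
    for candidate in ONTOLOGY.get(current, []):
        if candidate in clause_words and candidate not in path:
            # The candidate becomes the new current object and is qualified in turn.
            return qualify(candidate, clause_words, path)
    return path


def understand(utterance):
    """Split into clauses, label a dialog act, then qualify each clause."""
    results = []
    for clause in utterance.lower().split(" and "):
        words = clause.split()
        act = "request" if words and words[0] in REQUEST_VERBS else "inform"
        roots = [w for w in words if w in TOP_LEVEL]
        chain = qualify(roots[0], set(words)) if roots else []
        results.append((act, chain))
    return results


parsed = understand("show me the ticket price for my flight and book a hotel room")
print(parsed)
```

    Each returned chain records how the current domain object was narrowed step by step, for example from a flight to its ticket to the ticket's price.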

    SYSTEM AND METHOD FOR REFERRING TO ENTITIES IN A DISCOURSE DOMAIN
    19.
    Invention patent application (status: in force)

    Publication No.: US20120221332A1

    Publication date: 2012-08-30

    Application No.: US13465685

    Filing date: 2012-05-07

    IPC classification: G10L15/26

    Abstract: Disclosed are systems, methods, and non-transitory computer-readable media for referring to entities. The method includes receiving domain-specific training data of sentences describing a target entity in a context, extracting a speaker history and a visual context from the training data, selecting attributes of the target entity based on at least one of the speaker history, the visual context, and speaker preferences, generating a text expression referring to the target entity based on at least one of the selected attributes, the speaker history, and the context, and outputting the generated text expression. The weighted finite-state automaton can represent partial orderings of word pairs in the domain-specific training data. The weighted finite-state automaton can be speaker specific or speaker independent. The weighted finite-state automaton can include a set of weighted partial orderings of the training data for each possible realization.


    System and method of providing a spoken dialog interface to a website
    20.
    Granted invention patent (status: in force)

    Publication No.: US08249879B2

    Publication date: 2012-08-21

    Application No.: US13290501

    Filing date: 2011-11-07

    IPC classification: G10L15/18 G06F17/27

    Abstract: Disclosed are a system and method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module, and a text-to-speech module. The method includes converting data from a structured database associated with a website to a structured text data set and a structured task knowledge base, extracting linguistic items from the structured database, and training a spoken dialog service component using at least one of the structured text data, the structured task knowledge base, or the linguistic items. The system includes modules configured to implement the method.
