-
Publication No.: US20080154577A1
Publication date: 2008-06-26
Application No.: US11645926
Filing date: 2006-12-26
Applicant: Yookyung Kim, Jun Huang, Youssef Billawala
Inventor: Yookyung Kim, Jun Huang, Youssef Billawala
IPC classification: G06F17/28
CPC classification: G06F17/2827, G06F17/2775
Abstract: Traditional statistical machine translation systems learn all information from a sentence-aligned parallel text and are known to have problems translating between structurally diverse languages. To overcome this limitation, the present invention introduces two-level training, which incorporates syntactic chunking into statistical translation. A chunk-alignment step is inserted between the sentence-level and word-level training, which allows differing training for these two sources of information in order to learn lexical properties from the aligned chunks and structural properties from chunk sequences. The system consists of a linguistic processing step, two-level training, and a decoding step which combines chunk translations of multiple sources and multiple language models.
-
Publication No.: US08583416B2
Publication date: 2013-11-12
Application No.: US11965711
Filing date: 2007-12-27
Applicant: Jun Huang, Yookyung Kim, Youssef Billawala, Farzad Ehsani, Demitrios Master
Inventor: Jun Huang, Yookyung Kim, Youssef Billawala, Farzad Ehsani, Demitrios Master
CPC classification: G10L15/1822, G10L15/1815
Abstract: The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with larger domain size and scarce training data, as well as under noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method which combines linguistic and statistical information for representation of information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical.
-
Publication No.: US20090171662A1
Publication date: 2009-07-02
Application No.: US11965711
Filing date: 2007-12-27
Applicant: Jun Huang, Yookyung Kim, Youssef Billawala, Farzad Ehsani, Demitrios Master
Inventor: Jun Huang, Yookyung Kim, Youssef Billawala, Farzad Ehsani, Demitrios Master
IPC classification: G10L15/00
CPC classification: G10L15/1822, G10L15/1815
Abstract: The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with larger domain size and scarce training data, as well as under noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method which combines linguistic and statistical information for representation of information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical.
-
Publication No.: US20080133245A1
Publication date: 2008-06-05
Application No.: US11633859
Filing date: 2006-12-04
Applicant: Guillaume Proulx, Youssef Billawala, Elaine Drom, Farzad Ehsani, Yookyung Kim, Demitrios Master
Inventor: Guillaume Proulx, Youssef Billawala, Elaine Drom, Farzad Ehsani, Yookyung Kim, Demitrios Master
CPC classification: G06F17/2872, G06F17/2818, G10L13/00, G10L15/26
Abstract: The present invention discloses modular speech-to-speech translation systems and methods that provide adaptable platforms to enable verbal communication between speakers of different languages within the context of specific domains. The components of the preferred embodiments of the present invention include: (1) speech recognition; (2) machine translation; (3) N-best merging module; (4) verification; and (5) text-to-speech. The speech recognition modules are structured to provide N-best selections and multi-stream processing, where multiple speech recognition engines may be active at any one time. The N-best lists from the one or more speech recognition engines may be handled either separately or collectively to improve both recognition and translation results. A merge module is responsible for integrating the N-best outputs of the translation engines along with confidence/translation scores to create a ranked list of recognition-translation pairs.
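The merge step named in this abstract can be illustrated with a minimal sketch. It assumes each hypothesis carries a recognition confidence and a translation score on a 0 to 1 scale and that the two are combined by a simple product; the abstract does not state the scoring rule, and the names `Hypothesis` and `merge_nbest` are hypothetical.

```python
from typing import NamedTuple


class Hypothesis(NamedTuple):
    recognized: str     # text produced by a speech recognition engine
    translated: str     # machine translation of that text
    rec_conf: float     # recognition confidence, assumed to lie in [0, 1]
    trans_score: float  # translation score, assumed to lie in [0, 1]


def merge_nbest(lists: list[list[Hypothesis]], top_n: int = 3) -> list[tuple[str, str, float]]:
    """Merge several N-best lists into one ranked list of recognition-translation pairs.

    Assumption: combined score = rec_conf * trans_score, and a pair proposed by
    more than one engine keeps its best combined score.
    """
    best: dict[tuple[str, str], float] = {}
    for nbest in lists:
        for hyp in nbest:
            key = (hyp.recognized, hyp.translated)
            score = hyp.rec_conf * hyp.trans_score
            if score > best.get(key, float("-inf")):
                best[key] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(rec, trans, score) for (rec, trans), score in ranked[:top_n]]
```

Handling the lists "separately" would correspond to calling `merge_nbest` once per engine; passing all lists in one call corresponds to handling them collectively.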
-
Publication No.: US07958109B2
Publication date: 2011-06-07
Application No.: US12367225
Filing date: 2009-02-06
Applicant: Yi-An Lin, Youssef Billawala, Kevin Haas, Jan Pfeifer
Inventor: Yi-An Lin, Youssef Billawala, Kevin Haas, Jan Pfeifer
IPC classification: G06F17/30
CPC classification: G06F17/30719, G06F17/30867
Abstract: Techniques for providing useful information to a user in response to a search query are provided. Based on the search query, one or more potential intents of the user are identified and a plurality of matching resources are identified. For at least one matching resource, a particular abstract template is selected based on the one or more potential intents. Each abstract template of a plurality of abstract templates (a) corresponds to a different intent than any other intent to which any other abstract template of the plurality of abstract templates corresponds, and (b) dictates a different manner of displaying information about a matching resource than any other manner of displaying dictated by any other abstract template of the plurality of abstract templates. A search results page is generated and sent to the user. The search results page includes an abstract for the at least one matching resource. The abstract is displayed based on the particular abstract template.
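As a rough illustration of intent-based template selection, the sketch below maps each intent to exactly one template, each with its own display format. The intent names, template strings, and function names are all hypothetical; the abstract does not specify them, nor how intents are scored.

```python
from collections import defaultdict

# Hypothetical intents and display formats: one template per intent,
# and no two templates render a resource the same way.
TEMPLATES = {
    "shopping": "{title} | price: {price}",
    "navigational": "{title} <{url}>",
    "informational": "{title}: {snippet}",
}
DEFAULT_TEMPLATE = "{title}"


def select_template(intents: list[tuple[str, float]]) -> str:
    """Return the template of the most likely intent that has a template."""
    for intent, _likelihood in sorted(intents, key=lambda pair: pair[1], reverse=True):
        if intent in TEMPLATES:
            return TEMPLATES[intent]
    return DEFAULT_TEMPLATE


def render_abstract(resource: dict[str, str], intents: list[tuple[str, float]]) -> str:
    """Render the abstract shown for one matching resource on the results page."""
    # Fields missing from the resource render as empty strings.
    return select_template(intents).format_map(defaultdict(str, resource))
```

For example, `render_abstract({"title": "Acme Camera", "price": "$99"}, [("shopping", 0.7), ("informational", 0.3)])` uses the shopping template because that intent has the highest assumed likelihood.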
-
Publication No.: US20120047131A1
Publication date: 2012-02-23
Application No.: US12861774
Filing date: 2010-08-23
IPC classification: G06F17/30
CPC classification: G06F17/30696, G06F17/30716
Abstract: An information retrieval system and computer-based method provide for constructing a title for a search result summary of a document through title synthesis, wherein the title is suitable for use in assessing the relevance of the summarized document to a query. In one embodiment, the system obtains meaningful keywords or key phrases (title components) about the document, and classifies each title component into one or more of a plurality of pre-established title component classes. The title components may be automatically obtained for the document from available sources either before or at the time the document is made available for indexing by the system. When a query is input to the system to which the document is relevant, the system constructs a title for the document by arranging title components selected from title component classes to maximize a title utility function. The title utility function may be a query-dependent grade. In addition to the query, the title utility function may also account for constraints under which the title is to be presented to a user of the system.
-
Publication No.: US08504567B2
Publication date: 2013-08-06
Application No.: US12861774
Filing date: 2010-08-23
IPC classification: G06F17/30
CPC classification: G06F17/30696, G06F17/30716
Abstract: An information retrieval system and computer-based method provide for constructing a title for a search result summary of a document through title synthesis, wherein the title is suitable for use in assessing the relevance of the summarized document to a query. Meaningful keywords or key phrases (title components) about the document are obtained. The title components are classified into pre-established title component classes. When a query is input to which the document is relevant, a title for the document is constructed by arranging title components selected from title component classes to maximize a title utility function. The title utility function may be a query-dependent grade. In addition to the query, the title utility function may also account for constraints under which the title is to be presented to a user.
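Title synthesis as summarized in this abstract can be sketched as a small search over component choices. The sketch assumes at most one component per class, a fixed class order, and a toy utility function (query-term overlap plus a small length bonus, with a character limit as the presentation constraint). None of these specifics come from the abstract, and `utility` and `synthesize_title` are hypothetical names.

```python
from itertools import product


def utility(title: str, query: str, max_chars: int) -> float:
    """Toy query-dependent grade; titles that break the length constraint are rejected."""
    if len(title) > max_chars:
        return float("-inf")
    query_terms = set(query.lower().split())
    words = title.lower().replace(" - ", " ").split()
    overlap = sum(1.0 for word in set(words) if word in query_terms)
    return overlap + 0.1 * len(words)


def synthesize_title(components: dict[str, list[str]], query: str, max_chars: int = 40) -> str:
    """Pick at most one component per class (in class order) to maximize the utility."""
    best_title, best_score = "", float("-inf")
    # None stands for "use no component from this class".
    options = [[None] + components[cls] for cls in components]
    for choice in product(*options):
        parts = [part for part in choice if part is not None]
        if not parts:
            continue
        title = " - ".join(parts)
        score = utility(title, query, max_chars)
        if score > best_score:
            best_title, best_score = title, score
    return best_title
```

Exhaustive enumeration is only workable for a handful of classes and components; it is used here to keep the objective (maximize the utility subject to the display constraint) easy to see.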
-