Statistical translation system with features based on phrases or groups of words
    2.
    Granted invention patent
    Statistical translation system with features based on phrases or groups of words (status: lapsed)

    Publication number: US5991710A

    Publication date: 1999-11-23

    Application number: US859586

    Filing date: 1997-05-20

    IPC classification: G06F17/28

    CPC classification: G06F17/2818

    Abstract: A system for translating a first word set in a source language into a second word set in a target language, the system comprising: input means for inputting the first word set into the system; tagging means for tagging the first word set input to the system so as to at least substantially reduce non-essential variability in the first word set; translation means including a single a posteriori conditional probability model and a target candidate store for storing target language candidate word sets, wherein the translation means employs the single model to evaluate the target language candidate word sets in order to select the target language candidate word set having a best score with respect to the first word set; and output means for outputting the best scoring target language candidate word set as the second word set in the target language.
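
    The selection step described in this abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the single a posteriori model is log-linear over phrase-pair indicator features and that tagging means replacing digit strings with a class tag; the abstract specifies neither. The names `tag`, `WEIGHTS`, `contains`, `posterior`, and `translate`, and all weights and words, are hypothetical.

```python
import math
import re


def tag(words):
    """Replace digit strings with a class tag to reduce surface variability."""
    return ["<NUM>" if re.fullmatch(r"\d+", w) else w for w in words]


# Hypothetical feature weights: (source phrase, target phrase) -> weight.
WEIGHTS = {
    (("flight",), ("vol",)): 1.5,
    (("<NUM>",), ("<NUM>",)): 1.0,
    (("flight",), ("fuite",)): -0.5,
}


def contains(words, phrase):
    """True if the phrase occurs as a contiguous run inside words."""
    n = len(phrase)
    return any(tuple(words[i:i + n]) == phrase
               for i in range(len(words) - n + 1))


def posterior(source, candidates):
    """P(candidate | source), normalised over the stored candidates only."""
    raw = []
    for cand in candidates:
        total = sum(weight for (src, tgt), weight in WEIGHTS.items()
                    if contains(source, src) and contains(cand, tgt))
        raw.append(math.exp(total))
    z = sum(raw)
    return [r / z for r in raw]


def translate(words, candidates):
    """Tag the input, score every stored candidate, return the best one."""
    probs = posterior(tag(words), candidates)
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs[best]
```

    For example, `translate(["flight", "417"], [["vol", "<NUM>"], ["fuite", "<NUM>"]])` selects the first candidate, because its active features carry the larger total weight.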

    Machine Translation with Side Information
    3.
    Invention application
    Machine Translation with Side Information (status: in force)

    Publication number: US20110282648A1

    Publication date: 2011-11-17

    Application number: US12779751

    Filing date: 2010-05-13

    IPC classification: G06F17/28 G06F17/30 G06F7/00

    CPC classification: G06F17/2818

    Abstract: A method of identifying and using side information available to statistical machine translation systems within an enterprise setting, the method including extracting user-specific interaction and non-interaction-based information from at least one corresponding database within the enterprise for each of a plurality of users, aggregating the user-specific interaction and non-interaction-based information from a plurality of users, by using a processor on a computer, to tune and adapt background translation and language models, and updating all relevant models within the enterprise after user activity based on the tuned and adapted translation and language models.

    Machine translation with side information
    4.
    Granted invention patent
    Machine translation with side information (status: in force)

    Publication number: US08768686B2

    Publication date: 2014-07-01

    Application number: US12779751

    Filing date: 2010-05-13

    IPC classification: G06F17/28

    CPC classification: G06F17/2818

    Abstract: A method of identifying and using side information available to statistical machine translation systems within an enterprise setting, the method including extracting user-specific interaction and non-interaction-based information from at least one corresponding database within the enterprise for each of a plurality of users, aggregating the user-specific interaction and non-interaction-based information from a plurality of users, by using a processor on a computer, to tune and adapt background translation and language models, and updating all relevant models within the enterprise after user activity based on the tuned and adapted translation and language models.

    Speech recognition models combining gender-dependent and gender-independent phone states and using phonetic-context-dependence
    5.
    Granted invention patent
    Speech recognition models combining gender-dependent and gender-independent phone states and using phonetic-context-dependence (status: lapsed)

    Publication number: US5953701A

    Publication date: 1999-09-14

    Application number: US10466

    Filing date: 1998-01-22

    IPC classification: G10L5/06

    CPC classification: G10L15/07 G10L15/142

    Abstract: A method of gender-dependent speech recognition includes the steps of identifying phone state models common to both genders, identifying gender-specific phone state models, identifying a gender of a speaker and recognizing acoustic data from the speaker. A method of constructing a gender-dependent speech recognition model includes the steps of providing training data of a known gender, aligning the training data, tagging the training data with a gender to create gender-tagged data, determining a gender question at a node to determine gender dependence of the gender-tagged data, determining a phonetic context question at the node to determine phonetic context dependence of the gender-tagged data, determining a highest value of an evaluation function between the gender dependence and the phonetic context dependence to determine which dependence is a dominant dependence, splitting the data of the dominant dependence into child nodes according to likelihood criteria, comparing the highest value with a threshold value to determine if additional splitting is necessary, repeating these steps for each child node until the highest value is below the threshold value and counting the nodes having gender dependence to determine an overall gender dependence level. A gender-dependent speech recognition system includes an input device for inputting speech to a preprocessor. The preprocessor converts the speech into acoustic data, and a processor identifies gender-dependent phone state models and phone state models common to both genders. The phone state models are stored in a memory device wherein the processor recognizes the speech in accordance with the phone state models.
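
    The tree-growing procedure in the second sentence of this abstract can be sketched in Python under strong simplifying assumptions: each training sample is a dictionary with a `gender`, a `left` phone context and a single scalar acoustic `value`, and the evaluation function is the log-likelihood gain of modelling the two halves of a split with separate 1-D Gaussians. The abstract names neither the features nor the exact evaluation function, so `log_likelihood`, `split_gain`, `is_female`, and `grow` are hypothetical.

```python
import math


def log_likelihood(values):
    """Log-likelihood of the values under one maximum-likelihood 1-D Gaussian."""
    n = len(values)
    mean = sum(values) / n
    squares = sum((v - mean) ** 2 for v in values)
    var = max(squares / n, 1e-3)  # variance floor (an assumption) avoids log(0)
    return -0.5 * (n * math.log(2 * math.pi * var) + squares / var)


def values_of(samples):
    return [s["value"] for s in samples]


def split_gain(samples, question):
    """Return (likelihood gain, yes-samples, no-samples) for a yes/no question."""
    yes = [s for s in samples if question(s)]
    no = [s for s in samples if not question(s)]
    if not yes or not no:
        return 0.0, yes, no
    gain = (log_likelihood(values_of(yes)) + log_likelihood(values_of(no))
            - log_likelihood(values_of(samples)))
    return gain, yes, no


def is_female(sample):
    """The gender question asked at every node."""
    return sample["gender"] == "F"


def grow(samples, context_questions, threshold):
    """Grow the tree below one node; return (gender_splits, context_splits)."""
    candidates = [("gender", is_female)]
    candidates += [("context", q) for q in context_questions]
    best_kind, best = None, (0.0, [], [])
    for kind, question in candidates:
        result = split_gain(samples, question)
        if best_kind is None or result[0] > best[0]:
            best_kind, best = kind, result
    gain, yes, no = best
    if gain <= threshold or not yes or not no:
        return 0, 0  # highest value is below the threshold: stop splitting
    gender, context = (1, 0) if best_kind == "gender" else (0, 1)
    for child in (yes, no):
        g, c = grow(child, context_questions, threshold)
        gender, context = gender + g, context + c
    return gender, context
```

    The first element of the returned pair counts the nodes split on the gender question, which corresponds to the overall gender dependence level mentioned in the abstract.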

    Fast vocabulary independent method and apparatus for spotting words in speech
    6.
    Granted invention patent
    Fast vocabulary independent method and apparatus for spotting words in speech (status: lapsed)

    Publication number: US6073095A

    Publication date: 2000-06-06

    Application number: US950621

    Filing date: 1997-10-15

    Abstract: A fast vocabulary-independent method for spotting words in speech utilizes a preprocessing step and a coarse-to-detailed search strategy for spotting a word/phone sequence in speech. The preprocessing includes a Viterbi-beam phone-level decoding using a tree-based phone language model. The coarse search matches phone n-grams to identify regions of speech as putative word hits, and the detailed search performs an acoustic match at the putative hits with a model of the given word included in the vocabulary of the recognizer.
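
    The coarse search can be illustrated with a small Python sketch. It assumes the preprocessing step has already produced a flat list of decoded phone labels, and it flags any query-length window that shares a minimum fraction of phone n-grams with the query. The window scheme, the `min_fraction` parameter, and the function names are assumptions, and the detailed acoustic match is not shown.

```python
from collections import Counter


def phone_ngrams(phones, n):
    """Return the n-grams (as tuples) contained in a phone sequence."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def coarse_search(decoded, query, n=2, min_fraction=0.5):
    """Return (start, end, fraction) for windows that look like the query.

    fraction is the share of the query's phone n-grams found in the window.
    """
    query_grams = Counter(phone_ngrams(query, n))
    total = sum(query_grams.values())
    hits = []
    if total == 0:
        return hits  # query shorter than n phones: nothing to match
    width = len(query)
    for start in range(len(decoded) - width + 1):
        window = Counter(phone_ngrams(decoded[start:start + width], n))
        shared = sum((window & query_grams).values())  # multiset intersection
        fraction = shared / total
        if fraction >= min_fraction:
            hits.append((start, start + width, fraction))
    return hits
```

    Each returned region is a putative hit that a detailed search would then rescore against the acoustic data.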

    Method and Apparatus for Annotating a Document
    7.
    Invention application
    Method and Apparatus for Annotating a Document (status: under examination, published)

    Publication number: US20080222511A1

    Publication date: 2008-09-11

    Application number: US12061244

    Filing date: 2008-04-02

    IPC classification: G06F17/00

    Abstract: Methods and apparatus are provided for annotating documents with one or more of entities, events and relations. Documents are annotated by presenting the document to a user; presenting the user with a list of possible entity types, wherein the list of possible entity types is configurable; and obtaining at least one mention annotation that associates a selected phrase in the document with one of the possible entity types. The selected phrase can be presented to the user, for example, based on one or more presentation rules associated with the associated entity type. The method can be implemented, for example, in a client-server configuration where a browser communicates with a remote server. A document can also be annotated by presenting the document to a user; presenting the user with a list of possible relation types, wherein the list of possible relation types is configurable; receiving at least two mention annotations from the user that each associate a selected phrase in the document with an entity type; and obtaining a relation annotation, wherein the relation annotation specifies a relation type between the at least two mention annotations.
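
    The annotation data described in this abstract might be modelled as follows. This is a sketch only: the class names, the character-offset representation of a selected phrase, and the validation rules are assumptions, and the user interface, presentation rules, and client-server parts are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A selected phrase (character span) tagged with an entity type."""
    start: int
    end: int
    entity_type: str


@dataclass(frozen=True)
class Relation:
    """A typed relation between two mention annotations."""
    relation_type: str
    first: Mention
    second: Mention


class AnnotatedDocument:
    def __init__(self, text, entity_types, relation_types):
        self.text = text
        self.entity_types = list(entity_types)      # configurable list
        self.relation_types = list(relation_types)  # configurable list
        self.mentions = []
        self.relations = []

    def phrase(self, mention):
        """Return the text covered by a mention."""
        return self.text[mention.start:mention.end]

    def add_mention(self, start, end, entity_type):
        if entity_type not in self.entity_types:
            raise ValueError(f"unknown entity type: {entity_type}")
        if not 0 <= start < end <= len(self.text):
            raise ValueError("span lies outside the document")
        mention = Mention(start, end, entity_type)
        self.mentions.append(mention)
        return mention

    def add_relation(self, relation_type, first, second):
        if relation_type not in self.relation_types:
            raise ValueError(f"unknown relation type: {relation_type}")
        if first not in self.mentions or second not in self.mentions:
            raise ValueError("both mentions must belong to this document")
        relation = Relation(relation_type, first, second)
        self.relations.append(relation)
        return relation
```

    Passing the entity and relation type lists to the constructor mirrors the configurable lists in the abstract: changing the lists changes which annotations are accepted without changing the code.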

    Statistical language model for inflected languages
    8.
    Granted invention patent
    Statistical language model for inflected languages (status: lapsed)

    Publication number: US5835888A

    Publication date: 1998-11-10

    Application number: US662726

    Filing date: 1996-06-10

    IPC classification: G10L15/14 G10L15/18 G06F17/28

    CPC classification: G10L15/18 G10L15/14 G10L15/19

    Abstract: A statistical language model for inflected languages, which have very large vocabularies, is generated by splitting words into stems, prefixes and endings, and deriving trigrams for the stems, endings and prefixes. The statistical dependence of endings and prefixes on each stem is also obtained, and the resulting language model is a weighted sum of these scores.
