System and method for tokenization of text using classifier models
    1.
    Granted invention patent
    Legal status: in force

    Publication No.: US07937263B2

    Publication date: 2011-05-03

    Application No.: US11001654

    Filing date: 2004-12-01

    IPC classification: G06F17/27 G06F17/20

    CPC classification: G06F17/277

    Abstract: The present invention pertains to a system and method for the tokenization of text. The tokenizer may include a featurizer configured to receive input text and convert the input text into tokens. According to one aspect of the invention, each token may include only one type of character, selected from the group consisting of letters, numbers, and punctuation. The tokenizer may also include a classifier configured to receive the tokens from the featurizer. The classifier may use a preclassifier to determine whether each token may be input into a predetermined classification model. If a token passes the preclassifier, it is classified using the predetermined classification model. Additionally, according to a first aspect of the invention, the tokenizer may include a finalizer configured to receive the tokens and produce a final output.
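
The featurizer, preclassifier, classifier, and finalizer stages named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the ASCII-only character classes, the rule that only a period between two tokens reaches the model, and the toy rule standing in for the "predetermined classification model".

```python
import re
from typing import List

# Hypothetical featurizer: each token holds one character type only
# (a run of ASCII letters, a run of ASCII digits, or a single other
# non-space character). Whitespace is dropped.
_RUN = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]")


def featurize(text: str) -> List[str]:
    return _RUN.findall(text)


def preclassify(tokens: List[str], i: int) -> bool:
    # Hypothetical gate: only a period with a neighbour on both sides
    # is passed on to the classification model.
    return tokens[i] == "." and 0 < i < len(tokens) - 1


def toy_model(tokens: List[str], i: int) -> str:
    # Stand-in for the predetermined classification model (assumed rule):
    # a period between two digit runs is part of a number.
    if tokens[i - 1].isdigit() and tokens[i + 1].isdigit():
        return "join"
    return "split"


def finalize(tokens: List[str], labels: List[str]) -> List[str]:
    # Hypothetical finalizer: merge a "join" token with both neighbours.
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if labels[i] == "join" and out:
            out[-1] = out[-1] + tokens[i] + tokens[i + 1]
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def tokenize(text: str) -> List[str]:
    tokens = featurize(text)
    labels = [
        toy_model(tokens, i) if preclassify(tokens, i) else "split"
        for i in range(len(tokens))
    ]
    return finalize(tokens, labels)


print(tokenize("Dose 2.5 mg."))
```

In this sketch the cheap preclassifier keeps most tokens away from the model, so the (potentially expensive) classifier runs only on ambiguous positions; a trained model could replace `toy_model` without changing the other stages.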

    System and method for accented modification of a language model
    2.
    Granted invention patent
    Legal status: in force

    Publication No.: US07315811B2

    Publication date: 2008-01-01

    Application No.: US11007626

    Filing date: 2004-12-08

    IPC classification: G06F17/27 G06F17/21 G10L15/00

    CPC classification: G10L15/183

    Abstract: A system and method for speech recognition that allows the language model for a particular language to be customized by adding alternate pronunciations, specific to the accent of the dictator, for a subset of the words in the language model. The method includes the steps of identifying the pronunciation differences that are best handled by modifying the pronunciations of the language model, identifying target words in the language model for pronunciation modification, and creating an accented speech file used to modify the language model.
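
The three steps in the abstract (identify pronunciation differences, identify target words, create an accented speech file that modifies the language model) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the patented method: the lexicon layout, the phoneme-substitution rule format, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pron = Tuple[str, ...]
Lexicon = Dict[str, List[Pron]]  # word -> list of pronunciations (assumed layout)


def find_targets(lexicon: Lexicon, rules: Dict[str, str]) -> List[str]:
    # Target words: those with at least one phoneme affected by an accent rule.
    return sorted(
        word
        for word, prons in lexicon.items()
        if any(ph in rules for pron in prons for ph in pron)
    )


def build_accent_entries(lexicon: Lexicon, rules: Dict[str, str]) -> List[Tuple[str, Pron]]:
    # Hypothetical "accented speech file": (word, alternate pronunciation) pairs.
    entries = []
    for word in find_targets(lexicon, rules):
        for pron in lexicon[word]:
            alt = tuple(rules.get(ph, ph) for ph in pron)
            if alt != pron and alt not in lexicon[word]:
                entries.append((word, alt))
    return entries


def apply_entries(lexicon: Lexicon, entries: List[Tuple[str, Pron]]) -> Lexicon:
    # Add the alternates next to the existing pronunciations; nothing is removed.
    updated = {word: list(prons) for word, prons in lexicon.items()}
    for word, alt in entries:
        if alt not in updated[word]:
            updated[word].append(alt)
    return updated


# Assumed example data: one accent rule replacing "TH" with "T".
lexicon = {"think": [("TH", "IH", "NG", "K")], "cat": [("K", "AE", "T")]}
rules = {"TH": "T"}
entries = build_accent_entries(lexicon, rules)
accented = apply_entries(lexicon, entries)
print(entries)
```

Keeping the original pronunciations and only appending alternates means the customized model still covers the unaccented forms; only the subset of words touched by a rule changes.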