System and method for tokenization of text
    1.
    Patent application
    System and method for tokenization of text (status: in force)

    Publication number: US20060116862A1

    Publication date: 2006-06-01

    Application number: US11001654

    Filing date: 2004-12-01

    IPC classification: G06F17/20

    CPC classification: G06F17/277

    Abstract: The present invention pertains to a system and method for the tokenization of text. The featurizer may be configured to receive input text and convert the input text into tokens. According to one aspect of the invention, the tokens may include only one type of character, the characters selected from the group consisting of letters, numbers, and punctuation. The tokenizer may also include a classifier. The classifier may be configured to receive the tokens from the featurizer. Furthermore, the classifier may be configured to analyze the tokens received from the featurizer to determine if the tokens may be input into a predetermined classification model using a preclassifier. If one of the tokens passes the preclassifier, then the token is classified using the predetermined classification model. Additionally, according to a first aspect of the invention, the tokenizer may also include a finalizer. The finalizer may be configured to receive the tokens and may be configured to produce a final output.
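
    The pipeline this abstract describes (a featurizer, a classifier gated by a preclassifier, and a finalizer) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the rule that only periods reach the model, and the digit-adjacency heuristic standing in for the "predetermined classification model" are all assumptions made for the example.

```python
import re
from typing import List, Tuple

Token = Tuple[str, int, int]  # (text, start offset, end offset)

# Featurizer rule: a token is a maximal run of letters, of digits, or of
# punctuation, so each token holds only one type of character.
_RUN = re.compile(r"[^\W\d_]+|\d+|(?:[^\w\s]|_)+")


def featurize(text: str) -> List[Token]:
    """Convert input text into single-character-type tokens with offsets."""
    return [(m.group(), m.start(), m.end()) for m in _RUN.finditer(text)]


def preclassify(token: Token) -> bool:
    """Hypothetical gate: only a lone period is passed to the model."""
    return token[0] == "."


def classify(tokens: List[Token], i: int) -> str:
    """Stand-in for the classification model (an assumption, not the patent's).

    Labels a period "join" when it sits directly between two digit runs
    (a decimal point), otherwise "split".
    """
    _, start, end = tokens[i]
    left = i > 0 and tokens[i - 1][0].isdigit() and tokens[i - 1][2] == start
    right = (
        i + 1 < len(tokens)
        and tokens[i + 1][0].isdigit()
        and tokens[i + 1][1] == end
    )
    return "join" if left and right else "split"


def finalize(tokens: List[Token], labels: List[str]) -> List[str]:
    """Merge tokens labelled "join" with their neighbours into final output."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if labels[i] == "join" and out and i + 1 < len(tokens):
            out[-1] = out[-1] + tokens[i][0] + tokens[i + 1][0]
            i += 2
        else:
            out.append(tokens[i][0])
            i += 1
    return out


def tokenize(text: str) -> List[str]:
    tokens = featurize(text)
    labels = [
        classify(tokens, i) if preclassify(tok) else "split"
        for i, tok in enumerate(tokens)
    ]
    return finalize(tokens, labels)


print(tokenize("Dr. Smith paid $3.50."))
```

    In this sketch the period after "Dr" and the sentence-final period stay separate tokens, while the period in "3.50" is rejoined with its neighbouring digit runs. A trained model could replace `classify` without changing the other stages.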

    System and method for customizing speech recognition input and output
    2.
    Patent application
    System and method for customizing speech recognition input and output (status: in force)

    Publication number: US20050114122A1

    Publication date: 2005-05-26

    Application number: US10951291

    Filing date: 2004-09-27

    Abstract: A system and method may be disclosed for facilitating the site-specific customization of automated speech recognition systems by providing a customization client for site-specific individuals to update and modify language model input files and post processor input files. In customizing the input files, the customization client may provide a graphical user interface for facilitating the inclusion of words specific to a particular site. The customization client may also be configured to provide the user with a series of formatting rules for controlling the appearance and format of a document transcribed by an automated speech recognition system.

    System and method for accented modification of a language model
    3.
    Patent application
    System and method for accented modification of a language model (status: in force)

    Publication number: US20050165602A1

    Publication date: 2005-07-28

    Application number: US11007626

    Filing date: 2004-12-07

    IPC classification: G06F17/27

    CPC classification: G10L15/183

    Abstract: A system and method for a speech recognition technology that allows language models for a particular language to be customized through the addition of alternate pronunciations that are specific to the accent of the dictator, for a subset of the words in the language model. The system includes the steps of identifying the pronunciation differences that are best handled by modifying the pronunciations of the language model, identifying target words in the language model for pronunciation modification, and creating an accented speech file used to modify the language model.
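
    The three steps in this abstract (identify pronunciation differences, pick target words, build an accent file that modifies the model) might look like the sketch below. It assumes a pronunciation lexicon stored as a word-to-phoneme-tuple mapping and accent differences given as one-to-one phoneme substitutions. Both representations, and the names `build_accent_entries` and `apply_accent`, are hypothetical and not taken from the patent.

```python
from typing import Dict, List, Tuple

Pron = Tuple[str, ...]  # one pronunciation as a tuple of phoneme symbols


def build_accent_entries(
    lexicon: Dict[str, Pron], substitutions: Dict[str, str]
) -> Dict[str, Pron]:
    """Select target words and create their accented pronunciations.

    A word is a target if at least one of its phonemes is affected by the
    accent's substitution table; only those words get an alternate entry.
    """
    entries: Dict[str, Pron] = {}
    for word, pron in lexicon.items():
        pron = tuple(pron)
        alt = tuple(substitutions.get(p, p) for p in pron)
        if alt != pron:
            entries[word] = alt
    return entries


def apply_accent(
    lexicon: Dict[str, Pron], entries: Dict[str, Pron]
) -> Dict[str, List[Pron]]:
    """Add the alternate pronunciations alongside the originals."""
    return {
        word: [tuple(pron)] + ([entries[word]] if word in entries else [])
        for word, pron in lexicon.items()
    }


lexicon = {"think": ("TH", "IH", "NG", "K"), "cat": ("K", "AE", "T")}
accent_file = build_accent_entries(lexicon, {"TH": "T"})
print(apply_accent(lexicon, accent_file))
```

    Only "think" receives an alternate pronunciation here, which mirrors the abstract's point that the modification applies to a subset of the words rather than the whole model.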

    System and method for adaptive automatic error correction

    Publication number: US20060235687A1

    Publication date: 2006-10-19

    Application number: US11105905

    Filing date: 2005-04-14

    IPC classification: G10L15/00

    Abstract: A method for adaptive automatic error and mismatch correction is disclosed for use with a system having an automatic error and mismatch correction learning module, an automatic error and mismatch correction model, and a classifier module. The learning module operates by receiving pairs of documents, identifying and selecting effective candidate errors and mismatches, and generating classifiers corresponding to these selected errors and mismatches. The correction model operates by receiving a string of interpreted speech into the automatic error and mismatch correction module, identifying target tokens in the string of interpreted speech, creating a set of classifier features according to requirements of the automatic error and mismatch correction model, comparing the target tokens against the classifier features to detect errors and mismatches in the string of interpreted speech, and modifying the string of interpreted speech based upon the classifier features.
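
    The learn-then-correct loop in this abstract can be illustrated with a deliberately simple stand-in. The sketch assumes the document pairs are (recognized text, edited text) strings, aligns them word by word with Python's `difflib.SequenceMatcher`, and keeps a one-word substitution as a rule only if it recurs in the same left context. The previous-word feature, the `min_count` threshold, and the function names are assumptions for illustration, not the classifiers the patent describes.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, Iterable, Tuple

Rule = Dict[Tuple[str, str], str]  # (previous word, wrong word) -> right word


def learn(pairs: Iterable[Tuple[str, str]], min_count: int = 2) -> Rule:
    """Learning module: mine recurring one-word substitutions from pairs."""
    counts: Counter = Counter()
    for recognized, edited in pairs:
        src, dst = recognized.split(), edited.split()
        matcher = SequenceMatcher(a=src, b=dst, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Keep only single-word replacements as candidate errors.
            if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
                prev = src[i1 - 1] if i1 > 0 else "<s>"
                counts[(prev, src[i1], dst[j1])] += 1
    rules: Rule = {}
    # most_common() yields the most frequent correction first, so
    # setdefault keeps the best-supported one for each context.
    for (prev, wrong, right), n in counts.most_common():
        if n >= min_count:
            rules.setdefault((prev, wrong), right)
    return rules


def correct(text: str, rules: Rule) -> str:
    """Correction step: compare each token's context feature to the rules."""
    words = text.split()
    out = []
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else "<s>"
        out.append(rules.get((prev, word), word))
    return " ".join(out)


pairs = [
    ("patient denies chest pane", "patient denies chest pain"),
    ("no chest pane today", "no chest pain today"),
]
rules = learn(pairs)
print(correct("window pane and chest pane", rules))
```

    Because the rule is keyed on the preceding word, "pane" is rewritten only after "chest" and left alone after "window". That is the kind of context-dependent decision the abstract assigns to the classifier features.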