Automatic Editing Using Probabilistic Word Substitution Models
    1.
    Patent application
    Legal status: in force

    Publication No.: US20080243500A1

    Publication date: 2008-10-02

    Application No.: US11693961

    Filing date: 2007-03-30

    IPC classification: G10L15/00

    CPC classification: G10L15/26 G06F17/2715

    Abstract: An input sequence of unstructured speech recognition text is transformed into output structured document text. A probabilistic word substitution model is provided which establishes association probabilities indicative of target structured document text correlating with source unstructured speech recognition text. The input sequence of unstructured speech recognition text is looked up in the word substitution model to determine likelihoods of the represented structured document text corresponding to the text in the input sequence. Then, a most likely sequence of structured document text is generated as an output.
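The lookup-then-select idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the `SUBSTITUTION` table, its probabilities, and the function name `most_likely_output` are all hypothetical, and each token is scored independently, which is a simplifying assumption.

```python
import math

# Hypothetical substitution model: P(target text | source token).
# All entries and probabilities are invented for illustration.
SUBSTITUTION = {
    "period": {".": 0.9, "period": 0.1},
    "comma": {",": 0.8, "comma": 0.2},
    "two": {"2": 0.6, "two": 0.4},
}


def most_likely_output(tokens, model=SUBSTITUTION):
    """Look up each source token and keep its most probable target.

    Tokens are treated independently, so the per-token maxima together
    form the most likely full sequence under this simplified model.
    """
    output = []
    log_prob = 0.0
    for tok in tokens:
        # Tokens missing from the model pass through unchanged.
        candidates = model.get(tok, {tok: 1.0})
        target, p = max(candidates.items(), key=lambda kv: kv[1])
        output.append(target)
        log_prob += math.log(p)
    return output, log_prob


structured, score = most_likely_output(["take", "two", "tablets", "period"])
print(structured, score)
```

A fuller model would presumably also score the context around each substitution, so that "period" could stay a word in some sentences; this sketch leaves that out.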

    Automatic orthographic transformation of a text stream
    2.
    Granted patent
    Legal status: in force

    Publication No.: US06490549B1

    Publication date: 2002-12-03

    Application No.: US09539066

    Filing date: 2000-03-30

    IPC classification: G06F17/21

    CPC classification: G06F17/273

    Abstract: A method is given for automatically rewriting orthography of a stream of text words, for example, automatically and properly capitalizing words in the stream. If a word in the stream has an entry in an orthography rewrite lexicon, the word is automatically replaced with an orthographically rewritten form of the word from the orthography rewrite lexicon. In addition, selected words in the stream are compared to a plurality of features weighted by a maximum entropy-based algorithm, to automatically determine whether to rewrite orthography of any of the selected words.
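The two-stage approach in this abstract (lexicon lookup first, then weighted features) might look roughly like the sketch below. The lexicon entries, feature names, and weights are hypothetical. The weights are hand-set stand-ins for values a maximum-entropy trainer would learn, and the two-outcome logistic form is one common way to write a binary maximum-entropy model.

```python
import math

# Hypothetical orthography rewrite lexicon: lowercase form -> rewritten form.
LEXICON = {"ibm": "IBM", "i": "I", "dr.": "Dr."}

# Hypothetical feature weights (hand-set, not learned).
WEIGHTS = {"bias": -2.0, "sentence_start": 4.0, "prev_is_title": 3.0}
TITLES = {"mr.", "mrs.", "dr."}
SENTENCE_END = {".", "?", "!"}


def capitalize_probability(words, i):
    """Binary maximum-entropy (logistic) score that word i should be capitalized."""
    score = WEIGHTS["bias"]
    if i == 0 or words[i - 1] in SENTENCE_END:
        score += WEIGHTS["sentence_start"]
    if i > 0 and words[i - 1].lower() in TITLES:
        score += WEIGHTS["prev_is_title"]
    return 1.0 / (1.0 + math.exp(-score))


def rewrite(words):
    """Apply the lexicon where it has an entry, else consult the feature model."""
    out = []
    for i, word in enumerate(words):
        if word in LEXICON:
            out.append(LEXICON[word])
        elif capitalize_probability(words, i) > 0.5:
            out.append(word.capitalize())
        else:
            out.append(word)
    return out


print(rewrite(["the", "meeting", "with", "dr.", "smith", "at", "ibm", "."]))
```

The lexicon handles words with a fixed rewritten form, while the feature model covers words whose capitalization depends on context, such as a surname following a title.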

    Automatic editing using probabilistic word substitution models
    3.
    Granted patent
    Legal status: in force

    Publication No.: US07813929B2

    Publication date: 2010-10-12

    Application No.: US11693961

    Filing date: 2007-03-30

    IPC classification: G10L15/18

    CPC classification: G10L15/26 G06F17/2715

    Abstract: An input sequence of unstructured speech recognition text is transformed into output structured document text. A probabilistic word substitution model is provided which establishes association probabilities indicative of target structured document text correlating with source unstructured speech recognition text. The input sequence of unstructured speech recognition text is looked up in the word substitution model to determine likelihoods of the represented structured document text corresponding to the text in the input sequence. Then, a most likely sequence of structured document text is generated as an output.

    Word classing for language modeling
    4.
    Granted patent
    Legal status: in force

    Publication No.: US09367526B1

    Publication date: 2016-06-14

    Application No.: US13190891

    Filing date: 2011-07-26

    Abstract: A language processing application employs a classing function optimized for the underlying production application context for which it is expected to process speech. A combination of class based and word based features generates a classing function optimized for a particular production application, meaning that a language model employing the classing function uses word classes having a high likelihood of accurately predicting word sequences encountered by a language model invoked by the production application. The classing function optimizes word classes by aligning the objective of word classing with the underlying language processing task to be performed by the production application. The classing function is optimized to correspond to usage in the production application context using class-based and word-based features by computing a likelihood of a word in an n-gram and a frequency of a word within a class of the n-gram.
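As background for the class-based quantities this abstract mentions, here is a sketch of the textbook class-based bigram estimate, P(word | prev) ≈ P(class(word) | class(prev)) × P(word | class(word)). It shows the general technique only, not the optimized classing function the patent claims. The class assignments, toy corpus, and function name are hypothetical.

```python
from collections import Counter

# Hypothetical word-to-class assignment and toy training corpus.
WORD_CLASS = {"monday": "DAY", "friday": "DAY", "on": "PREP", "at": "PREP", "noon": "TIME"}
CORPUS = ["on", "monday", "at", "noon", "on", "friday", "on", "monday"]


def class_bigram_probability(prev, word, corpus=CORPUS, word_class=WORD_CLASS):
    """Estimate P(word | prev) from class transitions and in-class word frequency.

    Assumes every word is in word_class and class(prev) has been seen as a context.
    """
    classes = [word_class[w] for w in corpus]
    class_counts = Counter(classes)
    class_bigrams = Counter(zip(classes, classes[1:]))
    word_counts = Counter(corpus)

    c_prev, c_word = word_class[prev], word_class[word]
    # P(class(word) | class(prev)): relative frequency of the class transition.
    context_total = sum(n for (first, _), n in class_bigrams.items() if first == c_prev)
    p_class = class_bigrams[(c_prev, c_word)] / context_total
    # P(word | class(word)): frequency of the word within its class.
    p_word = word_counts[word] / class_counts[c_word]
    return p_class * p_word


# "at friday" never occurs in the corpus, yet the class model assigns it mass.
print(class_bigram_probability("at", "friday"))
```

Sharing statistics across a class lets the model score word pairs it has never seen. That is why the choice of classes matters for how well a language model predicts word sequences.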