Technique for selective use of Gaussian kernels and mixture component
weights of tied-mixture hidden Markov models for speech recognition
    1.
    Granted invention patent
    Status: lapsed

    Publication number: US6009390A

    Publication date: 1999-12-28

    Application number: US927883

    Filing date: 1997-09-11

    IPC classification: G10L15/14 G10L7/08

    CPC classification: G10L15/144

    Abstract: In a speech recognition system, tied-mixture hidden Markov models (HMMs) are used to match, in the maximum likelihood sense, the phonemes of spoken words given the acoustic input thereof. In a well-known manner, such speech recognition requires computation of state observation likelihoods (SOLs). Because of the use of HMMs, each SOL computation involves a substantial number of Gaussian kernels and mixture component weights. In accordance with the invention, the number of Gaussian kernels is cut down to reduce the computational complexity and increase the efficiency of memory access to the kernels. For example, only the non-zero mixture component weights and the Gaussian kernels associated therewith are considered in the SOL computation. In accordance with an aspect of the invention, only a subset of the Gaussian kernels of significant values, regardless of the values of the associated mixture component weights, is considered in the SOL computation. In accordance with another aspect of the invention, at least some of the mixture component weights are quantized to reduce the memory space needed to store them. As such, the computational complexity and memory access efficiency are further improved.

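The pruning idea in the abstract above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes diagonal-covariance Gaussians in a shared codebook, and the names `gaussian_kernel`, `to_sparse`, `sol_dense`, and `sol_sparse` are hypothetical. Only the non-zero mixture weights are stored, and only the kernels they reference are evaluated.

```python
import math

def gaussian_kernel(x, mean, var):
    """Diagonal-covariance Gaussian density evaluated at feature vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def to_sparse(weights):
    """Keep only (kernel_index, weight) pairs whose weight is non-zero."""
    return [(k, w) for k, w in enumerate(weights) if w != 0.0]

def sol_dense(x, codebook, weights):
    """Baseline: evaluate every kernel in the shared codebook."""
    return sum(w * gaussian_kernel(x, m, v)
               for w, (m, v) in zip(weights, codebook))

def sol_sparse(x, codebook, sparse_weights, cache):
    """Evaluate only kernels with non-zero weight.

    `cache` maps kernel index -> kernel value for the current frame, so a
    kernel shared by several states is computed once (assumed design choice).
    """
    total = 0.0
    for k, w in sparse_weights:
        if k not in cache:
            mean, var = codebook[k]
            cache[k] = gaussian_kernel(x, mean, var)
        total += w * cache[k]
    return total
```

Both functions return the same likelihood; the sparse version skips the kernels whose weights are zero, which is where the savings in computation and memory access would come from.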

    Technique for effectively recognizing sequence of digits in voice dialing
    2.
    Granted invention patent
    Status: lapsed

    Publication number: US5995926A

    Publication date: 1999-11-30

    Application number: US897806

    Filing date: 1997-07-21

    IPC classification: G10L15/18 H04M3/42 G10L5/00

    CPC classification: G10L15/197 H04M3/42204

    Abstract: In a speech recognition system for performing voice dialing, an inventive connected digit recognizer is employed to recognize a sequence of spoken digits. The inventive recognizer generates the maximum-likelihood digit sequence corresponding to the spoken sequence in accordance with the Viterbi algorithm. However, unlike a prior art connected digit recognizer, the inventive recognizer does not assume that a digit model in a sequence can be followed by any digit model with equal probability. Rather, the inventive recognizer takes into account, for each digit model being decided on, a conditional probability that that digit model would follow a given digit model preceding thereto.

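The role of the conditional digit probabilities can be sketched with a small Viterbi search. This is a simplified illustration under stated assumptions: the utterance is taken to be already segmented into digit-length chunks, each with a per-digit acoustic log-likelihood, and `viterbi_digits` and its parameters are hypothetical names rather than anything defined in the patent.

```python
import math

def viterbi_digits(seg_loglik, log_init, log_trans):
    """Return the most likely digit sequence.

    seg_loglik[t][d]: acoustic log-likelihood of digit d for segment t.
    log_init[d]:      log probability that the sequence starts with d.
    log_trans[p][d]:  log P(digit d | previous digit p).
    """
    n = len(log_init)
    score = [log_init[d] + seg_loglik[0][d] for d in range(n)]
    back = []  # back[t-1][d] = best predecessor of digit d at segment t
    for t in range(1, len(seg_loglik)):
        new_score, ptr = [], []
        for d in range(n):
            best_p = max(range(n), key=lambda p: score[p] + log_trans[p][d])
            new_score.append(score[best_p] + log_trans[best_p][d] + seg_loglik[t][d])
            ptr.append(best_p)
        score = new_score
        back.append(ptr)
    # Trace back from the best final digit.
    d = max(range(n), key=lambda k: score[k])
    path = [d]
    for ptr in reversed(back):
        d = ptr[d]
        path.append(d)
    return path[::-1]
```

With uniform `log_trans` this reduces to the equal-probability assumption the abstract attributes to prior recognizers; a non-uniform table lets a likely digit pair win when the acoustic scores are nearly tied.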

    Intonation transformation for speech therapy and the like
    3.
    Granted invention patent
    Status: in force

    Publication number: US07373294B2

    Publication date: 2008-05-13

    Application number: US10438642

    Filing date: 2003-05-15

    IPC classification: G10L11/04 G10L21/00

    Abstract: The intonation of speech is modified by an appropriate combination of resampling and time-domain harmonic scaling. Resampling increases (upsampling) or decreases (downsampling) the number of data points in a signal. Harmonic scaling adds or removes pitch cycles to or from a signal. The pitch of a speech signal can be increased by combining downsampling with harmonic scaling that adds an appropriate number of pitch cycles. Alternatively, pitch can be decreased by combining upsampling with harmonic scaling that removes an appropriate number of pitch cycles. The present invention can be implemented in an automated speech-therapy tool that is able to modify the intonation of prerecorded reference speech signals for playback to a user to emphasize the correct pronunciation by increasing the pitch of selected portions of words or phrases that the user had previously mispronounced.

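The bookkeeping behind combining resampling with harmonic scaling can be illustrated with simple arithmetic. The sketch below only computes how many samples the resampled signal has and how many pitch cycles would need to be added or removed to restore the original duration; it does not perform resampling or harmonic scaling itself. The function name `plan_pitch_change` and the assumption of a constant pitch period are illustrative only.

```python
def plan_pitch_change(num_samples, pitch_period, ratio):
    """Plan a pitch change by `ratio` (>1 raises pitch, <1 lowers it).

    num_samples:  length of the original segment, in samples.
    pitch_period: pitch period of the original segment, in samples
                  (assumed constant over the segment).
    """
    if ratio == 1:
        direction = "none"
    else:
        direction = "down" if ratio > 1 else "up"
    # Resampling changes the sample count; played at the same rate,
    # the pitch is scaled by `ratio` and the duration by 1 / ratio.
    resampled_len = round(num_samples / ratio)
    new_period = pitch_period / ratio
    cycles_now = resampled_len / new_period     # cycles present after resampling
    cycles_needed = num_samples / new_period    # cycles needed for original duration
    return {
        "resample": direction,
        "resampled_len": resampled_len,
        "cycles_delta": round(cycles_needed - cycles_now),  # >0 add, <0 remove
    }
```

For example, raising the pitch of an 8000-sample segment with an 80-sample period by a factor of 1.25 means downsampling to 6400 samples and adding 25 pitch cycles; lowering it by a factor of 0.8 means upsampling to 10000 samples and removing 20 cycles.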

    Speech recognition
    5.
    Granted invention patent
    Status: lapsed

    Publication number: US6138095A

    Publication date: 2000-10-24

    Application number: US145934

    Filing date: 1998-09-03

    IPC classification: G10L15/08 G10L15/00 G10L15/20

    CPC classification: G10L15/08

    Abstract: Speech recognition in which the log probabilities of the null and alternative hypotheses are computed for an input speech sample by comparison with specific stored speech vocabularies/grammars and with general speech characteristics. The difference in probabilities is normalized by the magnitude of the null hypothesis to derive a likelihood factor which is compared with a rejection threshold that is utterance-length dependent. Advantageously, a high-order polynomial representation of the rejection threshold length dependency may be simplified by a series of piece-wise constants which are stored as rejection thresholds to be selected in accordance with the length of the input speech sample.

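The length-dependent rejection test can be sketched as a table lookup. The bin edges, threshold values, and the direction of the comparison (accept when the factor is at least the threshold) are assumptions made for illustration; the abstract specifies only the normalization and the piece-wise constant thresholds.

```python
import bisect

# Hypothetical piece-wise constant thresholds, indexed by utterance length.
LENGTH_EDGES = [50, 100, 200]            # bin boundaries, in frames (assumed)
THRESHOLDS = [0.30, 0.20, 0.12, 0.08]    # one threshold per bin (assumed)

def likelihood_factor(log_p_null, log_p_alt):
    """Difference of log probabilities, normalized by |log_p_null|."""
    return (log_p_null - log_p_alt) / abs(log_p_null)

def threshold_for_length(num_frames):
    """Select the stored threshold for the bin containing num_frames."""
    return THRESHOLDS[bisect.bisect_right(LENGTH_EDGES, num_frames)]

def accept(log_p_null, log_p_alt, num_frames):
    """Accept the recognition result if the factor clears the threshold."""
    return likelihood_factor(log_p_null, log_p_alt) >= threshold_for_length(num_frames)
```

Storing one constant per length bin replaces the evaluation of a high-order polynomial with a single binary search.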

    Method and apparatus for providing an interactive language tutor
    6.
    Granted invention patent
    Status: in force

    Publication number: US07299188B2

    Publication date: 2007-11-20

    Application number: US10361256

    Filing date: 2003-02-10

    IPC classification: G10L11/00 G10L21/06

    CPC classification: G06F17/289 G10L15/02

    Abstract: A method and apparatus for generating a pronunciation score by receiving a user phrase intended to conform to a reference phrase and processing the user phrase in accordance with at least one of an articulation-scoring engine, a duration-scoring engine and an intonation-scoring engine to derive thereby the pronunciation score. The scores provided by the various scoring engines are adapted to provide a visual and/or numerical feedback that provides information pertaining to correctness or incorrectness in one or more speech-features such as intonation, articulation, voicing, phoneme error and relative word duration. Such useful interactive feedback will allow a user to quickly identify the problem area and take remedial action in reciting “tutor” sentences or phrases.


    Bit based arithmetic coding using variable size key cipher
    8.
    Granted invention patent
    Status: in force

    Publication number: US07664267B2

    Publication date: 2010-02-16

    Application number: US11170900

    Filing date: 2005-06-30

    Abstract: An encryption device and method and decryption device and method which implement a bit-based encryption scheme and hardware design. The encryption device includes a random number generator, receiving a main key, determining a working key using at least one random number and outputting the working key, and a model, receiving the main key, the working key and plain text to be encoded and generating at least two frequency counts. The encryption device further includes an encoder, which outputs encoded text based on the working key, the plain text and the at least two frequency counts. The encryption device and method and decryption device and method process encrypted text that is based upon a stream structure with an unlimited key length and may be compressed by 50%. The encoded text is changeable with different environments even for the same plain text and the same key. Operations of the hardware design are based on arithmetic additions and shifts, and not multiplications and divisions. As a result, the hardware design is simple and applicable to cryptography and e-commerce.


    Automatic assessment of phonological processes
    9.
    Granted invention patent
    Status: in force

    Publication number: US07302389B2

    Publication date: 2007-11-27

    Application number: US10637235

    Filing date: 2003-08-08

    IPC classification: G10L15/26

    CPC classification: G09B19/06 G10L15/02

    Abstract: A computer-based system generates alternative phonetic transcriptions for a target word or phrase corresponding to specific phonological processes that replace individual phonemes or clusters of two or more phonemes with replacement phonemes. The system compares a user's speech with a list of possible transcriptions that includes the base (i.e., correct) transcription of the test target as well as the different alternative transcriptions, to identify the transcription that best matches the user's. In a speech therapy application, the system identifies the phonological process(es), if any, associated with the user's speech and generates statistics over multiple test targets that can be used to diagnose the user's specific phonological disorders. The system can also be implemented in other contexts such as foreign language instruction and automated attendant applications to cover a wide variety and range of accents and/or phonological disorders.

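The generate-and-match idea can be sketched on phoneme strings. This is a simplified stand-in: the system described compares the user's speech against the candidate transcriptions, whereas the sketch assumes an already decoded phoneme sequence and uses edit distance as the matching score. The process names, phoneme symbols, and function names are all hypothetical.

```python
def apply_process(phonemes, pattern, replacement):
    """Replace each occurrence of `pattern` (a tuple of phonemes) with `replacement`."""
    out, i, n = [], 0, len(pattern)
    while i < len(phonemes):
        if tuple(phonemes[i:i + n]) == pattern:
            out.extend(replacement)
            i += n
        else:
            out.append(phonemes[i])
            i += 1
    return out

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def best_match(base, processes, observed):
    """Return the label of the transcription closest to `observed`.

    processes: {name: (pattern, replacement)}; "correct" labels the base form.
    """
    candidates = {"correct": list(base)}
    for name, (pattern, replacement) in processes.items():
        alt = apply_process(base, pattern, replacement)
        if alt != list(base):  # the process applies to this target
            candidates[name] = alt
    return min(candidates, key=lambda name: edit_distance(candidates[name], observed))

# Hypothetical process table for illustration.
PROCESSES = {
    "fronting": (("k",), ("t",)),
    "cluster_reduction": (("s", "t"), ("t",)),
}
```

Counting the labels returned over many test targets would give the kind of per-process statistics the abstract mentions.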

    Method and system to compensate for the effects of packet delays on speech quality in a Voice-over IP system
    10.
    Granted invention patent
    Status: in force

    Publication number: US07266127B2

    Publication date: 2007-09-04

    Application number: US10068023

    Filing date: 2002-02-08

    IPC classification: H04L12/56

    Abstract: The system includes a jitter buffer for receiving speech packets in a Voice over Internet Protocol (VoIP) system, a playback device for adjusting the playback speed of the received speech packets, and a jitter buffer manager for detecting out-of-sequence packets in the jitter buffer and for sending commands to the playback device to adjust playback speed based on the detection. The speech signal is played back at the nominal speed when there are no out-of-sequence packets. The playback speed is decreased when an out-of-sequence packet is detected, thereby tending to increase the jitter buffer length. When an out-of-sequence packet arrives, the playback speed is increased in order to restore jitter buffer length to its nominal length.

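One possible reading of the control policy is sketched below. It is an interpretation rather than the patented design: a gap in the buffered sequence numbers is treated as the "out-of-sequence" condition that slows playback, and a buffer that has grown past its nominal length after the late packet arrives triggers faster playback. The speed values and the class `JitterBufferManager` are hypothetical.

```python
NOMINAL, SLOW, FAST = 1.0, 0.9, 1.1  # playback speed factors (assumed values)

class JitterBufferManager:
    def __init__(self, nominal_len):
        self.nominal_len = nominal_len  # target number of buffered packets
        self.buffer = {}                # sequence number -> payload
        self.next_seq = 0               # next sequence number to play

    def receive(self, seq, payload):
        """Store an arriving packet, in or out of order."""
        self.buffer[seq] = payload

    def play(self):
        """Play the next packet if it is available; return its payload or None."""
        payload = self.buffer.pop(self.next_seq, None)
        if payload is not None:
            self.next_seq += 1
        return payload

    def has_gap(self):
        """True if a packet is missing before a later one already buffered."""
        if not self.buffer:
            return False
        highest = max(self.buffer)
        return any(s not in self.buffer for s in range(self.next_seq, highest))

    def playback_speed(self):
        if self.has_gap():
            return SLOW     # wait for the late packet; the buffer tends to grow
        if len(self.buffer) > self.nominal_len:
            return FAST     # late packet arrived; drain back to nominal length
        return NOMINAL
```

A short trace: with packets 0 to 2 buffered the speed is nominal; packet 4 arriving before packet 3 slows playback; once packet 3 arrives the buffer holds five packets and playback speeds up until the length returns to three.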