21. Speaker-independent word recognizer
    Granted invention patent; status: lapsed

    Publication number: US4763278A

    Publication date: 1988-08-09

    Application number: US484820

    Filing date: 1983-04-13

    IPC classification: G10L11/00 G10L15/00 G10L5/00

    CPC classification: G10L25/00 G10L15/00

    Abstract: Speaker-independent word recognition is performed with minimal hardware requirements, based on a small, acoustically distinct vocabulary. After a simple preconditioning filter, the zero-crossing intervals of the input speech are measured and sorted by duration. This gives a rough measure of the frequency distribution within each input frame. The distribution of zero-crossing intervals is transformed into a binary feature vector, which is compared with each reference template using a modified Hamming distance measure. A dynamic time warping algorithm permits recognition at various speaking rates and economizes on reference template storage. A mask vector for each reference template is used to ignore insignificant (or speaker-dependent) features of the words detected.
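
    The recognition pipeline in this abstract can be sketched in a few functions: zero-crossing-interval features, a masked Hamming distance, and dynamic time warping. This is a minimal illustration under stated assumptions, not the patented implementation. The histogram bin edges are hypothetical, as is the rule that sets a feature bit whenever a bin is non-empty. All function names are also hypothetical. The "modified" Hamming distance is assumed to be a plain Hamming distance restricted to the bit positions where the template's single mask vector is 1.

```python
from typing import Dict, List, Sequence, Tuple

Bits = Sequence[int]  # binary feature vector for one frame (entries 0 or 1)


def zero_crossing_features(frame: Sequence[float], edges: Sequence[int]) -> List[int]:
    """Hypothetical extractor: bin zero-crossing intervals (in samples) by
    duration using `edges`, then set one bit for every non-empty bin."""
    crossings = [i for i in range(1, len(frame))
                 if (frame[i - 1] < 0) != (frame[i] < 0)]
    intervals = [b - a for a, b in zip(crossings, crossings[1:])]
    counts = [0] * (len(edges) + 1)
    for d in intervals:
        counts[sum(1 for e in edges if d >= e)] += 1  # bin index by duration
    return [1 if c > 0 else 0 for c in counts]


def masked_hamming(x: Bits, ref: Bits, mask: Bits) -> int:
    """Count mismatching bits, ignoring positions whose mask bit is 0."""
    return sum(1 for a, b, m in zip(x, ref, mask) if m and a != b)


def dtw_distance(inp: Sequence[Bits], ref: Sequence[Bits], mask: Bits) -> float:
    """Classic DTW with unit steps; the local cost is the masked Hamming distance."""
    n, m = len(inp), len(ref)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = masked_hamming(inp[i - 1], ref[j - 1], mask)
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(inp: Sequence[Bits],
              templates: Dict[str, Tuple[Sequence[Bits], Bits]]) -> str:
    """Return the word whose (frames, mask) template has the lowest DTW distance."""
    return min(templates, key=lambda w: dtw_distance(inp, *templates[w]))
```

    Because DTW may repeat or skip template frames, a single stored template per word can match the same word spoken faster or slower.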

23. Voice messaging system with unified pitch and voice tracking
    Granted invention patent; status: lapsed

    Publication number: US4696038A

    Publication date: 1987-09-22

    Application number: US484718

    Filing date: 1983-04-13

    IPC classification: G10L11/06 G10L19/06 G10L5/00

    CPC classification: G10L19/06 G10L25/93

    Abstract: This voice messaging system combines an LPC analyzer with a pitch extractor. The LPC analyzer outputs LPC parameters and a residual signal, organized as a sequence of speech data frames, as a representation of an analog speech signal. The pitch extractor is operably associated with the LPC analyzer and produces a plurality of pitch candidates for each speech data frame in the sequence. Dynamic programming is performed over the pitch candidates of each frame and over the voiced/unvoiced decision for each frame. It tracks both pitch and voicing from frame to frame to yield an optimal pitch value and an optimal voicing decision.

    During dynamic programming, a cumulative penalty is accumulated for each sequence of frame pitch/voicing decisions. A transition error is defined between each pitch candidate of the current speech data frame and each pitch candidate of the preceding frame. The cumulative error of each pitch candidate of the current frame is the transition error between that candidate and an optimally identified pitch candidate in the preceding frame, plus the cumulative error of that preceding candidate. The track with the lowest cumulative penalty provides the optimal pitch and voicing decisions.

    An encoder then encodes the LPC parameters generated by the LPC analyzer, together with the optimal pitch and voicing decisions for each speech data frame. The encoded data is later used to produce audible synthesized speech substantially identical to the original speech input.
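
    The frame-to-frame dynamic programming described above can be sketched as a Viterbi-style search over pitch candidates. Several parts of this sketch are assumptions. Each candidate carries a local error from the pitch extractor; the abstract does not state this. An unvoiced hypothesis is modelled as a candidate whose pitch is None. The transition error, a relative pitch jump plus a fixed penalty for a voicing change, is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (pitch period, or None for "unvoiced"; local error from the pitch extractor)
Candidate = Tuple[Optional[float], float]


def transition_error(prev: Optional[float], cur: Optional[float],
                     voicing_penalty: float = 1.0) -> float:
    """Hypothetical cost of moving between candidates in consecutive frames."""
    if prev is None and cur is None:
        return 0.0
    if prev is None or cur is None:
        return voicing_penalty  # voiced <-> unvoiced switch
    return abs(cur - prev) / max(cur, prev)  # relative pitch jump


def track_pitch(frames: Sequence[Sequence[Candidate]]) -> List[Optional[float]]:
    """Return the pitch (None = unvoiced) chosen in each frame on the track
    with the lowest cumulative penalty."""
    cum = [local for _, local in frames[0]]
    back: List[List[int]] = []
    for t in range(1, len(frames)):
        prev = frames[t - 1]
        new_cum, ptrs = [], []
        for pitch, local in frames[t]:
            # cumulative error = min over predecessors of (their cumulative
            # error + transition error), plus this candidate's local error
            costs = [cum[j] + transition_error(prev[j][0], pitch)
                     for j in range(len(prev))]
            best = min(range(len(prev)), key=costs.__getitem__)
            new_cum.append(costs[best] + local)
            ptrs.append(best)
        cum = new_cum
        back.append(ptrs)
    # Backtrack from the best final candidate.
    k = min(range(len(cum)), key=cum.__getitem__)
    path = [k]
    for ptrs in reversed(back):
        k = ptrs[k]
        path.append(k)
    path.reverse()
    return [frames[t][k][0] for t, k in enumerate(path)]
```

    Treating "unvoiced" as just another candidate lets a single search make the pitch and voicing decisions jointly. This is the unified tracking the title refers to.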

24. Apparatus and method for identifying a speech pattern
    Granted invention patent; status: lapsed

    Publication number: US5222190A

    Publication date: 1993-06-22

    Application number: US713481

    Filing date: 1991-06-11

    CPC classification: G10L25/87

    Abstract: A method and apparatus are provided for identifying one or more boundaries of a speech pattern within an input utterance. One or more anchor patterns are defined, and an input utterance is received. An anchor section of the input utterance is identified as corresponding to at least one of the anchor patterns. A boundary of the speech pattern is defined based on the anchor section. Also provided are a method and apparatus for identifying a speech pattern within an input utterance. One or more segment patterns are defined, and an input utterance is received. Portions of the input utterance that correspond to the segment patterns are identified. One or more segments of the input utterance are then defined in response to the identified portions.

25. Efficient pruning algorithm for hidden Markov model speech recognition
    Granted invention patent; status: lapsed

    Publication number: US4977598A

    Publication date: 1990-12-11

    Application number: US337608

    Filing date: 1989-04-13

    IPC classification: G06F3/16 G10L11/00 G10L15/14

    CPC classification: G10L15/14

    Abstract: An efficient pruning method reduces central processing unit (CPU) load during real-time speech recognition. It instructs the CPU to compare a current state's previously calculated probability score against a predetermined threshold value. Hypotheses containing states whose scores fall below that threshold are discarded. Once the CPU determines that the current state should be kept, it locates an available slot in the scoring buffer and stores information about the current state there. To find an available slot, the CPU compares the current time-index with the time-index associated with each scoring buffer slot. When they are equal, the slot is considered unavailable; when the current time-index is greater, the slot is considered available. After storing the information, the CPU sets the current state's backpointer to point at the start state of the current best path, provided the current state represents a completed model. Regardless of the current state's status, the CPU then assigns the current time-index to every slot along the best path to the current state. The CPU then calculates the probability score of the next current state. The method repeats until all states have been processed.
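
    The threshold test and the time-index-based slot reuse can be sketched as follows. The sketch makes several assumptions. Scores are log-probabilities, and each slot stores a backpointer to its predecessor's slot. A kept state and its best-path ancestors are stamped with the index of the next frame. With that convention, a stamp equal to the current index means the slot is still in use, and a greater current index means the slot can be reused. This matches the availability rule in the abstract. The step that redirects a completed model's backpointer to the start state is omitted. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    stamp: int = -1                    # time-index through which the slot is in use
    state: Optional[int] = None
    score: float = float("-inf")       # log-probability score (assumed)
    backpointer: Optional[int] = None  # slot index of the predecessor on the best path


class ScoringBuffer:
    def __init__(self, size: int) -> None:
        self.slots: List[Slot] = [Slot() for _ in range(size)]

    def find_free(self, t: int) -> int:
        """A slot is available when the current time-index exceeds its stamp."""
        for i, slot in enumerate(self.slots):
            if t > slot.stamp:
                return i
        raise RuntimeError("scoring buffer full")

    def keep(self, t: int, state: int, score: float,
             prev_slot: Optional[int], threshold: float) -> Optional[int]:
        """Prune or store the current state at frame t; return its slot or None."""
        if score < threshold:
            return None                # hypothesis discarded
        i = self.find_free(t)
        self.slots[i] = Slot(t + 1, state, score, prev_slot)
        # Re-stamp every slot on the best path so none of them is reused early.
        j: Optional[int] = i
        while j is not None:
            self.slots[j].stamp = t + 1
            j = self.slots[j].backpointer
        return i
```

    Slots whose paths were pruned are never freed explicitly. Their stamps simply fall behind the current time-index, which is what keeps the bookkeeping cheap.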

26. Speaker independent speech recognition method and system
    Granted invention patent; status: lapsed

    Publication number: US4908865A

    Publication date: 1990-03-13

    Application number: US290816

    Filing date: 1988-12-22

    IPC classification: G10L15/00

    CPC classification: G10L15/12

    Abstract: Recognition of sound units is improved by comparing frame-pair feature vectors, which helps compensate for context variations in the pronunciation of sound units. A plurality of reference frames of reference feature vectors representing reference words are stored. A linear predictive coder (10) generates a plurality of spectral feature vectors for each frame of the speech signals. A filter bank system (12) transforms the spectral feature vectors into filter bank representations. A principal feature vector transformer (14) transforms the filter bank representations to an identity matrix of transformed input feature vectors. A concatenate frame system (16) concatenates the input feature vectors of adjacent frames to form the feature vector of a frame-pair. A transformer (18) and a comparator (20) compute the likelihood that each input frame-pair feature vector was produced by each reference frame. This computation is performed individually and independently for each reference frame-pair. A dynamic time warper (22) constructs an optimum time path through the input speech signals for each of the computed likelihoods. High-level decision logic (24) recognizes the input speech signals as one of the reference words in response to the computed likelihoods and the optimum time paths.
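
    The frame-pair step can be illustrated with a small sketch. Adjacent, already-transformed input feature vectors are concatenated, and each frame-pair is scored independently against every reference frame-pair mean. The scoring model is an assumption: a Gaussian with identity covariance, whose log-likelihood reduces to a scaled negative squared Euclidean distance. It is loosely motivated by the abstract's mention of an identity matrix. The function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def frame_pairs(frames: Sequence[Vector]) -> List[List[float]]:
    """Concatenate each frame's feature vector with the next frame's."""
    return [list(a) + list(b) for a, b in zip(frames, frames[1:])]


def log_likelihood(x: Vector, mean: Vector) -> float:
    """Gaussian log-likelihood with identity covariance, constant term dropped."""
    return -0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, mean))


def likelihood_matrix(frames: Sequence[Vector],
                      ref_pair_means: Sequence[Vector]) -> List[List[float]]:
    """Row i, column r: score of input frame-pair i under reference frame-pair r,
    with each reference scored independently of the others."""
    return [[log_likelihood(p, r) for r in ref_pair_means]
            for p in frame_pairs(frames)]
```

    A matrix like this is the kind of input that a dynamic time warper then searches for the best time path.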

27. Connected word recognition enrollment method
    Granted invention patent; status: lapsed

    Publication number: US4783808A

    Publication date: 1988-11-08

    Application number: US856722

    Filing date: 1986-04-25

    CPC classification: G10L15/063 G10L15/22

    Abstract: A method for generating connected word templates begins by generating isolated word templates of selected words. The isolated word templates are used to extract a connected word template from a segment of continuous speech containing the selected words. Both the isolated word templates and the connected word templates can be used to generate speech, so that the quality of the generated templates can be judged by ear.

28. Speech analysis/synthesis system with silence suppression
    Granted invention patent; status: lapsed

    Publication number: US4696039A

    Publication date: 1987-09-22

    Application number: US541497

    Filing date: 1983-10-13

    IPC classification: G10L11/02 G10L21/02 G10L5/00

    Abstract: Silence suppression in speech synthesis systems is achieved by detecting and processing only segments of voice activity. A segment is classified as "speech" if the signal energy is greater than an adaptively adjusted threshold. This threshold is preferably defined as the maximum of the scaled values of two separate envelope parameters, both of which track the variation in energy over the sequence of speech data frames. One parameter is a slow-rising, fast-falling value that is updated only during unvoiced speech frames. It therefore tracks a lower envelope of the energy contour, which in effect is the ambient noise level. The other parameter is a fast-rising, slow-falling value that is updated only during voiced speech frames. It thus tracks an upper envelope of the energy contour, which in effect is the average speech level. A nonsilent energy tracker and a silent energy tracker adjust the corresponding energy values that represent the energy contours.
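
    The dual-envelope threshold can be sketched as below. The abstract specifies three things: the rise/fall asymmetry of each tracker, which frames update which tracker, and the maximum-of-scaled-values rule. Everything else is hypothetical: the smoothing coefficients, the scale factors, the initial values, and the update-then-classify order.

```python
from typing import List, Sequence, Tuple


def follow(value: float, target: float, rise: float, fall: float) -> float:
    """One-pole tracker that moves toward `target` at different up/down rates."""
    coeff = rise if target > value else fall
    return value + coeff * (target - value)


def classify_frames(frames: Sequence[Tuple[float, bool]],
                    noise_init: float, speech_init: float,
                    noise_scale: float = 2.0,
                    speech_scale: float = 0.1) -> List[bool]:
    """frames: (energy, voiced) pairs. Returns True for frames labelled speech."""
    noise, speech = noise_init, speech_init
    labels = []
    for energy, voiced in frames:
        if voiced:
            # upper envelope: fast rise, slow fall, voiced frames only
            speech = follow(speech, energy, rise=0.5, fall=0.05)
        else:
            # lower envelope (noise floor): slow rise, fast fall, unvoiced only
            noise = follow(noise, energy, rise=0.05, fall=0.5)
        threshold = max(noise_scale * noise, speech_scale * speech)
        labels.append(energy > threshold)
    return labels
```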

29. LPC pole encoding using reduced spectral shaping polynomial
    Granted invention patent; status: lapsed

    Publication number: US4536886A

    Publication date: 1985-08-20

    Application number: US373959

    Filing date: 1982-05-03

    IPC classification: H04B1/66 G10L19/04 G10L1/00

    CPC classification: G10L19/04 G10L19/24

    Abstract: Pole encoding of a linear predictive all-pole model of speech begins by finding poles up to the number required for good prediction (e.g., ten). These poles are extracted from the LPC predictor polynomial using, e.g., a slightly modified Bairstow method. Poles with a sufficiently narrow bandwidth (i.e., those sufficiently near the unit circle) are encoded separately, since they generally correspond to perceptually important formants. The remaining poles are lumped together to form a residual polynomial. The residual polynomial is then transformed into reflection coefficients, and all reflection coefficients beyond the first two are discarded. This yields an efficient spectral-shaping polynomial of reduced degree. Pole encoding is thus made possible at a reduced, adaptively varied bit rate.
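
    The pole-splitting step can be sketched with NumPy. In this sketch, `numpy.roots` stands in for the modified Bairstow method. The bandwidth cutoff `max_bw_hz` is a hypothetical parameter. Reflection coefficients come from the step-down (backward Levinson) recursion, using the sign convention k_m = a_m. Formant frequencies and bandwidths are returned unquantized.

```python
import numpy as np


def step_down(a):
    """Reflection coefficients k_1..k_m of A(z) = 1 + a_1 z^-1 + ... + a_m z^-m."""
    a = np.asarray(a, dtype=float)
    m = len(a) - 1
    ks = np.zeros(m)
    for order in range(m, 0, -1):
        k = a[order]
        ks[order - 1] = k
        if order > 1:
            # lower the order by one: a_i <- (a_i - k * a_{order-i}) / (1 - k^2)
            inner = (a[1:order] - k * a[order - 1:0:-1]) / (1.0 - k * k)
            a = np.concatenate(([1.0], inner))
    return ks


def encode_poles(a, fs, max_bw_hz, keep=2):
    """Split LPC poles into narrow-band formants and a reduced residual."""
    poles = np.roots(a)
    bw = -(fs / np.pi) * np.log(np.abs(poles))  # pole bandwidth in Hz
    # Only complex poles can be formants; real poles always go to the residual.
    narrow = (bw < max_bw_hz) & (np.abs(poles.imag) > 1e-12)
    # One (frequency Hz, bandwidth Hz) entry per conjugate pair of narrow poles.
    formants = sorted(
        (float(np.angle(p)) * fs / (2 * np.pi), float(b))
        for p, b in zip(poles[narrow], bw[narrow]) if p.imag > 0
    )
    residual = poles[~narrow]
    if residual.size == 0:
        return formants, np.zeros(0)
    residual_poly = np.real(np.poly(residual))
    # Keep only the first `keep` reflection coefficients of the residual.
    return formants, step_down(residual_poly)[:keep]
```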

30. Data converter for a speech synthesizer
    Granted invention patent; status: lapsed

    Publication number: US4304965A

    Publication date: 1981-12-08

    Application number: US42737

    Filing date: 1979-05-29

    CPC classification: G10L19/00

    Abstract: A data converter for a speech synthesizer system decodes encoded formant parameters stored in a memory. It converts them to reflection coefficients in real time, using a circuit that implements a Taylor-series-type approximation. The reflection coefficients are then quantized and input to a speech synthesizer, which uses quantized reflection coefficients to synthesize speech. Coded formant-frequency speech data inherently contains more speech intelligence than reflection-coefficient speech data. Storing it therefore lets a synthesizer that operates on quantized reflection coefficients run at a significantly lower bit rate than if reflection coefficients were stored in the memory as the speech data.
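
    For comparison, the formant-to-reflection-coefficient conversion can also be done exactly. The abstract's circuit instead uses a Taylor-series approximation, which this sketch does not reproduce. Each formant frequency F and bandwidth B gives a conjugate pole pair of radius exp(-πB/fs) at angle 2πF/fs. The resulting second-order sections are multiplied into one predictor polynomial, and a step-down recursion yields the reflection coefficients. The uniform quantizer, its bit width, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def formants_to_polynomial(formants: Sequence[Tuple[float, float]],
                           fs: float) -> List[float]:
    """Multiply one second-order section per (frequency Hz, bandwidth Hz) formant."""
    a = [1.0]
    for f, b in formants:
        r = math.exp(-math.pi * b / fs)   # pole radius from bandwidth
        theta = 2 * math.pi * f / fs      # pole angle from frequency
        section = [1.0, -2 * r * math.cos(theta), r * r]
        out = [0.0] * (len(a) + 2)        # polynomial product (convolution)
        for i, x in enumerate(a):
            for j, y in enumerate(section):
                out[i + j] += x * y
        a = out
    return a


def step_down(a: Sequence[float]) -> List[float]:
    """Reflection coefficients of [1, a_1, ..., a_m], convention k_m = a_m."""
    a = list(a)
    m = len(a) - 1
    ks = [0.0] * m
    for order in range(m, 0, -1):
        k = a[order]
        ks[order - 1] = k
        a = [1.0] + [(a[i] - k * a[order - i]) / (1 - k * k)
                     for i in range(1, order)]
    return ks


def quantize(ks: Sequence[float], bits: int = 6) -> List[int]:
    """Hypothetical uniform quantizer mapping [-1, 1] to codes 0..2**bits - 1."""
    levels = 2 ** bits - 1
    return [int(round((k + 1) / 2 * levels)) for k in ks]
```

    Every stable all-pole filter has reflection coefficients strictly inside (-1, 1). This is why a fixed quantizer range like the one above is workable.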
