Neural network, a method of learning of a neural network and phoneme recognition apparatus utilizing a neural network
    1.
    Granted invention patent
    Status: Lapsed

    Publication No.: US06026358A

    Publication date: 2000-02-15

    Application No.: US576585

    Filing date: 1995-12-21

    Applicant: Hideto Tomabechi

    Inventor: Hideto Tomabechi

    CPC classification: G06N3/049 G10L15/16

    Abstract: A neuron device network is provided with a speech input layer, a context layer, a hidden layer, a speech output layer and a hypothesis layer. A phoneme to be learned is spectrally analyzed by an FFT unit, and the vector row at time point t is input to the speech input layer. The vector state of the hidden layer at time t-1 is input to the context layer, the vector row at time t+1 is input to the speech output layer as an instructor signal, and a code row that hypothesizes the phoneme is input to the hypothesis layer. In this way, the time-series relation of the vector rows and the phoneme are learned hypothetically. Alternatively, a spectrum, a cepstrum, or a speech vector row based on the outputs of the hidden layer of an auto-associative neural network is input to the speech input layer, and the code row is output from the hypothesis layer, taking the time-series relation into account. Speech is recognized when a CPU reads the stored output values of the hidden layer and the connection weights between the hidden layer and the hypothesis layer from a memory of the neuron device network, and calculates the output values of the respective neuron devices of the hypothesis layer from those output values and connection weights. The corresponding phoneme is determined by collating the output values of the respective neuron devices of the hypothesis layer with the code rows in an instructor signal table.
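
    A minimal sketch of the forward pass and collation step described above, assuming an Elman-style recurrence (the context layer holds a copy of the hidden state from t-1), sigmoid units, and squared-error matching against the code table. The class and function names are hypothetical, the weights are random and untrained, and the patent's learning procedure is not reproduced.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ElmanHypothesisNet:
    """Hypothetical recurrent network with speech input, context, hidden,
    speech output and hypothesis layers (layer names follow the abstract)."""

    def __init__(self, n_in, n_hidden, n_code, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.w_in = random_matrix(n_hidden, n_in, rng)       # speech input -> hidden
        self.w_ctx = random_matrix(n_hidden, n_hidden, rng)  # context -> hidden
        self.w_out = random_matrix(n_in, n_hidden, rng)      # hidden -> speech output
        self.w_hyp = random_matrix(n_code, n_hidden, rng)    # hidden -> hypothesis

    def run(self, frames):
        """Feed spectral frames x(t) in order; return (prediction, hypothesis) per step."""
        context = [0.0] * self.n_hidden  # hidden state at t-1 (zero before the first frame)
        results = []
        for x in frames:
            net = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_ctx, context))]
            hidden = [sigmoid(v) for v in net]
            # Training target for the speech output layer would be x(t+1).
            prediction = [sigmoid(v) for v in matvec(self.w_out, hidden)]
            # Training target for the hypothesis layer would be the phoneme's code row.
            hypothesis = [sigmoid(v) for v in matvec(self.w_hyp, hidden)]
            results.append((prediction, hypothesis))
            context = hidden  # copy the hidden state into the context layer
        return results


def collate(hypothesis, code_table):
    """Return the phoneme whose code row is nearest (squared error) to the outputs."""
    return min(code_table,
               key=lambda ph: sum((h - c) ** 2 for h, c in zip(hypothesis, code_table[ph])))
```

    For example, passing the last hypothesis vector from `net.run(frames)` to `collate` with a table such as `{"a": [1, 0, 0], "i": [0, 1, 0]}` picks the phoneme with the nearest code row.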

Envelope-invariant analytical speech resynthesis using periodic signals derived from reharmonized frame spectrum
    2.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5987413A

    Publication date: 1999-11-16

    Application No.: US869368

    Filing date: 1997-06-05

    CPC classification: G10L13/07 G10L21/04

    Abstract: An envelope-invariant method for synthesizing an audio signal from elementary audio waveforms stored in a dictionary, wherein: the waveforms are perfectly periodic and are each stored as one of their periods; synthesis is obtained by overlap-adding the waveforms obtained from time-domain repetition of the periodic waveforms, weighted by a window whose size is approximately twice the period of the signals to be weighted and whose relative position within the period is fixed to a value identical for all periods; each waveform is extracted from a reharmonized, and thus periodic, waveform obtained by modifying the frequencies and amplitudes of the harmonics in the spectrum of a frame of the original continuous speech waveform without changing the spectral envelope; and the time shift between two successive waveforms obtained by weighting the original signals is set by the imposed fundamental frequency of the signal to be synthesized.
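
    A minimal overlap-add sketch, assuming one stored period is already available as a list of samples (the reharmonization step that produces it is omitted), a periodic Hann window exactly two periods long placed at offset zero within each period, and a target pitch period given in samples. The function names and parameters are hypothetical.

```python
import math


def periodic_hann(length):
    """Periodic Hann window: copies shifted by length/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def overlap_add_resynthesis(period, target_period, n_grains):
    """Resynthesize a signal from one stored period of a periodic waveform.

    period: one period of the stored waveform (list of samples)
    target_period: imposed pitch period in samples (shift between grains)
    n_grains: number of windowed grains to overlap-add
    """
    p = len(period)
    win_len = 2 * p                      # window of about two periods
    window = periodic_hann(win_len)
    # Time-domain repetition of the period fills the window; the window starts
    # at the same position (offset 0) within the period for every grain.
    grain = [period[n % p] * window[n] for n in range(win_len)]
    out = [0.0] * (target_period * (n_grains - 1) + win_len)
    for k in range(n_grains):
        start = k * target_period        # successive grains are one target period apart
        for n, s in enumerate(grain):
            out[start + n] += s
    return out
```

    When `target_period` equals the stored period, adjacent windows sum to one and the stored waveform is reproduced in the fully overlapped region; a shorter `target_period` raises the pitch and a longer one lowers it.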

Speech synthesis system and method utilizing phoneme information and rhythm information
    3.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5715368A

    Publication date: 1998-02-03

    Application No.: US495155

    Filing date: 1995-06-27

    CPC classification: G10L13/10 G10L13/08

    Abstract: The aim is to synthesize clear, highly natural speech in a Japanese-language speech synthesis system by improving not only phoneme information but also rhythm information. In Japanese, independent words and adjunct words differ remarkably in their speech characteristics. The difference is clearly observed, particularly in rhythmical elements such as the intensity, speech, and pitch of speech. On this basis, a new synthesis-by-rule method is provided that uses as its speech synthesis unit an adjunct-word chain unit, consisting of a chain of one or more adjunct words, and that can synthesize highly natural speech. The remaining portion, i.e., the independent-word portion, is constructed from CV/VC units.
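
    A sketch of how an input could be divided into the two kinds of synthesis unit, assuming words arrive already tagged as independent or adjunct and are spelled as romanized single-letter phonemes (so syllabic n, long vowels and geminates are ignored). Consecutive adjunct words are merged into one chain unit, and independent words are split into overlapping CV and VC units. All names and unit labels are hypothetical.

```python
VOWELS = set("aiueo")


def cv_vc_units(phonemes):
    """Split a phoneme list into overlapping CV and VC units."""
    units = []
    for i, ph in enumerate(phonemes):
        prev = phonemes[i - 1] if i > 0 else None
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if ph in VOWELS:
            if prev is None or prev in VOWELS:
                units.append(ph)            # vowel with no preceding consonant
            if nxt is not None and nxt not in VOWELS:
                units.append(ph + nxt)      # VC unit bridging to the next consonant
        elif nxt is not None and nxt in VOWELS:
            units.append(ph + nxt)          # CV unit
    return units


def synthesis_units(words):
    """words: list of (phoneme_string, is_adjunct) pairs in utterance order.

    Runs of adjunct words become one chain unit; independent words are
    decomposed into CV/VC units.
    """
    units = []
    chain = []
    for phon, is_adjunct in words:
        if is_adjunct:
            chain.append(phon)
            continue
        if chain:                           # close any pending adjunct chain
            units.append(("ADJ_CHAIN", "+".join(chain)))
            chain = []
        units.extend(("CV/VC", u) for u in cv_vc_units(list(phon)))
    if chain:
        units.append(("ADJ_CHAIN", "+".join(chain)))
    return units
```

    For instance, `synthesis_units([("ame", False), ("da", True), ("yo", True)])` yields the CV/VC units `a`, `am`, `me` followed by one adjunct chain `da+yo`.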

    Speech synthesis with weighted parameters at phoneme boundaries
    4.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5659664A

    Publication date: 1997-08-19

    Application No.: US468640

    Filing date: 1995-06-06

    Applicant: Jaan Kaja

    Inventor: Jaan Kaja

    IPC classification: C10L9/02 G10L13/04 G10L5/04

    CPC classification: G10L13/07 G10L13/04 G10L25/15

    Abstract: The invention relates to a method and an arrangement for speech synthesis and provides an automatic mechanism for simulating human speech. The method provides a number of control parameters for controlling a speech synthesis device. The invention addresses the problem of coarticulation by means of an interpolation mechanism. The control parameters are stored in a matrix or a sequence list for each polyphone. The time course of each parameter is defined around each phoneme boundary, and polyphones are joined by forming a weighted mean of the curves defined by their two associated matrices or sequence lists. The invention also provides an arrangement for carrying out the method.
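
    A sketch of joining one control-parameter track across a polyphone boundary by a weighted mean, assuming both polyphones supply the same number of frames around the shared boundary and the weight ramps linearly from the left curve to the right one. The linear ramp and the function name are assumptions; the abstract does not specify the weighting.

```python
def join_parameter_tracks(left, right, overlap):
    """Join the parameter tracks (lists of frame values) of two adjacent polyphones.

    The last `overlap` frames of `left` and the first `overlap` frames of
    `right` both describe the region around the shared phoneme boundary;
    they are replaced by a weighted mean that shifts linearly toward `right`.
    """
    if overlap > min(len(left), len(right)):
        raise ValueError("overlap is longer than one of the tracks")
    joined = left[: len(left) - overlap]
    for k in range(overlap):
        w = (k + 1) / (overlap + 1)         # weight of the right-hand curve
        a = left[len(left) - overlap + k]
        b = right[k]
        joined.append((1 - w) * a + w * b)
    joined.extend(right[overlap:])
    return joined
```

    Applied per parameter (for example, each formant frequency track), the join moves smoothly from one polyphone's values to the next instead of jumping at the boundary.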

Recognition unit model training based on competing word and word string models
    5.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5579436A

    Publication date: 1996-11-26

    Application No.: US30895

    Filing date: 1993-03-15

    CPC classification: G10L15/063 G10L15/144

    Abstract: A system for pattern-based speech recognition, e.g., a hidden Markov model (HMM) based speech recognizer using Viterbi scoring. The principle of minimum recognition error rate is applied through discriminative training. Various issues related to the special structure of HMMs are presented, and parameter update expressions for HMMs are provided.
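
    A sketch of the minimum-classification-error (MCE) loss commonly used for this kind of discriminative training, assuming each class has a scalar discriminant score such as a Viterbi log-likelihood. The misclassification measure compares the correct score with a soft maximum of the competing scores, and a sigmoid turns it into a smoothed error count. The default `eta` and `gamma` values are hypothetical, and the patent's HMM parameter update expressions are not reproduced.

```python
import math


def mce_loss(scores, correct, eta=2.0, gamma=1.0):
    """Smoothed classification error for one training token.

    scores: dict mapping class label -> discriminant score g_j
    correct: label of the true class (at least one competitor is required)
    eta: smoothing exponent for the competing scores
    gamma: slope of the sigmoid loss
    """
    competitors = [g for label, g in scores.items() if label != correct]
    # Soft maximum (1/eta) * log(mean(exp(eta * g))), computed stably around the max.
    m = max(competitors)
    mean_exp = sum(math.exp(eta * (g - m)) for g in competitors) / len(competitors)
    soft_max = m + math.log(mean_exp) / eta
    d = -scores[correct] + soft_max             # misclassification measure
    return 1.0 / (1.0 + math.exp(-gamma * d))   # near 0 if correct wins, near 1 if not
```

    Gradient descent on this loss with respect to the model parameters that produce the scores would, under these assumptions, push the recognizer toward fewer errors rather than toward higher likelihood alone.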

Method and apparatus for generating modified speech from pitch-synchronous segmented speech waveforms
    6.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5933808A

    Publication date: 1999-08-03

    Application No.: US553161

    Filing date: 1995-11-07

    Abstract: A system synchronously segments a speech waveform using the pitch period and the center of the pitch waveform. The pitch waveform center is determined by finding a local minimum of a centroid histogram waveform of the low-pass-filtered speech waveform over one pitch period. The speech waveform can then be represented by one or more such pitch waveforms, or segments, during speech compression, reconstruction or synthesis. The pitch waveform can be modified by frequency enhancement or filtering, and by waveform stretching or shrinking, for speech synthesis or speech disguise. The utterance rate can also be controlled to speed up or slow down the speech.
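
    A sketch of utterance-rate control with pitch-synchronous segments, assuming each segment already holds exactly one pitch period: whole segments are dropped to speed up or repeated to slow down, so the pitch inside each segment is unchanged. Smoothing at segment joins is omitted, the function name is hypothetical, and the patent's centroid-histogram segmentation is not shown.

```python
def change_rate(segments, rate):
    """Speed up (rate > 1) or slow down (rate < 1) speech by dropping or
    repeating whole pitch-synchronous segments.

    segments: list of pitch waveforms (each a list of samples), in order
    rate: utterance-rate factor
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    n_out = max(1, round(len(segments) / rate))
    out = []
    for k in range(n_out):
        # Map each output slot back to the nearest earlier source segment.
        src = min(int(k * rate), len(segments) - 1)
        out.extend(segments[src])
    return out
```

    With `rate=2.0` every other segment is kept, halving the duration; with `rate=0.5` each segment is emitted twice.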

Karaoke apparatus using frequency of actual singing voice to synthesize harmony voice from stored voice information
    7.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5857171A

    Publication date: 1999-01-05

    Application No.: US607341

    Filing date: 1996-02-26

    Abstract: A karaoke apparatus produces a karaoke accompaniment that accompanies the singing voice of an actual player, and concurrently creates a harmony voice originating from a virtual player. In the karaoke apparatus, a memory device stores voice information of the virtual singer. An input device collects the singing voice of the actual player. An analyzing device analyzes the audio frequency of the collected singing voice. A synthesizing device processes the stored voice information based on the analyzed audio frequency to synthesize a harmony voice having another audio frequency set in harmony with the analyzed audio frequency. An output device mixes the collected singing voice with the synthesized harmony voice and outputs the mixed singing and harmony voices along with the karaoke accompaniment. In one preferred embodiment, the memory device stores the voice information as a sequence of phonetic elements sampled successively, syllable by syllable, from a singing voice of the virtual player.
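
    A rough sketch of the analysis and harmony steps, assuming the pitch of the collected voice is estimated by picking the autocorrelation peak within an 80 to 500 Hz search range, and the harmony is placed a fixed interval above it (by default a major third, four equal-tempered semitones). The patent does not state this estimator or interval, and both functions are hypothetical. The stored virtual-singer voice would then be pitch-shifted by the ratio of the harmony frequency to its own pitch.

```python
import math


def estimate_f0(samples, sample_rate, f_min=80.0, f_max=500.0):
    """Rough fundamental-frequency estimate from the autocorrelation peak.

    The raw (unnormalized) autocorrelation is used, which slightly favors
    shorter lags and so avoids picking the peak at twice the true period.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, min(max_lag, len(samples) - 1) + 1):
        r = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


def harmony_frequency(f0, semitones=4):
    """Frequency a fixed equal-tempered interval above f0."""
    return f0 * 2 ** (semitones / 12)
```

    The integer-lag search limits precision (a 220 Hz tone at 8 kHz is resolved to within a few hertz); a production analyzer would typically interpolate around the peak.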

    System and method for determining pitch contours
    8.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5790978A

    Publication date: 1998-08-04

    Application No.: US528576

    Filing date: 1995-09-15

    IPC classification: G10L11/04 G10L13/08 G10L5/04

    CPC classification: G10L13/08 G10L13/04

    Abstract: A system and method are provided for automatically computing local pitch contours from textual input, producing pitch contours that closely mimic those found in natural speech. The method uses parameterized equations whose parameters can be estimated directly from natural speech recordings. It incorporates a model based on the premise that pitch contours instantiating a particular pitch contour class can be described as distortions, in the time and frequency domains, of a single underlying contour. Once the nature of the pitch contour has been established for the different pitch contour classes, a pitch contour that closely models a natural speech contour for a synthetic utterance can be predicted by adding the individual contours of the different intonational classes and adjusting their boundaries to match those of the adjacent intonation curves.
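
    A sketch of the additive idea, assuming a hypothetical raised-cosine template as the single underlying contour. Each accent instantiates it through a time-domain distortion (onset, duration and a power warp of normalized time) and a frequency-domain scaling (height in Hz), and the accent curves are added onto a linearly declining phrase curve. The template, the phrase curve and all parameter values are illustrative; estimation from recordings and boundary adjustment are not shown.

```python
import math


def template(u):
    """Hypothetical underlying accent shape on normalized time u in [0, 1]."""
    if 0.0 <= u <= 1.0:
        return 0.5 - 0.5 * math.cos(2 * math.pi * u)  # rises to 1, falls back to 0
    return 0.0


def accent_curve(t, onset, duration, height, warp=1.0):
    """Instantiate the template for one accent.

    Time-domain distortion: onset, duration and a power warp of normalized
    time (warp > 1 delays the peak). Frequency-domain distortion: height in Hz.
    """
    u = (t - onset) / duration
    if 0.0 <= u <= 1.0:
        u = u ** warp
    return height * template(u)


def pitch_contour(times, accents, f_start=120.0, f_end=90.0, total=None):
    """Add accent curves (tuples of accent_curve arguments) onto a declining phrase curve."""
    total = total if total is not None else times[-1]
    contour = []
    for t in times:
        phrase = f_start + (f_end - f_start) * t / total
        contour.append(phrase + sum(accent_curve(t, *a) for a in accents))
    return contour
```

    For example, `pitch_contour([i * 0.01 for i in range(101)], [(0.2, 0.4, 30.0)])` starts at 120 Hz, rises by about 30 Hz above the phrase curve around t = 0.4 s, and ends at 90 Hz.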

Text-to-speech synthesis by concatenation using or modifying clustered phoneme waveforms on basis of cluster parameter centroids
    9.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5740320A

    Publication date: 1998-04-14

    Application No.: US852705

    Filing date: 1997-05-07

    Applicant: Kenzo Itoh

    Inventor: Kenzo Itoh

    CPC classification: G10L13/07

    Abstract: In a waveform compilation (waveform concatenation or synthesis-by-rule) type speech synthesis method and speech synthesizer, phoneme waveform segments in natural speech waveforms are clustered, and in each cluster the phoneme waveform segment whose parameters are nearest the centroid of the LPC parameters of all phoneme waveforms in that cluster is selected and stored as a representative phoneme waveform in a waveform information memory. When a speech waveform is synthesized, representative phoneme waveforms of the same phonemes, whose contexts are most similar to those of the respective phonemes in the phoneme string of the speech to be synthesized, are selectively read out of the waveform information memory, and the read-out representative phoneme waveforms are concatenated in sequence and output as a continuous synthesized speech waveform.
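
    A sketch of the representative-selection step, assuming the clustering itself has already been done and each segment is described by an LPC parameter vector compared by Euclidean distance. The data layout and function names are hypothetical, and context-based selection at synthesis time is not shown.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length parameter vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def representative_index(vectors):
    """Index of the member whose parameter vector is nearest the cluster centroid."""
    c = centroid(vectors)
    return min(range(len(vectors)), key=lambda k: math.dist(vectors[k], c))


def select_representatives(clusters):
    """clusters: dict cluster_id -> list of (segment_id, lpc_vector).

    Returns dict cluster_id -> segment_id of the stored representative waveform.
    """
    reps = {}
    for cid, members in clusters.items():
        idx = representative_index([vec for _, vec in members])
        reps[cid] = members[idx][0]
    return reps
```

    Choosing an actual member rather than the centroid itself means every stored representative is a real natural-speech waveform.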

Vocabulary independent discriminative utterance verification for non-keyword rejection in subword based speech recognition
    10.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5675706A

    Publication date: 1997-10-07

    Application No.: US414243

    Filing date: 1995-03-31

    Abstract: A verification system determines whether unknown input speech contains a recognized keyword or consists of speech or other sounds that do not contain any of the keywords. The verification system operates at the subword level, so that the verification process is vocabulary independent. This vocabulary-independent verifier is achieved by a two-stage verification process: subword-level verification followed by string-level verification. The subword-level stage verifies each subword segment of the input speech, as determined by a hidden Markov model (HMM) recognizer, to determine whether that segment consists of the sound corresponding to the subword the HMM recognizer assigned to it. The string-level stage combines the results of the subword-level verification to make the rejection decision for the whole keyword. Because the training of this two-stage verifier is independent of the specific vocabulary set, the verifier need not be retrained when the vocabulary set is updated or changed, and it can still reliably verify the new set of keywords.
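
    A sketch of the two-stage decision, assuming each subword segment is scored by a frame-normalized log-likelihood ratio between its subword model and an anti-model, and the string-level score is the plain average of the subword scores compared with a threshold. The abstract does not specify either statistic, so both, along with the function names and the default threshold, are hypothetical.

```python
def subword_confidence(target_loglik, antimodel_loglik, n_frames):
    """Stage 1: frame-normalized log-likelihood ratio for one subword segment."""
    return (target_loglik - antimodel_loglik) / n_frames


def verify_keyword(segments, threshold=0.0):
    """Stage 2: combine subword scores into one accept/reject decision.

    segments: list of (target_loglik, antimodel_loglik, n_frames), one per
    subword segment of the recognized keyword
    returns (accept, string_score)
    """
    scores = [subword_confidence(*seg) for seg in segments]
    string_score = sum(scores) / len(scores)   # simple average over subwords
    return string_score >= threshold, string_score
```

    Because only subword-level scores are involved, adding a new keyword changes which segments are averaged but not the scoring models themselves.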
