Prosody-pattern generating apparatus, speech synthesizing apparatus, and computer program product and method thereof
    1.
    Granted invention patent
    Status: In force

    Publication No.: US08046225B2

    Publication date: 2011-10-25

    Application No.: US12068600

    Filing date: 2008-02-08

    IPC classification: G10L13/08

    CPC classification: G10L13/10

    Abstract: A normalization-parameter generating unit generates normalization parameters by calculating the mean values and the standard deviations of an initial prosody pattern and of a prosody pattern of a training sentence of a speech corpus. A prosody-pattern normalizing unit then normalizes the variance range or variance width of the initial prosody pattern in accordance with the normalization parameters. As a result, a prosody pattern that resembles human speech and has improved naturalness can be generated with a small amount of calculation.
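
    The normalization described in this abstract can be illustrated with a minimal sketch. It assumes that "normalizing in accordance with the normalization parameters" means shifting and scaling the initial pattern so that its mean and standard deviation match those of the corpus pattern; the abstract does not give the exact formula, and the function names and sample values below are hypothetical.

```python
from statistics import mean, pstdev


def normalization_parameters(initial_pattern, corpus_pattern):
    """Return ((mean, std) of the initial pattern, (mean, std) of the corpus pattern)."""
    return (
        (mean(initial_pattern), pstdev(initial_pattern)),
        (mean(corpus_pattern), pstdev(corpus_pattern)),
    )


def normalize_pattern(initial_pattern, params):
    """Rescale the initial pattern to the corpus mean and standard deviation.

    Assumption: a plain mean/variance match, not necessarily the patented method.
    """
    (mu_init, sd_init), (mu_ref, sd_ref) = params
    if sd_init == 0:
        # A flat pattern has no variation to rescale; only move it to the corpus mean.
        return [mu_ref for _ in initial_pattern]
    scale = sd_ref / sd_init
    return [mu_ref + (value - mu_init) * scale for value in initial_pattern]


# Hypothetical pitch values in Hz: a flat-sounding initial pattern and a livelier corpus pattern.
initial = [100.0, 110.0, 120.0]
corpus = [90.0, 150.0, 210.0]
normalized = normalize_pattern(initial, normalization_parameters(initial, corpus))
```

    Only two means and two standard deviations are needed per pattern, which is consistent with the small amount of calculation the abstract claims.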

    SPEECH PROCESSING APPARATUS, METHOD, AND COMPUTER PROGRAM PRODUCT
    2.
    Invention patent application
    Status: Lapsed

    Publication No.: US20090248417A1

    Publication date: 2009-10-01

    Application No.: US12405587

    Filing date: 2009-03-17

    IPC classification: G10L13/08, G10L13/06, G10L13/00

    CPC classification: G10L13/0335, G10L13/10

    Abstract: A method to generate a pitch contour for speech synthesis is proposed. The method is based on finding the pitch contour that maximizes a total likelihood function created by combining all the statistical models of the pitch contour segments of an utterance, at one or multiple linguistic levels. These statistical models are trained from a database of spoken speech by means of a decision tree that, for each linguistic level, clusters the parametric representation of the pitch segments extracted from the spoken speech data using features obtained from the text associated with that speech data. The pitch segments are parameterized in such a way that the likelihood function of any linguistic level can be expressed in terms of the parameters of one of the levels, thus allowing the maximization to be calculated with respect to the parameters of that level. Moreover, the parameterization of that main level has to be invertible so that the final pitch contour is obtained from the parameters of that level by means of an inverse transformation.

    SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND COMPUTER PROGRAM PRODUCT
    3.
    Invention patent application
    Status: Under examination (published)

    Publication No.: US20080077404A1

    Publication date: 2008-03-27

    Application No.: US11850980

    Filing date: 2007-09-06

    IPC classification: G10L15/06

    Abstract: A speech recognition device includes an extracting unit that analyzes an input signal and extracts a feature to be used for speech recognition from the input signal; a storing unit configured to store therein an acoustic model that is a stochastic model for estimating what type of phoneme is included in the feature; a speech-recognition unit that performs speech recognition on the input signal based on the feature and determines a word having maximum likelihood from the acoustic model; and an optimizing unit that dynamically self-optimizes parameters of the feature and the acoustic model depending on at least one of the input signal and a state of the speech recognition performed by the speech-recognition unit.

    Variable bit rate coding system
    4.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5214741A

    Publication date: 1993-05-25

    Application No.: US625061

    Filing date: 1990-12-10

    CPC classification: H04B1/667

    Abstract: A packet communication system or ATM communication system in which a sequence of signals such as speech signals is divided into a plurality of band areas and the power of each band area is determined. Based on the power of each band area, coding signals are allocated for each band, frame by frame. At a receiving side, the signal-to-noise ratio (SNR) of the decoded signal is predicted by changing the total number of encoding bits for each band area based on the power of each band area signal. The bit rate is controlled so as to make the SNR constant. The bit rate is changed in accordance with a Fourier transform of the input signal.

    Variable rate encoding and communicating apparatus
    5.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5150387A

    Publication date: 1992-09-22

    Application No.: US630911

    Filing date: 1990-12-20

    IPC classification: H04B1/66

    CPC classification: H04B1/667, G10L19/24

    Abstract: In a transmitter of the present invention, an input signal is fed to a QMF bank 102, which divides it into a plurality of frequency bands to form corresponding band signals. A distributed bit calculating unit 109 calculates, on the basis of the respective power values of the band signals, the bit rates at which the corresponding band signals are encoded. Quantizers 104-1, 104-2, ..., 104-n encode the respective band signals at the corresponding bit rates and input the resulting band codes to a multiplexer unit 111, which incorporates the respective band codes into a cell as an information unit and sends the cell. In a receiver, a cell is decomposed to obtain the respective band codes, which are then dequantized to form the corresponding band signals. These band signals are synthesized to form a signal for the entire band, and the signal for the entire band is output as a decoded signal.
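
    How bit rates might be derived from band powers can be sketched as follows. This is not the patent's distributed bit calculating unit: it is a generic greedy allocation that assumes each added bit lowers a band's quantization noise power by a factor of four (about 6 dB), and the function name, the per-band cap, and the example powers are all hypothetical.

```python
def allocate_bits(band_powers, total_bits, max_bits_per_band=16):
    """Greedily distribute total_bits across bands according to their power.

    Assumption: with zero bits the error power equals the band power, and every
    additional bit divides the remaining noise power by 4 (roughly 6 dB per bit).
    """
    bits = [0] * len(band_powers)
    noise = [float(p) for p in band_powers]
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits_per_band]
        if not candidates:
            break  # every band is already at its cap
        # Give the next bit to the band whose remaining noise is largest.
        loudest = max(candidates, key=lambda i: noise[i])
        bits[loudest] += 1
        noise[loudest] /= 4.0
    return bits


# Hypothetical powers of three QMF band signals in one frame.
allocation = allocate_bits([16.0, 4.0, 1.0], total_bits=6)
```

    Bands with more power end up with more bits, so the quantizer for each band would run at a rate that follows the signal energy in that band.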

    APPARATUS FOR CREATING SPEAKER MODEL, AND COMPUTER PROGRAM PRODUCT
    6.
    Invention patent application
    Status: In force

    Publication No.: US20090094022A1

    Publication date: 2009-04-09

    Application No.: US12244245

    Filing date: 2008-10-02

    IPC classification: G10L19/02

    CPC classification: G10L15/065, G10L15/20

    Abstract: A transformation-parameter calculating unit calculates a first model parameter, which is a parameter of a speaker model that maximizes a first likelihood of a clean feature, and calculates a transformation parameter that maximizes the first likelihood. For each speaker, the transformation parameter transforms a distribution of the clean feature corresponding to the identification information of the speaker into a distribution represented by the speaker model of the first model parameter. A model-parameter calculating unit uses the transformation parameter to transform a noisy feature corresponding to the identification information of each speaker, and calculates a second model parameter, which is a parameter of the speaker model that maximizes a second likelihood of the transformed noisy feature.

    Speech synthesis method
    7.
    Granted invention patent
    Status: Lapsed

    Publication No.: US06760703B2

    Publication date: 2004-07-06

    Application No.: US10265458

    Filing date: 2002-10-07

    IPC classification: G10L1302

    CPC classification: G10L13/07, G10L25/90

    Abstract: A speech synthesis method that generates a speech pitch wave from a reference speech signal by subjecting the reference speech signal to one of Fourier transform and Fourier series expansion to produce a discrete spectrum, that interpolates the discrete spectrum to generate a consecutive spectrum, and that subjects the consecutive spectrum to inverse Fourier transform. A linear prediction coefficient is generated by subjecting the reference speech signal to a linear prediction analysis. The speech pitch wave is subjected to inverse-filtering based on the linear prediction coefficient to produce a residual pitch wave. Information regarding the residual pitch wave is stored as information of a speech synthesis unit in a voice period. Speech is then synthesized using the information of the speech synthesis unit.

    Speech synthesis method
    8.
    Granted invention patent
    Status: Lapsed

    Publication No.: US06553343B1

    Publication date: 2003-04-22

    Application No.: US09984254

    Filing date: 2001-10-29

    IPC classification: G01L1302

    CPC classification: G10L13/07, G10L25/90

    Abstract: A speech synthesis method subjects a reference speech signal to windowing to extract an aperiodic speech pitch wave from the reference speech signal. A linear prediction coefficient is generated by subjecting the reference speech signal to a linear prediction analysis. The aperiodic speech pitch wave is subjected to inverse-filtering based on the linear prediction coefficient to produce a residual pitch wave. Information regarding the residual pitch wave is stored as information of a speech synthesis unit and a voiced period in the storage. The speech is then synthesized using the information of the speech synthesis unit.
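
    The linear prediction and inverse-filtering steps named in this abstract can be sketched as below. The sketch assumes the autocorrelation method with the Levinson-Durbin recursion, which the abstract does not specify, and it omits the windowing step; the function names and the test signal are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    return [
        sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        for lag in range(max_lag + 1)
    ]


def linear_prediction(signal, order):
    """Predictor coefficients a[k] such that x[n] ~ sum(a[k] * x[n - 1 - k]).

    Assumption: autocorrelation method solved with the Levinson-Durbin recursion.
    """
    r = autocorrelation(signal, order)
    coeffs = [0.0] * order
    error = r[0]
    for i in range(order):
        if error == 0:
            break  # nothing left to predict
        acc = r[i + 1] - sum(coeffs[k] * r[i - k] for k in range(i))
        reflection = acc / error
        updated = coeffs[:]
        updated[i] = reflection
        for k in range(i):
            updated[k] = coeffs[k] - reflection * coeffs[i - 1 - k]
        coeffs = updated
        error *= 1.0 - reflection * reflection
    return coeffs


def inverse_filter(signal, coeffs):
    """Residual e[n] = x[n] - sum(a[k] * x[n - 1 - k]), the prediction error."""
    residual = []
    for n, sample in enumerate(signal):
        predicted = sum(
            a * signal[n - 1 - k] for k, a in enumerate(coeffs) if n - 1 - k >= 0
        )
        residual.append(sample - predicted)
    return residual


# Hypothetical "pitch wave": a decaying exponential, i.e. a one-pole resonance.
wave = [0.9 ** n for n in range(50)]
lpc = linear_prediction(wave, order=1)
residual_wave = inverse_filter(wave, lpc)
```

    For this toy signal the residual is close to a single impulse, which shows why a residual pitch wave can be a compact representation of a voiced segment.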

Speech encoding and decoding with pitch filter range unrestricted by codebook range and preselecting, then increasing, search candidates from linear overlap codebooks
    9.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5819213A

    Publication date: 1998-10-06

    Application No.: US791741

    Filing date: 1997-01-30

    IPC classification: G10L19/08, G10L9/14

    CPC classification: G10L19/08

    Abstract: A speech encoding method and apparatus including analyzing, using a codebook expressing speech parameters within a predetermined search range, an input speech signal in an audibility weighting filter corresponding to a pitch period longer than the search range of the codebook, and searching, from the codebook, on the basis of the analysis result, a combination of speech parameters by which the distortion of the input speech signal is minimized, and encoding the combination. The apparatus uses an adaptive codebook of pitch and a noise codebook. The codebooks search a group formed by extracting vectors of predetermined length from one original code vector, while sequentially shifting position so that the vectors overlap each other. The search group is further restricted and another preselection is made before the final search. Search is based on inversely convoluted, orthogonally transformed vectors.
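
    The overlapped codebook mentioned in this abstract, where candidates are cut from one original code vector at successively shifted positions, can be sketched as follows. The search shown is a plain gain-scaled squared-error match; the perceptual weighting, preselection, and orthogonal transformation from the abstract are left out, and all names and values are hypothetical.

```python
def overlapped_candidates(base_vector, length, shift=1):
    """Cut fixed-length candidate vectors from one base vector at shifted offsets."""
    return [
        base_vector[start:start + length]
        for start in range(0, len(base_vector) - length + 1, shift)
    ]


def best_candidate(target, candidates):
    """Return (index, gain, error) of the candidate that best matches target.

    Assumption: each candidate is scaled by its least-squares optimal gain and
    compared by squared error, with no perceptual weighting.
    """
    best = None
    for index, candidate in enumerate(candidates):
        energy = sum(v * v for v in candidate)
        if energy == 0:
            continue  # an all-zero candidate cannot model the target
        gain = sum(t * v for t, v in zip(target, candidate)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, candidate))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Hypothetical base code vector: six samples yield four overlapping 3-sample candidates.
base = [1.0, 0.0, -1.0, 2.0, 0.0, 1.0]
candidates = overlapped_candidates(base, length=3)
match = best_candidate([0.0, -2.0, 4.0], candidates)
```

    Because neighbouring candidates share all but one sample, only the base vector has to be stored, which is the usual motivation for overlapped codebooks.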

    Speech communication apparatus equipped with echo canceller
    10.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5400399A

    Publication date: 1995-03-21

    Application No.: US962589

    Filing date: 1993-02-26

    IPC classification: H04M9/08, H04M1/00

    CPC classification: H04M9/082

    Abstract: A speech communication apparatus of the present invention includes, in addition to an echo canceller for canceling an acoustic echo generated in a hands-free speech space, a chirp signal generating unit and a training control unit. The chirp signal generating unit generates a chirp signal adequate for initial training of the echo canceller. When a predetermined condition for starting hands-free speaking is satisfied, the training control unit enables the chirp signal generating unit to generate a chirp signal and causes a chirp tone corresponding to the chirp signal to be output as a volume-amplified tone from the hands-free speaker. The echo canceller performs its initial training based on the chirp tone.
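
    As a small illustration of the chirp signal generating unit, the sketch below produces a linear frequency sweep. The abstract does not say what kind of chirp is used, so the linear sweep, the function name, and the frequency and sample-rate values are assumptions; the echo canceller training itself is not shown.

```python
import math


def linear_chirp(n_samples, f_start, f_end, sample_rate):
    """Sine sweep whose instantaneous frequency moves linearly from f_start to f_end.

    Assumption: a linear sweep; the patent abstract does not specify the chirp shape.
    """
    duration = n_samples / sample_rate
    sweep_rate = (f_end - f_start) / duration  # Hz per second
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        # The phase is the integral of the instantaneous frequency f_start + sweep_rate * t.
        phase = 2.0 * math.pi * (f_start * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples


# Hypothetical training tone: 0.5 s sweep from 300 Hz to 3400 Hz at 8 kHz sampling.
training_tone = linear_chirp(4000, 300.0, 3400.0, 8000.0)
```

    A sweep of this kind passes through the whole telephone band, so the echo path is excited at every frequency the canceller has to model.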

    PCT information: PCT No. PCT/JP92/00564; Sec. 371 date: 1993-02-26; Sec. 102(e) date: 1993-02-26; PCT filed: 1992-04-30; PCT Pub. No. WO92/20170