Speech synthesis using perceptual linear prediction parameters
    1.
    Granted patent
    Legal status: Expired

    Publication No.: US5165008A

    Publication date: 1992-11-17

    Application No.: US761190

    Application date: 1991-09-18

    IPC classification: G10L19/06

    CPC classification: G10L19/06

    Abstract: A method for synthesizing human speech using a linear mapping of a small set of coefficients that are speaker-independent. Preferably, the speaker-independent coefficients are cepstral coefficients developed during a training session using a perceptual linear predictive (PLP) analysis. A linear predictive all-pole model is used to develop corresponding formants and bandwidths, to which the cepstral coefficients are mapped by using a separate multiple regression model for each of the five formant frequencies and five formant bandwidths. The dual analysis produces both the cepstral coefficients of the PLP model for the different vowel-like sounds and their true formant frequencies and bandwidths. The separate multiple regression models developed by mapping the cepstral coefficients into the formant frequencies and formant bandwidths can then be applied to cepstral coefficients determined for subsequent speech to produce corresponding formants and bandwidths used to synthesize that speech. Since less data is required for synthesizing each speech segment than in conventional techniques, the storage space and/or transmission rate required for the speech synthesis data is reduced. In addition, the cepstral coefficients for each speech segment can be used with the regression model for a different speaker to produce synthesized speech corresponding to that speaker.
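The mapping described in this abstract (one multiple regression model per formant frequency or bandwidth, fitted on training frames and then applied to new cepstral coefficients) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the function names, the use of ordinary least squares via the normal equations, the two-coefficient cepstra, and the made-up training values. The patent's actual fitting procedure is not given in the abstract.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_regression(cepstra, targets):
    """Least-squares weights (intercept first) for ONE formant or bandwidth."""
    rows = [[1.0] + list(c) for c in cepstra]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, cepstrum):
    """Apply a fitted model to the cepstral coefficients of a new frame."""
    return weights[0] + sum(w * c for w, c in zip(weights[1:], cepstrum))

# Hypothetical training data: 2 cepstral coefficients per frame (a real PLP
# analysis would use more) and made-up formant values in Hz.
train_cepstra = [(0.1, 0.2), (0.4, -0.1), (-0.3, 0.5), (0.7, 0.3), (-0.2, -0.6)]
train_targets = {
    "F1": [500 + 300 * c1 - 120 * c2 for c1, c2 in train_cepstra],
    "B1": [60 + 20 * c1 + 10 * c2 for c1, c2 in train_cepstra],
}

# One separate regression model per target, as the abstract describes.
models = {name: fit_regression(train_cepstra, vals)
          for name, vals in train_targets.items()}
new_frame = (0.2, 0.1)
estimates = {name: predict(w, new_frame) for name, w in models.items()}
```

Swapping in a `models` dictionary trained on a different speaker while keeping the same cepstral input corresponds to the speaker-conversion use mentioned at the end of the abstract.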

    Signal coding and decoding based on spectral dynamics
    2.
    Patent application
    Legal status: In force

    Publication No.: US20080031365A1

    Publication date: 2008-02-07

    Application No.: US11583537

    Application date: 2006-10-18

    IPC classification: H04B14/04

    CPC classification: G10L19/12 G10L19/02

    Abstract: In an apparatus and method, time-varying signals are processed and encoded via a frequency domain linear prediction (FDLP) scheme to arrive at an all-pole model. Residual signals resulting from the scheme are estimated. Quantized values of the all-pole model and the residual signals are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is essentially the reverse of the encoding process.
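The quantize, packetize, and decode steps named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the uniform 16-bit quantizer, the step size `STEP`, the packet layout (two counts followed by the values), and all function names are hypothetical, and the FDLP analysis that would produce the model coefficients and residual is not shown.

```python
import struct

STEP = 1.0 / 1024  # hypothetical uniform quantizer step size

def quantize(values, step=STEP):
    """Map floats to signed 16-bit integer codes (clamped to the int16 range)."""
    return [max(-32768, min(32767, round(v / step))) for v in values]

def dequantize(codes, step=STEP):
    """Map integer codes back to approximate float values."""
    return [c * step for c in codes]

def encode_packet(model_coeffs, residual):
    """Pack two counts, then quantized model and residual values (big-endian)."""
    q_model = quantize(model_coeffs)
    q_res = quantize(residual)
    header = struct.pack(">HH", len(q_model), len(q_res))
    body = struct.pack(f">{len(q_model) + len(q_res)}h", *q_model, *q_res)
    return header + body

def decode_packet(packet):
    """Reverse of encode_packet: unpack, dequantize, and split the two parts."""
    n_model, n_res = struct.unpack_from(">HH", packet, 0)
    codes = struct.unpack_from(f">{n_model + n_res}h", packet, 4)
    values = dequantize(codes)
    return values[:n_model], values[n_model:]

# Made-up all-pole coefficients and residual samples for illustration.
model = [1.0, -0.9, 0.4]
residual = [0.01, -0.02, 0.005]
packet = encode_packet(model, residual)
decoded_model, decoded_residual = decode_packet(packet)
```

The decoder mirrors the encoder step by step, and the only loss is the rounding introduced by the quantizer, at most half a step per value.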

    Auditory model for parametrization of speech
    3.
    Granted patent
    Legal status: Expired

    Publication No.: US5450522A

    Publication date: 1995-09-12

    Application No.: US747181

    Application date: 1991-08-19

    IPC classification: G10L21/02 G10L9/00

    CPC classification: G10L21/0208

    Abstract: A method and system are provided for alleviating the harmful effects of convolutional distortions of speech, such as the effect of a telecommunication channel, on the performance of an automatic speech recognizer (ASR). The technique is based on the filtering of time trajectories of an auditory-like spectrum derived from the Perceptual Linear Predictive (PLP) method of speech parameter estimation.
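Why filtering time trajectories helps against convolutional distortion can be shown with a small sketch. A fixed channel multiplies each spectral band by a constant gain, which becomes an additive constant in the log domain, so any filter with zero gain at DC removes it from each band's trajectory. The first-order high-pass filter, the `pole` value, and the function names below are assumptions chosen for brevity; they stand in for, and are not, the filter specified in the patent.

```python
import math

def highpass(trajectory, pole=0.94):
    """First-order high-pass (zero gain at DC) along one band's time trajectory."""
    out, prev_in, prev_out = [], trajectory[0], 0.0
    for x in trajectory:
        y = x - prev_in + pole * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out

def filter_log_trajectories(spectra, pole=0.94):
    """spectra[t][b] is the power in band b at frame t; returns the same layout."""
    n_bands = len(spectra[0])
    bands = [highpass([math.log(frame[b]) for frame in spectra], pole)
             for b in range(n_bands)]
    return [[bands[b][t] for b in range(n_bands)] for t in range(len(spectra))]

# Made-up band powers for 4 frames and 2 bands, plus a hypothetical fixed
# channel that scales band 0 by 0.5 and band 1 by 2.0.
clean = [[1.0, 4.0], [2.0, 3.0], [1.5, 5.0], [3.0, 2.0]]
channel_gains = [0.5, 2.0]
distorted = [[p * g for p, g in zip(frame, channel_gains)] for frame in clean]

features_clean = filter_log_trajectories(clean)
features_distorted = filter_log_trajectories(distorted)
```

In this sketch the features computed from the clean and the channel-distorted spectra agree up to floating-point rounding, because the channel only shifts each log trajectory by a constant.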

    SPECTRAL NOISE SHAPING IN AUDIO CODING BASED ON SPECTRAL DYNAMICS IN FREQUENCY SUB-BANDS
    4.
    Patent application
    Legal status: In force

    Publication No.: US20110270616A1

    Publication date: 2011-11-03

    Application No.: US12197069

    Application date: 2008-08-22

    IPC classification: G10L21/00

    Abstract: A technique of spectral noise shaping in an audio coding system is disclosed. Frequency decomposition of an input audio signal is performed to obtain multiple frequency sub-bands that closely follow critical bands of human auditory system decomposition. The tonality of each sub-band is determined. If a sub-band is tonal, time domain linear prediction (TDLP) processing is applied to the sub-band, yielding a residual signal and linear predictive coding (LPC) coefficients of an all-pole model representing the sub-band signal. The residual signal is further processed using a frequency domain linear prediction (FDLP) method. The FDLP parameters and LPC coefficients are transferred to a decoder. At the decoder, an inverse-FDLP process is applied to the encoded residual signal followed by an inverse TDLP process, which shapes the quantization noise according to the power spectral density of the original sub-band signal. Non-tonal sub-band signals bypass the TDLP process.

    Method and system for generating an estimated clean speech signal from a noisy speech signal
    5.
    Granted patent
    Legal status: Expired

    Publication No.: US5878389A

    Publication date: 1999-03-02

    Application No.: US496068

    Application date: 1995-06-28

    IPC classification: G10L21/02 G10L3/02

    Abstract: A method and system for generating an estimate of a clean speech signal extracts time trajectories of short-term parameters from a noisy speech signal to obtain a plurality of frequency components each having a magnitude spectrum and a phase spectrum. The magnitude spectrum is then compressed, filtered and then decompressed to obtain a modified magnitude spectrum. The speech signal is then reconstructed using the original phase spectrum and the modified magnitude spectrum.
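The compress, filter, decompress, and recombine-with-phase sequence in this abstract can be sketched per frequency bin as follows. The input is assumed to be a list of short-time spectra (complex values per frame), and the cube-root compression, the 3-point moving-average filter, and the function names are hypothetical choices for illustration only. The abstract does not specify the compression law or the filter, and the short-time analysis and resynthesis stages are omitted.

```python
import cmath

def smooth(trajectory):
    """3-point moving average along time (shorter window at the edges)."""
    out = []
    for t in range(len(trajectory)):
        window = trajectory[max(0, t - 1): t + 2]
        out.append(sum(window) / len(window))
    return out

def enhance(frames, exponent=1 / 3):
    """frames[t][k] is the complex spectrum value of bin k at frame t."""
    n_bins = len(frames[0])
    result = [[0j] * n_bins for _ in frames]
    for k in range(n_bins):
        mags = [abs(f[k]) for f in frames]
        phases = [cmath.phase(f[k]) for f in frames]
        compressed = [m ** exponent for m in mags]          # compress
        filtered = smooth(compressed)                       # filter over time
        restored = [max(v, 0.0) ** (1 / exponent) for v in filtered]  # decompress
        for t, (m, p) in enumerate(zip(restored, phases)):
            result[t][k] = cmath.rect(m, p)                 # keep original phase
    return result

# Made-up spectra: bin 0 has constant magnitude, bin 1 has a one-frame spike.
frames = [
    [cmath.rect(2.0, 0.1), cmath.rect(1.0, 0.5)],
    [cmath.rect(2.0, 1.0), cmath.rect(8.0, -0.3)],
    [cmath.rect(2.0, -2.0), cmath.rect(1.0, 0.2)],
]
enhanced = enhance(frames)
```

In this toy input the steady bin passes through unchanged, the isolated spike in the other bin is attenuated, and every output value keeps the phase of the corresponding input value.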

    Spectral noise shaping in audio coding based on spectral dynamics in frequency sub-bands
    7.
    Granted patent
    Legal status: In force

    Publication No.: US08428957B2

    Publication date: 2013-04-23

    Application No.: US12197069

    Application date: 2008-08-22

    IPC classification: G10L19/00

    Abstract: A technique of spectral noise shaping in an audio coding system is disclosed. Frequency decomposition of an input audio signal is performed to obtain multiple frequency sub-bands that closely follow critical bands of human auditory system decomposition. The tonality of each sub-band is determined. If a sub-band is tonal, time domain linear prediction (TDLP) processing is applied to the sub-band, yielding a residual signal and linear predictive coding (LPC) coefficients of an all-pole model representing the sub-band signal. The residual signal is further processed using a frequency domain linear prediction (FDLP) method. The FDLP parameters and LPC coefficients are transferred to a decoder. At the decoder, an inverse-FDLP process is applied to the encoded residual signal followed by an inverse TDLP process, which shapes the quantization noise according to the power spectral density of the original sub-band signal. Non-tonal sub-band signals bypass the TDLP process.

    System and Method for Processing Speech to Identify Keywords or Other Information
    8.
    Patent application
    Legal status: In force

    Publication No.: US20150371635A1

    Publication date: 2015-12-24

    Application No.: US14840089

    Application date: 2015-08-31

    IPC classification: G10L15/22

    Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and to control operation of the electronic system based on the keywords identified.
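The convolution of sparse phonetic impulses with a keyword's filter ensemble can be sketched as below. The data layout is an assumption for illustration: each phone has a list of frame indices where an impulse was detected, and each keyword model holds one filter per phone, stored as a mapping from lag (frames between the phone event and the end of the keyword) to weight. The phone labels, filter weights, threshold, and function names are all made up and are not taken from the patent.

```python
def convolve_sparse(impulses, taps, length):
    """Convolve a unit-impulse train (frame indices) with a filter {lag: weight}."""
    out = [0.0] * length
    for t in impulses:
        for lag, weight in taps.items():
            if 0 <= t + lag < length:
                out[t + lag] += weight
    return out

def keyword_score(events, keyword_filters, length):
    """Sum the per-phone convolutions to get one score trajectory per keyword."""
    score = [0.0] * length
    for phone, taps in keyword_filters.items():
        for t, v in enumerate(convolve_sparse(events.get(phone, []), taps, length)):
            score[t] += v
    return score

def detect(score, threshold):
    """Frames at which the keyword score reaches the threshold."""
    return [t for t, s in enumerate(score) if s >= threshold]

# Hypothetical phonetic impulses (frame indices) found in an utterance.
events = {"k": [10, 40], "ae": [13], "t": [16, 30]}

# Hypothetical filter ensemble for the keyword "cat": "k" is expected about
# 6 frames before the keyword ends, "ae" about 3 frames, "t" at the end.
cat_filters = {
    "k": {5: 0.5, 6: 1.0, 7: 0.5},
    "ae": {2: 0.5, 3: 1.0, 4: 0.5},
    "t": {0: 1.0, 1: 0.5},
}

score = keyword_score(events, cat_filters, length=50)
hits = detect(score, threshold=2.5)
```

Because the impulse trains are sparse, the cost of scoring scales with the number of detected events times the filter length rather than with the full signal length, which is one plausible reading of why the abstract decomposes speech into impulses before convolving.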

    SYSTEM AND METHOD FOR EFFICIENT SIGNAL PROCESSING TO IDENTIFY AND UNDERSTAND SPEECH
    9.
    Patent application
    Legal status: In force

    Publication No.: US20140379347A1

    Publication date: 2014-12-25

    Application No.: US13926659

    Application date: 2013-06-25

    IPC classification: G10L15/02 G10L15/187

    Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and to control operation of the electronic system based on the keywords identified.

    Systems and methods for speech recognition using frequency domain linear prediction polynomials to form temporal and spectral envelopes from frequency domain representations of signals
    10.
    Granted patent
    Legal status: In force

    Publication No.: US07672838B1

    Publication date: 2010-03-02

    Application No.: US11000874

    Application date: 2004-12-01

    IPC classification: G10L19/06 G10L19/00 G10L15/00

    CPC classification: G10L15/02 G10L25/12

    Abstract: In accordance with the present invention, computer-implemented methods and systems are provided for representing and modeling the temporal structure of audio signals. In response to receiving a signal, a time-to-frequency domain transformation is performed on at least a portion of the received signal, converting it from a time domain representation to a frequency domain representation. A frequency domain linear prediction (FDLP) is performed on the frequency domain representation to estimate a temporal envelope of the frequency domain representation. Based on the temporal envelope, one or more speech features are generated.
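The two steps in this abstract (transform to the frequency domain, then run linear prediction across frequency to obtain a temporal envelope) can be sketched as follows. Several choices here are assumptions, since the abstract does not name them: a DCT-II as the time-to-frequency transform, the autocorrelation method with the Levinson-Durbin recursion for the linear prediction, a model order of 2, and all function names. The later step of deriving speech features from the envelope is not shown.

```python
import cmath
import math

def dct(signal):
    """Unnormalized DCT-II: time samples -> frequency-domain sequence."""
    n = len(signal)
    return [sum(x * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                for t, x in enumerate(signal)) for k in range(n)]

def levinson(r, order):
    """Levinson-Durbin: autocorrelation r -> coefficients a (a[0] = 1), error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a, err = new_a, err * (1.0 - k * k)
    return a, err

def fdlp_envelope(signal, order=2):
    """All-pole fit to the DCT sequence, read out as an envelope over time."""
    n = len(signal)
    coeffs = dct(signal)
    r = [sum(coeffs[k] * coeffs[k + m] for k in range(n - m))
         for m in range(order + 1)]
    a, err = levinson(r, order)
    envelope = []
    for t in range(n):
        # Time index t corresponds to angle pi*(2t+1)/(2n) of the DCT basis.
        theta = math.pi * (2 * t + 1) / (2 * n)
        resp = sum(a[i] * cmath.exp(-1j * theta * i) for i in range(order + 1))
        envelope.append(err / abs(resp) ** 2)
    return envelope

# Made-up test signal: a short click centred on sample 40 of a 64-sample frame.
signal = [math.exp(-0.5 * ((t - 40) / 1.5) ** 2) for t in range(64)]
envelope = fdlp_envelope(signal, order=2)
peak = max(range(len(envelope)), key=envelope.__getitem__)
```

The point of the sketch is the duality: linear prediction over time samples models the spectral envelope, while the same procedure over frequency-domain coefficients models the temporal envelope, so `peak` lands near the location of the click. Practical systems would use longer frames and a higher model order than this toy setting.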
