Speech processing and speech synthesis using a linear combination of bases at peak frequencies for spectral envelope information
    1.
    Granted patent; status: in force

    Publication No.: US08321208B2

    Publication date: 2012-11-27

    Application No.: US12327399

    Filing date: 2008-12-03

    IPC classification: G10L13/06 G10L19/02

    CPC classification: G10L13/06

    Abstract: An information extraction unit extracts L-dimensional spectral envelope information from each frame of speech data by discrete Fourier transform. The spectral envelope information is represented by L points. A basis storage unit stores N bases (L>N>1). Each basis covers a different frequency band and has its maximum at a peak frequency in the L-dimensional spectral domain. Its value is zero at frequencies outside that band along the frequency axis of the spectral domain. Two frequency bands whose peak frequencies are adjacent along the frequency axis partially overlap. A parameter calculation unit minimizes a distortion between the spectral envelope information and a linear combination of the bases, evaluated over the L points of the spectral envelope information, by changing the coefficient of each basis, and sets the coefficients that minimize the distortion as the spectral envelope parameter of the spectral envelope information.
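The fitting step in this abstract can be illustrated with a minimal sketch. The abstract does not fix the basis shape or the distortion measure, so the triangular bases, the squared-error distortion, and the names `make_bases`, `fit_coefficients`, and `reconstruct` are all assumptions made for illustration. Peak positions are assumed to be strictly increasing indices in the range 0 to L-1.

```python
# Sketch only: triangular bases and plain least squares are assumptions,
# not details taken from the abstract.

def make_bases(L, peaks):
    """Build one basis per peak index. Basis i equals 1 at peaks[i] and is
    zero outside (peaks[i-1], peaks[i+1]), so adjacent bands overlap."""
    bases = []
    for i, p in enumerate(peaks):
        lo = peaks[i - 1] if i > 0 else p
        hi = peaks[i + 1] if i < len(peaks) - 1 else p
        row = []
        for k in range(L):
            if lo < k < p:
                row.append((k - lo) / (p - lo))   # rising edge
            elif k == p:
                row.append(1.0)                   # maximum at the peak
            elif p < k < hi:
                row.append((hi - k) / (hi - p))   # falling edge
            else:
                row.append(0.0)                   # outside the band
        bases.append(row)
    return bases


def reconstruct(coeffs, bases):
    """Linear combination of the bases at each of the L points."""
    L = len(bases[0])
    return [sum(c * b[k] for c, b in zip(coeffs, bases)) for k in range(L)]


def fit_coefficients(envelope, bases):
    """Coefficients minimizing the squared error between the envelope and
    the linear combination of bases (normal equations A c = y)."""
    n, L = len(bases), len(envelope)
    A = [[sum(bi[k] * bj[k] for k in range(L)) for bj in bases] for bi in bases]
    y = [sum(bi[k] * envelope[k] for k in range(L)) for bi in bases]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    M = [A[i] + [y[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / M[i][i]
    return coeffs
```

With L = 9 and peaks at indices 0, 2, 5, and 8, the four fitted coefficients stand in for the nine envelope points, which is the dimensionality reduction (L > N) the abstract describes.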

    Method and apparatus using fused formant parameters to generate synthesized speech
    2.
    Granted patent; status: in force

    Publication No.: US08175881B2

    Publication date: 2012-05-08

    Application No.: US12222725

    Filing date: 2008-08-14

    IPC classification: G10L13/06

    CPC classification: G10L13/07 G10L13/04

    Abstract: A phoneme sequence corresponding to a target speech is divided into a plurality of segments. A plurality of speech units for each segment is selected from a speech unit memory that stores speech units having at least one frame. The plurality of speech units has a prosodic feature matching or similar to that of the target speech. A formant parameter having at least one formant frequency is generated for each frame of the plurality of speech units. A fused formant parameter of each frame is generated from the formant parameters of each frame of the plurality of speech units. A fused speech unit of each segment is generated from the fused formant parameter of each frame. A synthesized speech is generated by concatenating the fused speech unit of each segment.
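The fusion and concatenation steps can be sketched as follows. The abstract does not say how formant parameters are fused, so simple averaging of corresponding formant frequencies is an assumption here, as is the premise that the selected units are already aligned to the same number of frames and formants. The names `fuse_formants` and `concatenate` are hypothetical.

```python
# Sketch only: averaging is an assumed fusion rule, and the units are
# assumed to be pre-aligned (same frame count, same formant count).

def fuse_formants(units):
    """units: list of speech units; each unit is a list of frames; each
    frame is a list of formant frequencies in Hz. Returns one fused unit."""
    n_frames = len(units[0])
    fused = []
    for t in range(n_frames):
        frames = [unit[t] for unit in units]
        n_formants = len(frames[0])
        fused.append([
            sum(frame[k] for frame in frames) / len(frames)
            for k in range(n_formants)
        ])
    return fused


def concatenate(fused_units):
    """Join the fused units of consecutive segments into one frame sequence."""
    sequence = []
    for unit in fused_units:
        sequence.extend(unit)
    return sequence
```

A waveform generator driven by the fused formant frames would follow in a full system; that stage is outside this sketch.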

    Speech synthesis method and apparatus
    3.
    Patent application; status: in force

    Publication No.: US20090048844A1

    Publication date: 2009-02-19

    Application No.: US12222725

    Filing date: 2008-08-14

    IPC classification: G10L13/06

    CPC classification: G10L13/07 G10L13/04

    Abstract: A phoneme sequence corresponding to a target speech is divided into a plurality of segments. A plurality of speech units for each segment is selected from a speech unit memory that stores speech units having at least one frame. The plurality of speech units has a prosodic feature matching or similar to that of the target speech. A formant parameter having at least one formant frequency is generated for each frame of the plurality of speech units. A fused formant parameter of each frame is generated from the formant parameters of each frame of the plurality of speech units. A fused speech unit of each segment is generated from the fused formant parameter of each frame. A synthesized speech is generated by concatenating the fused speech unit of each segment.

    APPARATUS AND METHOD OF PROCESSING SPEECH
    4.
    Patent application; status: in force

    Publication No.: US20070168189A1

    Publication date: 2007-07-19

    Application No.: US11533122

    Filing date: 2006-09-19

    IPC classification: G10L15/26

    CPC classification: G10L13/033 G10L2021/0135

    Abstract: A speech processing apparatus according to an embodiment of the invention includes a conversion-source-speaker speech-unit database; a voice-conversion-rule-learning-data generating means; and a voice-conversion-rule learning means, with which it generates voice conversion rules. The voice-conversion-rule-learning-data generating means includes a conversion-target-speaker speech-unit extracting means; an attribute-information generating means; a conversion-source-speaker speech-unit database; and a conversion-source-speaker speech-unit selection means. The conversion-source-speaker speech-unit selection means selects conversion-source-speaker speech units corresponding to conversion-target-speaker speech units based on the mismatch between the attribute information of the conversion-target-speaker speech units and that of the conversion-source-speaker speech units, whereby the voice conversion rules are made from the selected pairs of conversion-target-speaker speech units and conversion-source-speaker speech units.
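The selection step, which pairs each target-speaker unit with the source-speaker unit whose attributes mismatch least, might look like the sketch below. The attribute names (`f0`, `dur`), the weighted absolute-difference cost, the restriction to units of the same phoneme, and the function names are all assumptions for illustration; the abstract only states that selection is based on attribute mismatch.

```python
# Sketch only: the attributes, the cost function and the same-phoneme
# restriction are illustrative assumptions.

def mismatch(target_attr, source_attr, weights):
    """Weighted sum of absolute attribute differences (hypothetical cost)."""
    return sum(w * abs(target_attr[k] - source_attr[k]) for k, w in weights.items())


def select_pairs(target_units, source_units, weights):
    """For every target-speaker unit, pick the source-speaker unit of the
    same phoneme with the smallest attribute mismatch."""
    pairs = []
    for t in target_units:
        candidates = [s for s in source_units if s["phoneme"] == t["phoneme"]]
        if not candidates:
            continue  # no source unit available for this phoneme
        best = min(candidates, key=lambda s: mismatch(t["attr"], s["attr"], weights))
        pairs.append((t, best))
    return pairs
```

The resulting pairs would serve as training data for the voice conversion rules; the rule learning itself is not shown.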

    Speech synthesizer, speech synthesis method and computer program product
    5.
    Granted patent; status: in force

    Publication No.: US09058807B2

    Publication date: 2015-06-16

    Application No.: US13051541

    Filing date: 2011-03-18

    IPC classification: G10L13/00 G10L13/04 G10L25/18

    CPC classification: G10L13/04 G10L25/18

    Abstract: According to one embodiment, a first storage unit stores n band noise signals obtained by applying n band-pass filters to a noise signal. A second storage unit stores n band pulse signals. A parameter input unit inputs a fundamental frequency, n band noise intensities, and a spectrum parameter. An extraction unit extracts the n band noise signals for each pitch mark while shifting. An amplitude control unit changes amplitudes of the extracted band noise signals and band pulse signals in accordance with the band noise intensities. A generation unit generates a mixed sound source signal by adding the n band noise signals and the n band pulse signals. A generation unit generates the mixed sound source signal generated based on the pitch mark. A vocal tract filter unit generates a speech waveform by applying a vocal tract filter using the spectrum parameter to the generated mixed sound source signal.
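The mixed-excitation step can be sketched roughly as below. The gain rule (noise scaled by the band noise intensity `g`, pulse scaled by `1 - g`), the wrap-around read of the stored noise, and the name `mixed_excitation` are assumptions; the abstract only says that amplitudes follow the band noise intensities. The vocal tract filter stage is omitted.

```python
# Sketch only: the gain rule and the way the noise read position is
# shifted are assumptions, and no vocal tract filter is applied.

def mixed_excitation(band_noise, band_pulse, pitch_marks, intensities, length):
    """band_noise / band_pulse: n stored band signals each (lists of floats).
    pitch_marks: sample indices; intensities: per pitch mark, n values in [0, 1].
    Returns the mixed sound source signal of the given length."""
    out = [0.0] * length
    offset = 0  # read position in the stored noise, advanced at every pitch mark
    for mark, band_gain in zip(pitch_marks, intensities):
        for noise, pulse, g in zip(band_noise, band_pulse, band_gain):
            for i, p in enumerate(pulse):
                pos = mark + i
                if pos >= length:
                    break
                n = noise[(offset + i) % len(noise)]
                # Amplitude control, then addition of noise and pulse parts.
                out[pos] += (1.0 - g) * p + g * n
        offset += len(band_pulse[0])
    return out
```

Because the band noise and band pulse signals are precomputed and stored, each pitch mark costs only scaling and addition, with no band-pass filtering at synthesis time.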

    Voice conversion apparatus and method and speech synthesis apparatus and method
    6.
    Granted patent; status: in force

    Publication No.: US08438033B2

    Publication date: 2013-05-07

    Application No.: US12505684

    Filing date: 2009-07-20

    IPC classification: G10L13/06 G10L13/00 G10L21/00

    CPC classification: G10L13/033 G10L2021/0135

    Abstract: A voice conversion apparatus stores, in a parameter memory, target speech spectral parameters of target speech; stores, in a voice conversion rule memory, a voice conversion rule for converting voice quality of source speech into voice quality of the target speech; extracts, from an input source speech, a source speech spectral parameter of the input source speech; converts the extracted source speech spectral parameter into a first conversion spectral parameter by using the voice conversion rule; selects a target speech spectral parameter similar to the first conversion spectral parameter from the parameter memory; generates an aperiodic component spectral parameter from the selected target speech spectral parameter; mixes a periodic component spectral parameter included in the first conversion spectral parameter with the aperiodic component spectral parameter to obtain a second conversion spectral parameter; and generates a speech waveform from the second conversion spectral parameter.
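The convert, select, and mix sequence can be sketched as follows. Several choices here are assumptions rather than details from the abstract: the conversion rule is modelled as a per-bin affine map, similarity is squared Euclidean distance, and the periodic and aperiodic components are taken to be the bins below and above a fixed `boundary` index. All function names are hypothetical.

```python
# Sketch only: the affine rule, the distance measure and the fixed
# low-band / high-band split are illustrative assumptions.

def convert(source_param, rule):
    """Apply a per-bin affine rule (a, b) to get the first conversion parameter."""
    return [a * x + b for x, (a, b) in zip(source_param, rule)]


def select_similar(param, parameter_memory):
    """Pick the stored target parameter closest to param."""
    return min(
        parameter_memory,
        key=lambda t: sum((p - q) ** 2 for p, q in zip(param, t)),
    )


def voice_convert(source_param, rule, parameter_memory, boundary):
    """Return the second conversion parameter: periodic part from the
    converted source, aperiodic part from the selected target."""
    first = convert(source_param, rule)
    target = select_similar(first, parameter_memory)
    periodic = first[:boundary]      # kept from the converted source
    aperiodic = target[boundary:]    # taken from the selected target
    return periodic + aperiodic
```

Waveform generation from the second conversion parameter would follow and is not shown.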

    VOICE CONVERSION APPARATUS AND METHOD AND SPEECH SYNTHESIS APPARATUS AND METHOD
    7.
    Patent application; status: in force

    Publication No.: US20100049522A1

    Publication date: 2010-02-25

    Application No.: US12505684

    Filing date: 2009-07-20

    IPC classification: G10L13/04 G10L21/00

    CPC classification: G10L13/033 G10L2021/0135

    Abstract: A voice conversion apparatus stores, in a parameter memory, target speech spectral parameters of target speech; stores, in a voice conversion rule memory, a voice conversion rule for converting voice quality of source speech into voice quality of the target speech; extracts, from an input source speech, a source speech spectral parameter of the input source speech; converts the extracted source speech spectral parameter into a first conversion spectral parameter by using the voice conversion rule; selects a target speech spectral parameter similar to the first conversion spectral parameter from the parameter memory; generates an aperiodic component spectral parameter from the selected target speech spectral parameter; mixes a periodic component spectral parameter included in the first conversion spectral parameter with the aperiodic component spectral parameter to obtain a second conversion spectral parameter; and generates a speech waveform from the second conversion spectral parameter.

    Apparatus and method for voice conversion using attribute information
    8.
    Granted patent; status: in force

    Publication No.: US07580839B2

    Publication date: 2009-08-25

    Application No.: US11533122

    Filing date: 2006-09-19

    IPC classification: G10L13/00

    CPC classification: G10L13/033 G10L2021/0135

    Abstract: A speech processing apparatus according to an embodiment of the invention includes a conversion-source-speaker speech-unit database; a voice-conversion-rule-learning-data generating means; and a voice-conversion-rule learning means, with which it generates voice conversion rules. The voice-conversion-rule-learning-data generating means includes a conversion-target-speaker speech-unit extracting means; an attribute-information generating means; a conversion-source-speaker speech-unit database; and a conversion-source-speaker speech-unit selection means. The conversion-source-speaker speech-unit selection means selects conversion-source-speaker speech units corresponding to conversion-target-speaker speech units based on the mismatch between the attribute information of the conversion-target-speaker speech units and that of the conversion-source-speaker speech units, whereby the voice conversion rules are made from the selected pairs of conversion-target-speaker speech units and conversion-source-speaker speech units.

    APPARATUS AND METHOD FOR SUPPORTING READING OF DOCUMENT, AND COMPUTER READABLE MEDIUM
    9.
    Patent application; status: in force

    Publication No.: US20120239390A1

    Publication date: 2012-09-20

    Application No.: US13232478

    Filing date: 2011-09-14

    IPC classification: G10L19/00

    CPC classification: G10L13/10 G10L13/08 G10L25/63

    Abstract: According to one embodiment, an apparatus for supporting reading of a document includes a model storage unit, a document acquisition unit, a feature information extraction unit, and an utterance style estimation unit. The model storage unit is configured to store a model that has been trained on a correspondence relationship between first feature information and an utterance style. The first feature information is extracted from a plurality of sentences in a training document. The document acquisition unit is configured to acquire a document to be read. The feature information extraction unit is configured to extract second feature information from each sentence in the document to be read. The utterance style estimation unit is configured to compare the second feature information of a plurality of sentences in the document to be read with the model, and to estimate an utterance style of each sentence of the document to be read.

    Speech synthesis system and method
    10.
    Patent application

    Publication No.: US20060224391A1

    Publication date: 2006-10-05

    Application No.: US11233092

    Filing date: 2005-09-23

    IPC classification: G10L13/06

    CPC classification: G10L13/07

    Abstract: A speech synthesis system in a preferred embodiment includes a speech unit storage section, a phonetic environment storage section, a phonetic sequence/prosodic information input section, a plural-speech-unit selection section, a fused-speech-unit sequence generation section, and a fused-speech-unit modification/concatenation section. A fused speech unit is generated by fusing a plurality of selected speech units in the fused-speech-unit sequence generation section. In that section, the average power information is calculated for a plurality of selected M speech units, N speech units are fused together, and the power information of the fused speech unit is corrected so that it equals the average power information of the M speech units.
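The power correction in the last sentence can be illustrated with a short sketch. It assumes that power means mean squared amplitude, that fusion is sample-wise averaging of equal-length waveforms, and that the N fused units are the first N of the M selected units; none of these details comes from the abstract, and the function names are hypothetical.

```python
import math

# Sketch only: mean-square power, averaging as the fusion rule, and
# "first N of M" are illustrative assumptions.

def power(unit):
    """Mean squared amplitude of a waveform."""
    return sum(v * v for v in unit) / len(unit)


def fuse_with_power_correction(selected_units, n_fuse):
    """Fuse the first n_fuse of the M selected units, then rescale the
    result so that its power equals the average power of all M units."""
    target_power = sum(power(u) for u in selected_units) / len(selected_units)
    to_fuse = selected_units[:n_fuse]
    length = len(to_fuse[0])
    fused = [sum(u[i] for u in to_fuse) / n_fuse for i in range(length)]
    fused_power = power(fused)
    gain = math.sqrt(target_power / fused_power) if fused_power > 0 else 0.0
    return [gain * v for v in fused]
```

Averaging waveforms tends to lower power where the units are not perfectly in phase, which is one reason a correction toward the average power of the selected units can be useful.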