Generating a frequency warping function based on phoneme and context
    1.
    Granted invention patent
    Generating a frequency warping function based on phoneme and context (status: in force)

    Publication No.: US08401861B2

    Publication date: 2013-03-19

    Application No.: US11654447

    Filing date: 2007-01-17

    IPC classification: G10L21/00 G10L13/06

    CPC classification: G10L15/07 G10L2021/0135

    Abstract: A method for generating a frequency warping function comprising preparing the training speech of a source and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.

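The final step of the abstract above, using pairs of corresponding formants as key positions of a piecewise linear warp, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `make_warping_function`, the example formant values, and the choice to pin the end points at (0, 0) and (Nyquist, Nyquist) are all assumptions made for the sketch.

```python
import numpy as np

def make_warping_function(source_formants, target_formants, nyquist_hz):
    """Build a piecewise linear warp from source to target frequencies.

    Each (source, target) formant pair is one key position. The end points
    (0, 0) and (nyquist, nyquist) are added as an assumption so that the
    warp covers the whole band.
    """
    src = np.concatenate(([0.0], np.asarray(source_formants, dtype=float), [nyquist_hz]))
    tgt = np.concatenate(([0.0], np.asarray(target_formants, dtype=float), [nyquist_hz]))
    if np.any(np.diff(src) <= 0) or np.any(np.diff(tgt) <= 0):
        raise ValueError("formants must be strictly increasing and lie inside (0, nyquist)")

    def warp(freq_hz):
        # Linear interpolation between neighbouring key positions.
        return np.interp(freq_hz, src, tgt)

    return warp

# Hypothetical formants (Hz) taken from one pair of aligned frames.
warp = make_warping_function([500.0, 1500.0, 2500.0], [600.0, 1700.0, 2700.0], 8000.0)
```

Calling `warp(500.0)` maps the first source formant onto the first target formant, and frequencies between key positions are interpolated linearly.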

    On demand TTS vocabulary for a telematics system
    2.
    Granted invention patent
    On demand TTS vocabulary for a telematics system (status: in force)

    Publication No.: US08311804B2

    Publication date: 2012-11-13

    Application No.: US13279626

    Filing date: 2011-10-24

    IPC classification: G06F17/20 G10L21/00 G01C21/30

    CPC classification: G10L13/04 G01C21/3629

    Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.


    Method and apparatus for producing natural sounding pitch contours in a speech synthesizer
    3.
    Granted invention patent
    Method and apparatus for producing natural sounding pitch contours in a speech synthesizer (status: in force)

    Publication No.: US07280969B2

    Publication date: 2007-10-09

    Application No.: US09732122

    Filing date: 2000-12-07

    IPC classification: G10L13/06

    CPC classification: G10L13/033 G10L13/0335

    Abstract: A speech synthesis system is disclosed that utilizes a pitch contour resulting in more natural-sounding speech. The present invention modifies the predicted pitch, b(t), for synthesized speech using a low frequency energy booster. The low frequency energy booster interpolates the discrete pitch values, if necessary, and increases the amount of energy of the pitch contour associated with low frequency values, such as all frequency values below 10 Hertz. The amount of energy of the pitch contour associated with low frequency values can be increased, for example, by adding band-limited noise (a carrier signal) to the pitch contour, b(t), or by filtering the pitch values with an impulse response filter having a pole at the desired low frequency value. The present invention serves to add vibrato to the original pitch contour, b(t), and thereby improves the naturalness of the synthetic waveform.

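One of the two options named in the abstract above, adding band-limited noise to the pitch contour b(t), might look like the sketch below. The function name, the FFT-based band limiting, the `depth_hz` scaling, and the fixed random seed are assumptions chosen for illustration; the patent does not specify them here.

```python
import numpy as np

def add_low_frequency_noise(pitch_hz, frame_rate_hz, cutoff_hz=10.0, depth_hz=2.0, seed=0):
    """Add noise band-limited to below cutoff_hz to a sampled pitch contour.

    pitch_hz:      pitch values b(t), one per frame
    frame_rate_hz: number of pitch frames per second
    depth_hz:      peak deviation of the added noise (an assumed parameter)
    """
    pitch = np.asarray(pitch_hz, dtype=float)
    n = len(pitch)
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / frame_rate_hz)
    spectrum[freqs >= cutoff_hz] = 0.0  # keep only components below the cutoff
    spectrum[0] = 0.0                   # drop DC so the mean pitch is unchanged
    noise = np.fft.irfft(spectrum, n=n)
    peak = np.max(np.abs(noise))
    if peak > 0:
        noise *= depth_hz / peak        # scale to the requested peak deviation
    return pitch + noise

# A flat 120 Hz contour sampled at 100 frames per second (hypothetical values).
flat = np.full(200, 120.0)
boosted = add_low_frequency_noise(flat, frame_rate_hz=100.0)
```

The result wobbles slowly around 120 Hz, which is the vibrato-like effect the abstract describes.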

    Method and apparatus for generating a frequency warping function and for frequency warping
    4.
    Invention patent application
    Method and apparatus for generating a frequency warping function and for frequency warping (status: in force)

    Publication No.: US20070185715A1

    Publication date: 2007-08-09

    Application No.: US11654447

    Filing date: 2007-01-17

    IPC classification: G10L15/04

    CPC classification: G10L15/07 G10L2021/0135

    Abstract: A method for generating a frequency warping function comprising preparing the training speech of a source and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.


    Speech coding apparatus and method for generating acoustic feature vector component values by combining values of the same features for multiple time intervals
    5.
    Granted invention patent
    Speech coding apparatus and method for generating acoustic feature vector component values by combining values of the same features for multiple time intervals (status: lapsed)

    Publication No.: US5544277A

    Publication date: 1996-08-06

    Application No.: US98682

    Filing date: 1993-07-28

    CPC classification: G10L15/02 G10L15/20

    Abstract: A speech coding apparatus and method measures the values of at least first and second different features of an utterance during each of a series of successive time intervals. For each time interval, a feature vector signal has a first component value equal to a first weighted combination of the values of only one feature of the utterance for at least two time intervals. The feature vector signal has a second component value equal to a second weighted combination, different from the first weighted combination, of the values of only one feature of the utterance for at least two time intervals. The resulting feature vector signals for a series of successive time intervals form a coded representation of the utterance. In one embodiment, a first weighted mixture signal has a value equal to a first weighted mixture of the values of the features of the utterance during a single time interval. A second weighted mixture signal has a value equal to a second weighted mixture, different from the first weighted mixture, of the values of the features of the utterance during a single time interval. The first component value of each feature vector signal is equal to a first weighted combination of the values of only the first weighted mixture signals for at least two time intervals, and the second component value of each feature vector signal is equal to a second weighted combination, different from the first weighted combination, of the values of only the second weighted mixture for at least two time intervals.

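The core idea in the abstract above, that each feature vector component is a weighted combination of one feature's values over at least two time intervals, can be illustrated with a short sketch. The specific weights (an average and a difference over two consecutive intervals), the example values, and the function name are assumptions and are not taken from the patent.

```python
import numpy as np

def combine_over_time(feature_track, weights):
    """Weighted combination of one feature's values over consecutive intervals.

    Output element t equals sum_k weights[k] * feature_track[t + k], so each
    output value draws on len(weights) successive time intervals.
    """
    track = np.asarray(feature_track, dtype=float)
    w = np.asarray(weights, dtype=float)
    return np.correlate(track, w, mode="valid")

# Hypothetical measurements of a single feature over four time intervals.
energy = [1.0, 2.0, 4.0, 7.0]

first_component = combine_over_time(energy, [0.5, 0.5])    # local average
second_component = combine_over_time(energy, [-1.0, 1.0])  # local change

# One two-component feature vector per usable time interval.
feature_vectors = np.stack([first_component, second_component], axis=1)
```

Here the two components use different weightings of the same measured feature, which matches the "first" and "second weighted combination" wording of the abstract.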

    On demand TTS vocabulary for a telematics system
    6.
    Granted invention patent
    On demand TTS vocabulary for a telematics system (status: in force)

    Publication No.: US08046213B2

    Publication date: 2011-10-25

    Application No.: US10913004

    Filing date: 2004-08-06

    IPC classification: G06F17/20 G10L21/00 G01C21/30

    CPC classification: G10L13/04 G01C21/3629

    Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.


    System and method for rescoring N-best hypotheses of an automatic speech recognition system
    7.
    Granted invention patent
    System and method for rescoring N-best hypotheses of an automatic speech recognition system (status: lapsed)

    Publication No.: US07761296B1

    Publication date: 2010-07-20

    Application No.: US09286099

    Filing date: 1999-04-02

    IPC classification: G10L17/00 G10L15/00

    CPC classification: G10L15/08 G10L13/02 G10L15/10

    Abstract: A system and method for rescoring the N-best hypotheses from an automatic speech recognition system by comparing an original speech waveform to synthetic speech waveforms that are generated for each text sequence of the N-best hypotheses. A distance is calculated from the original speech waveform to each of the synthesized waveforms, and the text associated with the synthesized waveform that is determined to be closest to the original waveform is selected as the final hypothesis. The original waveform and each synthesized waveform are aligned to a corresponding text sequence on a phoneme level. The mean of the feature vectors which align to each phoneme is computed for the original waveform as well as for each of the synthesized hypotheses. The distance of a synthesized hypothesis to the original speech signal is then computed as the sum over all phonemes in the hypothesis of the Euclidean distance between the means of the feature vectors of the frames aligning to that phoneme for the original and the synthesized signals. The text of the hypothesis which is closest under the above metric to the original waveform is chosen as the final system output.

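The distance metric in the abstract above (a sum over phonemes of the Euclidean distance between per-phoneme mean feature vectors) can be sketched as follows. The data layout is an assumption: each alignment is a list of `(phoneme, start_frame, end_frame)` tuples with an exclusive end, and each hypothesis is a dict carrying its own alignment of the original frames. Feature extraction, synthesis, and alignment themselves are outside this sketch.

```python
import numpy as np

def phoneme_means(frames, alignment):
    """Mean feature vector of the frames aligned to each phoneme."""
    return [(ph, frames[start:end].mean(axis=0)) for ph, start, end in alignment]

def hypothesis_distance(orig_frames, orig_align, synth_frames, synth_align):
    """Sum over phonemes of the Euclidean distance between the mean vectors."""
    orig = phoneme_means(orig_frames, orig_align)
    synth = phoneme_means(synth_frames, synth_align)
    if [ph for ph, _ in orig] != [ph for ph, _ in synth]:
        raise ValueError("both alignments must cover the same phoneme sequence")
    return sum(float(np.linalg.norm(a - b)) for (_, a), (_, b) in zip(orig, synth))

def rescore(orig_frames, hypotheses):
    """Return the text of the hypothesis closest to the original signal."""
    best = min(
        hypotheses,
        key=lambda h: hypothesis_distance(
            orig_frames, h["orig_align"], h["synth_frames"], h["synth_align"]
        ),
    )
    return best["text"]

# Toy data: four original frames with two-dimensional feature vectors.
orig_frames = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
hypotheses = [
    {"text": "a b", "orig_align": [("a", 0, 2), ("b", 2, 4)],
     "synth_frames": np.array([[0.0, 0.0], [1.0, 1.0]]),
     "synth_align": [("a", 0, 1), ("b", 1, 2)]},
    {"text": "a c", "orig_align": [("a", 0, 2), ("c", 2, 4)],
     "synth_frames": np.array([[0.0, 0.0], [4.0, 5.0]]),
     "synth_align": [("a", 0, 1), ("c", 1, 2)]},
]
best_text = rescore(orig_frames, hypotheses)
```

In this toy case the first hypothesis has distance 0 and the second has distance 5, so the first text is selected.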

    Systems and methods for text-to-speech synthesis using spoken example
    8.
    Invention patent application
    Systems and methods for text-to-speech synthesis using spoken example (status: in force)

    Publication No.: US20050071163A1

    Publication date: 2005-03-31

    Application No.: US10672374

    Filing date: 2003-09-26

    IPC classification: G10L13/00

    CPC classification: G10L13/10

    Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.

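The last step in the abstract above, turning extracted prosodic parameters into markup for the text input, could be sketched along these lines. The abstract does not name a markup format; the SSML-style `<prosody>` element, the per-word tuple layout, and the function name are assumptions made purely for illustration, and the pitch and duration measurements are taken as given.

```python
from xml.sax.saxutils import escape

def prosody_markup(words, baseline_pitch_hz):
    """Wrap each word in SSML-style prosody markup (an assumed output format).

    words: list of (text, spoken_pitch_hz, spoken_duration_s, default_duration_s),
           where the spoken values come from the user's example and the default
           duration is what the synthesizer would produce without markup.
    """
    parts = []
    for text, pitch_hz, spoken_s, default_s in words:
        # Pitch as a signed percentage change relative to the baseline.
        pitch_pct = round(100.0 * (pitch_hz - baseline_pitch_hz) / baseline_pitch_hz)
        # Rate as a percentage of default speed: a longer spoken word is slower.
        rate_pct = round(100.0 * default_s / spoken_s)
        parts.append(
            f'<prosody pitch="{pitch_pct:+d}%" rate="{rate_pct}%">{escape(text)}</prosody>'
        )
    return " ".join(parts)

# Hypothetical measurement for a single word.
markup = prosody_markup([("hello", 110.0, 0.5, 0.4)], baseline_pitch_hz=100.0)
```

A word spoken higher and more slowly than the synthesizer default ends up with a positive pitch offset and a rate below 100%.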

    Hierarchical labeler in a speech recognition system
    9.
    Granted invention patent
    Hierarchical labeler in a speech recognition system (status: lapsed)

    Publication No.: US6023673A

    Publication date: 2000-02-08

    Application No.: US869061

    Filing date: 1997-06-04

    IPC classification: G10L5/06 G10L9/00

    CPC classification: G10L15/083

    Abstract: A speech coding apparatus and method uses a hierarchy of prototype sets to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of level subsets of prototype vector signals is computed, wherein each prototype vector signal in a higher level subset is associated with at least one prototype vector signal in a lower level subset. Each level subset contains a plurality of prototype vector signals, with lower level subsets containing more prototypes than higher level subsets. The closeness of the feature value of the first feature vector signal is compared to the parameter values of prototype vector signals in the first level subset of prototype vector signals to obtain a ranked list of prototype match scores for the first feature vector signal and each prototype vector signal in the first level subset. The closeness of the feature value of the first feature vector signal is compared to the parameter values of each prototype vector signal in a second (lower) level subset that is associated with the highest ranking prototype vectors in the first level subset, to obtain a second ranked list of prototype match scores. The identification value of the prototype vector signal in the second ranked list having the best prototype match score is output as a coded utterance representation signal of the first feature vector signal.

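The two-level search in the abstract above can be sketched as a coarse-to-fine nearest-prototype lookup. Using Euclidean distance as the match score, the `top_k` parameter, and the array layout are assumptions for illustration; the abstract only says that match scores are ranked at each level.

```python
import numpy as np

def hierarchical_label(feature, top_protos, children, low_protos, top_k=2):
    """Return the index of the best lower-level prototype for one feature vector.

    top_protos: (n_top, d) array of higher-level prototypes
    children:   children[i] lists the lower-level indices tied to top prototype i
    low_protos: (n_low, d) array of lower-level prototypes
    top_k:      how many of the best higher-level prototypes to expand
    """
    feature = np.asarray(feature, dtype=float)
    # Level 1: score every higher-level prototype (smaller distance is better).
    top_scores = np.linalg.norm(top_protos - feature, axis=1)
    best_top = np.argsort(top_scores)[:top_k]
    # Level 2: score only the lower-level prototypes under the best top ones.
    candidates = sorted({j for i in best_top for j in children[i]})
    low_scores = np.linalg.norm(low_protos[candidates] - feature, axis=1)
    return candidates[int(np.argmin(low_scores))]

# Toy hierarchy: three coarse prototypes, each with two finer prototypes.
top_protos = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])
low_protos = np.array([[-1.0, 0.0], [1.0, 0.0], [9.0, 10.0],
                       [11.0, 10.0], [19.0, 20.0], [21.0, 20.0]])
children = [[0, 1], [2, 3], [4, 5]]

label = hierarchical_label([10.8, 10.0], top_protos, children, low_protos)
```

Only four of the six lower-level prototypes are scored in this call, which is where the reduction in computation comes from.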

    ON DEMAND TTS VOCABULARY FOR A TELEMATICS SYSTEM
    10.
    Invention patent application
    ON DEMAND TTS VOCABULARY FOR A TELEMATICS SYSTEM (status: in force)

    Publication No.: US20120095676A1

    Publication date: 2012-04-19

    Application No.: US13279626

    Filing date: 2011-10-24

    IPC classification: G01C21/36

    CPC classification: G10L13/04 G01C21/3629

    Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.
