Systems and methods for text-to-speech synthesis using spoken example
    11.
    Invention patent application
    Legal status: in force

    Publication No.: US20050071163A1

    Publication date: 2005-03-31

    Application No.: US10672374

    Filing date: 2003-09-26

    IPC classification: G10L13/00

    CPC classification: G10L13/10

    Abstract: Systems and methods are provided for speech synthesis, in particular text-to-speech (TTS) systems and methods that convert a text input to a synthetic waveform by processing the prosodic and phonetic content of a spoken example of the text input, so as to accurately mimic the input speech style and pronunciation. The systems and methods provide an interface to a TTS system that allows a user to input a text string and a spoken utterance of that text string, extracts prosodic parameters from the spoken input, and processes the prosodic parameters to derive corresponding markup for the text input, enabling more natural-sounding synthesized speech.

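    The step of deriving markup from extracted prosodic parameters can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name a markup language, so SSML-style `<prosody>` tags are assumed, and the per-word pitch measurements, the baseline pitch, the 5% threshold, and the function name `prosody_markup` are all hypothetical.

```python
from xml.sax.saxutils import escape


def prosody_markup(words, baseline_pitch_hz, threshold_pct=5.0):
    """Wrap words whose measured pitch departs from the baseline in prosody tags.

    words: list of (word, mean_pitch_hz) pairs measured from the spoken example.
    baseline_pitch_hz: assumed default pitch of the synthetic voice.
    threshold_pct: deviations smaller than this are left unmarked (assumed value).
    """
    parts = []
    for word, pitch_hz in words:
        # Relative pitch change of the spoken word, in percent of the baseline.
        change = 100.0 * (pitch_hz - baseline_pitch_hz) / baseline_pitch_hz
        if abs(change) < threshold_pct:
            parts.append(escape(word))
        else:
            parts.append(
                '<prosody pitch="{:+.0f}%">{}</prosody>'.format(change, escape(word))
            )
    return "<speak>" + " ".join(parts) + "</speak>"


# Hypothetical measurements for the utterance "I really mean it".
measured = [("I", 120.0), ("really", 150.0), ("mean", 118.0), ("it", 96.0)]
markup = prosody_markup(measured, baseline_pitch_hz=120.0)
print(markup)
```

    Only pitch is handled here; duration or loudness could be mapped to markup attributes in the same way.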

    Hierarchical labeler in a speech recognition system
    12.
    Granted invention patent
    Legal status: lapsed

    Publication No.: US6023673A

    Publication date: 2000-02-08

    Application No.: US869061

    Filing date: 1997-06-04

    IPC classification: G10L5/06 G10L9/00

    CPC classification: G10L15/083

    Abstract: A speech coding apparatus and method use a hierarchy of prototype sets to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of level subsets of prototype vector signals is computed, wherein each prototype vector signal in a higher level subset is associated with at least one prototype vector signal in a lower level subset. Each level subset contains a plurality of prototype vector signals, with lower level subsets containing more prototypes than higher level subsets. The closeness of the feature value of the first feature vector signal is compared to the parameter values of prototype vector signals in the first level subset of prototype vector signals to obtain a ranked list of prototype match scores for the first feature vector signal and each prototype vector signal in the first level subset. The closeness of the feature value of the first feature vector signal is compared to the parameter values of each prototype vector signal in a second (lower) level subset that is associated with the highest ranking prototype vectors in the first level subset, to obtain a second ranked list of prototype match scores. The identification value of the prototype vector signal in the second ranked list having the best prototype match score is output as a coded utterance representation signal of the first feature vector signal.

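    The two-level search described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: Euclidean distance stands in for the unspecified match score, and the prototype values, the `children` mapping, and the `top_k` parameter are assumptions made for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hierarchical_label(feature, level1, level2, children, top_k=2):
    """Return the ID of the best lower-level prototype for one feature vector.

    level1, level2: dicts mapping prototype ID -> prototype vector.
    children: dict mapping a level-1 ID -> list of associated level-2 IDs.
    top_k: number of top-ranked level-1 prototypes to expand (assumed value).
    """
    # Rank the small first-level subset by closeness to the feature vector.
    ranked1 = sorted(level1, key=lambda pid: distance(feature, level1[pid]))
    # Score only the second-level prototypes tied to the best first-level matches.
    candidates = {cid for pid in ranked1[:top_k] for cid in children[pid]}
    ranked2 = sorted(candidates, key=lambda cid: distance(feature, level2[cid]))
    return ranked2[0]


# Hypothetical prototypes: 3 at the first level, 6 at the second level.
level1 = {"A": [0.0, 0.0], "B": [10.0, 10.0], "C": [-10.0, 10.0]}
level2 = {
    "a1": [0.5, 0.5], "a2": [-1.0, 0.0],
    "b1": [9.0, 11.0], "b2": [11.0, 9.0],
    "c1": [-9.0, 9.0], "c2": [-11.0, 11.0],
}
children = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}

label = hierarchical_label([8.5, 10.5], level1, level2, children)
print(label)  # b1
```

    With `top_k=2`, the feature vector is compared against 3 first-level and 4 second-level prototypes instead of all 6 second-level ones; the saving grows with the size of the lower-level subset.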

    Generating a frequency warping function based on phoneme and context
    13.
    Granted invention patent
    Legal status: in force

    Publication No.: US08401861B2

    Publication date: 2013-03-19

    Application No.: US11654447

    Filing date: 2007-01-17

    IPC classification: G10L21/00 G10L13/06

    CPC classification: G10L15/07 G10L2021/0135

    Abstract: A method for generating a frequency warping function comprises preparing the training speech of a source speaker and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.

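    The last step, using corresponding formant pairs as key positions of a piecewise linear warping function, can be sketched as follows. The formant values are invented for illustration, and pinning the end points at (0, 0) and (max_freq, max_freq) is an assumption of this sketch rather than something the abstract states.

```python
from bisect import bisect_right


def make_warping_function(source_formants, target_formants, max_freq):
    """Build a piecewise-linear map from source frequency to target frequency (Hz).

    Each (source, target) formant pair is a key position of the map.
    """
    xs = [0.0] + [float(f) for f in source_formants] + [float(max_freq)]
    ys = [0.0] + [float(f) for f in target_formants] + [float(max_freq)]
    for seq in (xs, ys):
        if any(b <= a for a, b in zip(seq, seq[1:])):
            raise ValueError("key positions must be strictly increasing")

    def warp(freq):
        # Clamp to the valid range, then locate the segment containing freq.
        freq = min(max(float(freq), 0.0), float(max_freq))
        i = min(bisect_right(xs, freq), len(xs) - 1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (freq - x0) * (y1 - y0) / (x1 - x0)

    return warp


# Hypothetical formants (Hz) taken from one pair of aligned frames.
warp = make_warping_function([500, 1500, 2500], [600, 1700, 2800], max_freq=8000)
print(warp(500))   # 600.0, a key position maps exactly onto its partner
print(warp(1000))  # 1150.0, halfway between the first two key positions
```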

    On demand TTS vocabulary for a telematics system
    14.
    Granted invention patent
    Legal status: in force

    Publication No.: US08311804B2

    Publication date: 2012-11-13

    Application No.: US13279626

    Filing date: 2011-10-24

    IPC classification: G06F17/20 G10L21/00 G01C21/30

    CPC classification: G10L13/04 G01C21/3629

    Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.

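    The load-and-overwrite behaviour can be sketched as a radius-limited cache. Everything concrete here is assumed for illustration: the `NameCache` class, the place names and coordinates, the 5 km radius, and the use of the haversine formula for distance.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class NameCache:
    """Keeps in memory only the recorded names near the current location."""

    def __init__(self, storage, radius_km):
        self.storage = storage      # name -> (lat, lon, recording), stands in for mass storage
        self.radius_km = radius_km
        self.loaded = {}            # name -> recording, stands in for run-time memory

    def update(self, lat, lon):
        in_range = {
            name
            for name, (nlat, nlon, _) in self.storage.items()
            if haversine_km(lat, lon, nlat, nlon) <= self.radius_km
        }
        # Drop names that are now out of range, then load the new ones.
        for name in list(self.loaded):
            if name not in in_range:
                del self.loaded[name]
        for name in in_range:
            if name not in self.loaded:
                self.loaded[name] = self.storage[name][2]

    def speak(self, name):
        """Return the recording for a loaded name, or None if it is not in memory."""
        return self.loaded.get(name)


# Hypothetical storage contents.
storage = {
    "Main St": (40.00, -75.0, "main_st.wav"),
    "Oak Ave": (40.01, -75.0, "oak_ave.wav"),
    "Far Rd": (41.00, -75.0, "far_rd.wav"),
}
cache = NameCache(storage, radius_km=5.0)
cache.update(40.0, -75.0)
print(sorted(cache.loaded))  # ['Main St', 'Oak Ave']
```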

    Method and apparatus for producing natural sounding pitch contours in a speech synthesizer
    15.
    Granted invention patent
    Legal status: in force

    Publication No.: US07280969B2

    Publication date: 2007-10-09

    Application No.: US09732122

    Filing date: 2000-12-07

    IPC classification: G10L13/06

    CPC classification: G10L13/033 G10L13/0335

    Abstract: A speech synthesis system is disclosed that utilizes a pitch contour resulting in more natural-sounding speech. The present invention modifies the predicted pitch, b(t), for synthesized speech using a low frequency energy booster. The low frequency energy booster interpolates the discrete pitch values, if necessary, and increases the amount of energy of the pitch contour associated with low frequency values, such as all frequency values below 10 Hertz. The amount of energy of the pitch contour associated with low frequency values can be increased, for example, by adding band-limited noise (a carrier signal) to the pitch contour, b(t), or by filtering the pitch values with an impulse response filter having a pole at the desired low frequency value. The present invention serves to add vibrato to the original pitch contour, b(t), and thereby improves the naturalness of the synthetic waveform.

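    The noise-adding variant of the low frequency energy booster can be sketched as follows. This is an approximation under stated assumptions: white noise is passed through a one-pole low-pass filter, which only roughly band-limits it, and the frame rate, cutoff, depth, and seed are hypothetical parameters not given in the abstract.

```python
import math
import random


def boost_low_frequencies(pitch_hz, frame_rate_hz=100.0, cutoff_hz=10.0,
                          depth_hz=2.0, seed=0):
    """Add low-pass-filtered noise to a sampled pitch contour b(t).

    pitch_hz: pitch values in Hz, one per frame.
    frame_rate_hz: number of pitch frames per second (assumed).
    cutoff_hz: approximate upper frequency of the added noise.
    depth_hz: maximum deviation added to any pitch value.
    """
    rng = random.Random(seed)
    # Coefficient of the one-pole low-pass filter y[n] = a*y[n-1] + (1-a)*x[n].
    a = math.exp(-2.0 * math.pi * cutoff_hz / frame_rate_hz)
    state = 0.0
    boosted = []
    for p in pitch_hz:
        white = rng.uniform(-1.0, 1.0)
        state = a * state + (1.0 - a) * white   # stays within [-1, 1]
        boosted.append(p + depth_hz * state)
    return boosted


# A flat 120 Hz contour of 50 frames gains a slow, bounded wobble.
flat = [120.0] * 50
wobbly = boost_low_frequencies(flat)
print(min(wobbly), max(wobbly))
```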

    Method and apparatus for translating natural-language speech using multiple output phrases
    16.
    Granted invention patent
    Legal status: in force

    Publication No.: US06859778B1

    Publication date: 2005-02-22

    Application No.: US09526985

    Filing date: 2000-03-16

    Abstract: A multi-lingual translation system provides multiple output sentences for a given word or phrase. Each output sentence for a given word or phrase reflects, for example, a different emotional emphasis, dialect, accent, loudness or rate of speech. A given output sentence could be selected automatically, or manually as desired, to create a desired effect. For example, the same output sentence for a given word or phrase can be recorded three times, to selectively reflect excitement, sadness or fear. The multi-lingual translation system includes a phrase-spotting mechanism, a translation mechanism, a speech output mechanism and optionally, a language understanding mechanism or an event measuring mechanism or both. The phrase-spotting mechanism identifies a spoken phrase from a restricted domain of phrases. The language understanding mechanism, if present, maps the identified phrase onto a small set of formal phrases. The translation mechanism maps the formal phrase onto a well-formed phrase in one or more target languages. The speech output mechanism produces high-quality output speech. The speech output may be time synchronized to the spoken phrase using the output of the event measuring mechanism.


    Method and apparatus for generating a frequency warping function and for frequency warping
    17.
    Invention patent application
    Legal status: in force

    Publication No.: US20070185715A1

    Publication date: 2007-08-09

    Application No.: US11654447

    Filing date: 2007-01-17

    IPC classification: G10L15/04

    CPC classification: G10L15/07 G10L2021/0135

    Abstract: A method for generating a frequency warping function comprises preparing the training speech of a source speaker and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.


    Systems and methods for pitch smoothing for text-to-speech synthesis
    18.
    Invention patent application
    Legal status: under examination (published)

    Publication No.: US20060259303A1

    Publication date: 2006-11-16

    Application No.: US11128003

    Filing date: 2005-05-12

    Applicant: Raimo Bakis

    Inventor: Raimo Bakis

    IPC classification: G10L13/06

    CPC classification: G10L13/10

    Abstract: TTS synthesis systems are provided which implement computationally fast and efficient pitch contour smoothing methods for determining smooth pitch contours for non-smooth pitch contours, which closely track the non-smooth pitch contours. For example, a TTS method includes generating a sequence of phonetic units representative of a target utterance, determining a pitch contour for the target utterance, the pitch contour comprising a plurality of linear pitch contour segments, wherein each linear pitch contour segment has start and end times at anchor points of the pitch contour, filtering the pitch contour to determine pitch values of a smooth pitch contour at the anchor points, and determining the smooth pitch contour between adjacent anchor points by linearly interpolating between the pitch values of the smooth pitch contour at the anchor points.

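    The method in the abstract (filter the piecewise-linear contour at the anchor points only, then interpolate linearly between the filtered values) can be sketched as follows. The abstract does not say which filter is used; a centered moving average is assumed here, and the anchor times, pitch values, and window length are invented for illustration.

```python
from bisect import bisect_right


def interp(times, values, t):
    """Linear interpolation through (times, values), held constant beyond the ends."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    return values[i - 1] + (t - t0) * (values[i] - values[i - 1]) / (t1 - t0)


def window_mean(times, values, lo, hi):
    """Exact mean of the piecewise-linear contour over the interval [lo, hi]."""
    # Split the window at every anchor inside it so each piece is linear.
    knots = [lo] + [t for t in times if lo < t < hi] + [hi]
    area = 0.0
    for a, b in zip(knots, knots[1:]):
        area += 0.5 * (interp(times, values, a) + interp(times, values, b)) * (b - a)
    return area / (hi - lo)


def smooth_contour(times, values, window):
    """Return a function giving the smoothed pitch at any time t."""
    # Filter only at the anchor points (assumed filter: centered moving average).
    smoothed = [window_mean(times, values, t - window / 2, t + window / 2) for t in times]
    # Between anchors, interpolate linearly between the smoothed anchor values.
    return lambda t: interp(times, smoothed, t)


# Hypothetical zig-zag contour: anchor times in seconds, pitch in Hz.
times = [0.0, 0.1, 0.2, 0.3, 0.4]
pitch = [100.0, 140.0, 100.0, 140.0, 100.0]
smooth = smooth_contour(times, pitch, window=0.2)
print([round(smooth(t), 3) for t in times])  # [110.0, 120.0, 120.0, 120.0, 110.0]
```

    Because the contour is piecewise linear, the window average is computed exactly from trapezoid areas, so the cost depends on the number of anchor points rather than on the audio sample rate.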

    Generating paralinguistic phenomena via markup
    19.
    Invention patent application
    Legal status: in force

    Publication No.: US20050273338A1

    Publication date: 2005-12-08

    Application No.: US10861055

    Filing date: 2004-06-04

    IPC classification: G10L13/06

    CPC classification: G10L13/08

    Abstract: Examples of paralinguistic events (e.g., breaths, coughs, sighs, etc.) are recorded. A text-to-speech (“TTS”) engine may insert the examples into a stream of synthetic speech using, for example, markup. The synthetic speech may include a combination of normal text and paralinguistic text.

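    One way to picture markup-driven insertion is a planner that splits marked-up text into spans to synthesize and recordings to play. The tag syntax (`<sigh/>` and so on), the recording file names, and the `plan_output` function are hypothetical; the abstract does not define a markup format.

```python
import re

# Hypothetical recorded examples of paralinguistic events.
RECORDINGS = {
    "breath": "breath_01.wav",
    "cough": "cough_01.wav",
    "sigh": "sigh_01.wav",
}

# Matches self-closing tags such as <sigh/> (assumed markup syntax).
TAG = re.compile(r"<(\w+)\s*/>")


def plan_output(marked_up_text):
    """Split marked-up text into ("synthesize", text) and ("play", file) steps."""
    plan = []
    pos = 0
    for match in TAG.finditer(marked_up_text):
        text = marked_up_text[pos:match.start()].strip()
        if text:
            plan.append(("synthesize", text))
        name = match.group(1)
        if name not in RECORDINGS:
            raise ValueError("no recording for event: " + name)
        plan.append(("play", RECORDINGS[name]))
        pos = match.end()
    tail = marked_up_text[pos:].strip()
    if tail:
        plan.append(("synthesize", tail))
    return plan


steps = plan_output("Well, <sigh/> I suppose so. <cough/>")
for step in steps:
    print(step)
```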

    Method and apparatus for providing multiple output channels in a microphone
    20.
    Granted invention patent
    Legal status: lapsed

    Publication No.: US06959095B2

    Publication date: 2005-10-25

    Application No.: US09927690

    Filing date: 2001-08-10

    IPC classification: H04R3/00

    CPC classification: H04R3/00

    Abstract: Methods and apparatus for providing multiple output channels in a microphone. More particularly, provision is made for an arrangement wherein a single microphone is adapted to produce one or more different audio outputs depending upon characteristics of a speaker or user of the microphone while facilitating a high degree of accuracy in the recognition of the user or speaker by the arrangement. The microphone is adapted to produce one or more different audio streams or outputs depending upon the speaker presently using the microphone. In effect, this can be readily implemented by a main user or speaker, such as an interviewer on a radio or TV talk show, or any speaker in a conference room, intending to control the audio output streams by suitably activating a button or switch.
