Systems and methods for expressive text-to-speech
    1.
    Patent application
    Status: Under examination (published)

    Publication number: US20050096909A1

    Publication date: 2005-05-05

    Application number: US10695979

    Filing date: 2003-10-29

    IPC classification: G10L13/00 G10L13/02 G10L13/08

    Abstract: Systems and methods are provided for expressive text-to-speech which include identifying text to convert to speech, selecting a speech style sheet from a set of available speech style sheets, the speech style sheet defining desired speech characteristics, marking the text to associate the text with the selected speech style sheet, and converting the text to speech having the desired speech characteristics by applying a low level markup associated with the speech style sheet.

    Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis
    2.
    Patent application
    Status: Granted

    Publication number: US20060229876A1

    Publication date: 2006-10-12

    Application number: US11101223

    Filing date: 2005-04-07

    IPC classification: G10L13/00

    CPC classification: G10L13/07 G10L2021/0135

    Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.
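
The cost-based selection described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the `Segment` fields, the pitch-based join cost, and the speaker-switch penalty are all assumptions chosen for illustration. The abstract only states that segments from several speakers are concatenated based on at least one cost function and that each segment carries an attribute identifying its speaker.

```python
from collections import namedtuple

# Hypothetical attribute vector: the unit label, the speaker it came from, and a pitch value.
Segment = namedtuple("Segment", "unit speaker pitch")

def join_cost(prev, nxt, switch_penalty=0.5):
    """Assumed join cost: pitch mismatch plus a penalty for changing speaker."""
    cost = abs(prev.pitch - nxt.pitch) / 100.0
    if prev.speaker != nxt.speaker:
        cost += switch_penalty
    return cost

def select_segments(candidates):
    """Pick one segment per position, minimizing target cost plus join cost.

    candidates is a list (one entry per position) of lists of
    (Segment, target_cost) pairs. Returns (chosen_segments, total_cost).
    """
    # best[i][j] holds (cheapest cost ending in candidate j at position i, backpointer).
    best = [[(tc, None) for _, tc in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for seg, tc in candidates[i]:
            row.append(min(
                (best[i - 1][k][0] + join_cost(prev, seg) + tc, k)
                for k, (prev, _) in enumerate(candidates[i - 1])
            ))
        best.append(row)
    # Trace the cheapest path back from the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j][0])
        j = best[i][j][1]
    return path[::-1], total

candidates = [
    [(Segment("hel", "A", 110.0), 0.2), (Segment("hel", "B", 150.0), 0.1)],
    [(Segment("lo", "A", 115.0), 0.3), (Segment("lo", "B", 118.0), 0.1)],
]
chosen, total = select_segments(candidates)
```

With these made-up costs, the search keeps both units from speaker B, because the switch penalty outweighs the smoother pitch join available across speakers.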

    Generating paralinguistic phenomena via markup
    3.
    Patent application
    Status: Granted

    Publication number: US20050273338A1

    Publication date: 2005-12-08

    Application number: US10861055

    Filing date: 2004-06-04

    IPC classification: G10L13/06

    CPC classification: G10L13/08

    Abstract: Examples of paralinguistic events (e.g., breaths, coughs, sighs, etc.) are recorded. A text-to-speech (“TTS”) engine may insert the examples into a stream of synthetic speech using, for example, markup. The synthetic speech may include a combination of normal text and paralinguistic text.

    System and method for improving text-to-speech software intelligibility through the detection of uncommon words and phrases
    4.
    Patent application
    Status: Under examination (published)

    Publication number: US20050234724A1

    Publication date: 2005-10-20

    Application number: US10825578

    Filing date: 2004-04-15

    IPC classification: G10L13/08 G10L21/02

    CPC classification: G10L13/10 G10L21/0264

    Abstract: Disclosed is a system and method for improving the intelligibility of speech output by a speech synthesizer by determining if uncommon words exist in the text, and if it is determined that an uncommon word exists in the text, pausing the output of the synthesized speech of the uncommon word to offset the uncommon word from its surrounding speech.
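
As a rough illustration of the idea in this abstract, the sketch below surrounds each uncommon word with a pause marker. The word list, the `<pause>` token, and the function name are hypothetical. The abstract does not say how uncommon words are detected or how the pause is signalled to the synthesizer.

```python
import re

# Hypothetical common-word list; a real system would likely use a large frequency lexicon.
COMMON_WORDS = {"the", "a", "is", "in", "of", "to", "your", "meeting", "with", "at", "noon"}
PAUSE = "<pause>"  # placeholder marker, not a real synthesizer tag

def mark_uncommon_words(text, common_words=COMMON_WORDS):
    """Return the words of text with PAUSE markers around each uncommon word."""
    out = []
    for token in re.findall(r"[A-Za-z']+", text):
        if token.lower() in common_words:
            out.append(token)
        else:
            # Offset the uncommon word from the surrounding speech.
            out.extend([PAUSE, token, PAUSE])
    return out

marked = mark_uncommon_words("Your meeting with Xylophanes is at noon")
```

Here only "Xylophanes" is missing from the assumed word list, so it is the only word that gets a pause on either side.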

    SPEECH RECOGNITION USING DISCRIMINANT FEATURES
    5.
    Patent application
    Status: Under examination (published)

    Publication number: US20080059168A1

    Publication date: 2008-03-06

    Application number: US11931014

    Filing date: 2007-10-31

    Applicant: Ellen Eide

    Inventor: Ellen Eide

    IPC classification: G10L15/00

    CPC classification: G10L15/02

    Abstract: Methods and arrangements for representing the speech waveform in terms of a set of abstract, linguistic distinctions in order to derive a set of discriminative features for use in a speech recognizer. By combining the distinctive feature representation with an original waveform representation, it is possible to achieve a reduction in word error rate of 33% on an automatic speech recognition task.

    On demand TTS vocabulary for a telematics system
    6.
    Patent application
    Status: Granted

    Publication number: US20060031062A1

    Publication date: 2006-02-09

    Application number: US10913004

    Filing date: 2004-08-06

    IPC classification: G10L19/00

    CPC classification: G10L13/04 G01C21/3629

    Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.
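
The radius-limited loading described in this abstract might look something like the following sketch. The data layout (`storage` mapping each name to a position and an audio payload), the 5-mile default radius, and the haversine distance are assumptions made for illustration. The abstract itself only says "a predetermined radius (e.g., a few miles)".

```python
import math

EARTH_RADIUS_MILES = 3958.8

def distance_miles(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def refresh_vocabulary(cache, storage, location, radius_miles=5.0):
    """Update cache in place so it holds only names within radius_miles of location.

    storage maps name -> ((lat, lon), audio); cache maps name -> audio.
    """
    in_range = {name for name, (pos, _) in storage.items()
                if distance_miles(location, pos) <= radius_miles}
    for name in list(cache):
        if name not in in_range:
            del cache[name]                  # drop names now out of range
    for name in in_range:
        if name not in cache:
            cache[name] = storage[name][1]   # load newly reachable names
    return cache

# Made-up storage contents; the byte strings stand in for prerecorded audio.
storage = {
    "Main St": ((40.00, -75.0), b"main-audio"),
    "Oak Ave": ((40.01, -75.0), b"oak-audio"),
    "Far Rd": ((41.00, -75.0), b"far-audio"),
}
cache = {}
refresh_vocabulary(cache, storage, (40.0, -75.0))
```

Calling `refresh_vocabulary` again with a new location evicts names that have fallen out of range and loads the ones that have come into range, which mirrors the overwrite behavior the abstract describes.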

    Methods and apparatus for conveying synthetic speech style from a text-to-speech system
    7.
    Patent application
    Status: Granted

    Publication number: US20060229872A1

    Publication date: 2006-10-12

    Application number: US11092008

    Filing date: 2005-03-29

    Applicant: Ellen Eide, Wael Hamza

    Inventor: Ellen Eide, Wael Hamza

    IPC classification: G10L13/08

    CPC classification: G10L13/033

    Abstract: A technique for producing speech output in a text-to-speech system is provided. A message is created for communication to a user in a natural language generator of the text-to-speech system. The message is annotated in the natural language generator with a synthetic speech output style. The message is conveyed to the user through a speech synthesis system in communication with the natural language generator, wherein the message is conveyed in accordance with the synthetic speech output style.

    Systems and methods for text-to-speech synthesis using spoken example
    8.
    Patent application
    Status: Granted

    Publication number: US20050071163A1

    Publication date: 2005-03-31

    Application number: US10672374

    Filing date: 2003-09-26

    IPC classification: G10L13/00

    CPC classification: G10L13/10

    Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.

    Method and apparatus for generating a frequency warping function and for frequency warping
    9.
    Patent application
    Status: Granted

    Publication number: US20070185715A1

    Publication date: 2007-08-09

    Application number: US11654447

    Filing date: 2007-01-17

    IPC classification: G10L15/04

    CPC classification: G10L15/07 G10L2021/0135

    Abstract: A method for generating a frequency warping function comprising preparing the training speech of a source and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.
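
The last step of this abstract, using corresponding formant pairs as key positions of a piecewise linear warping function, can be sketched as follows. The function name, the example formant values, and the choice to pin the endpoints at 0 Hz and `max_freq` are assumptions. The abstract does not say how the function behaves outside the formant range, and the frame alignment and formant extraction steps are not shown.

```python
def make_warping_function(source_formants, target_formants, max_freq=8000.0):
    """Build a piecewise linear map from source to target frequencies.

    Each (source, target) formant pair is a key position. The endpoints
    (0, 0) and (max_freq, max_freq) are added as an assumption so that the
    map covers the whole band. Formants must be distinct and lie strictly
    between 0 and max_freq.
    """
    keys = ([(0.0, 0.0)]
            + sorted(zip(source_formants, target_formants))
            + [(max_freq, max_freq)])

    def warp(freq):
        # Find the segment containing freq and interpolate linearly within it.
        for (x0, y0), (x1, y1) in zip(keys, keys[1:]):
            if x0 <= freq <= x1:
                return y0 + (y1 - y0) * (freq - x0) / (x1 - x0)
        raise ValueError("frequency outside [0, max_freq]")

    return warp

# Made-up formant values (Hz) for one pair of aligned source and target frames.
warp = make_warping_function([500.0, 1500.0, 2500.0], [600.0, 1700.0, 2700.0])
midpoint = warp(1000.0)  # halfway between the first two key positions
```

A source frequency that coincides with a source formant maps exactly onto the matching target formant, and frequencies in between are interpolated linearly.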

    Method, apparatus and computer program product providing prosodic-categorical enhancement to phrase-spliced text-to-speech synthesis
    10.
    Patent application
    Status: Under examination (published)

    Publication number: US20070055526A1

    Publication date: 2007-03-08

    Application number: US11212432

    Filing date: 2005-08-25

    IPC classification: G10L13/08

    CPC classification: G10L13/10

    Abstract: Disclosed is a method, a system and a computer program product for text-to-speech synthesis. The computer program product comprises a computer useable medium including a computer readable program, where the computer readable program when executed on the computer causes the computer to operate in accordance with a text-to-speech synthesis function by operations that include, responsive to at least one phrase represented as recorded human speech to be employed in synthesizing speech, labeling the phrase according to a symbolic categorization of prosodic phenomena; and constructing a data structure that includes word/prosody-categories and word/prosody-category sequences for the phrase, and that further includes information pertaining to a phone sequence associated with the constituent word or word sequence for the phrase.
