System and method for dynamically selecting among TTS systems
    1.
    Granted patent
    Status: in force

    Publication No.: US07702510B2

    Publication date: 2010-04-20

    Application No.: US11622683

    Filing date: 2007-01-12

    IPC classification: G10L13/08 G10L13/00

    CPC classification: G10L13/047

    Abstract: Systems and methods for dynamically selecting among text-to-speech (TTS) systems. Exemplary embodiments of the systems and methods include identifying text for converting into a speech waveform, synthesizing said text by three TTS systems, generating a candidate waveform from each of the three systems, generating a score from each of the three systems, comparing each of the three scores, selecting a score based on a criterion and selecting one of the three waveforms based on the selected one of the three scores.

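The score-based selection summarized in this abstract can be illustrated with a minimal Python sketch. The engine interface (a callable returning a waveform and a score), the names `Candidate` and `select_waveform`, and the default "highest score wins" criterion are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical engine interface: text in, (waveform samples, score) out.
Synthesizer = Callable[[str], Tuple[List[float], float]]


@dataclass
class Candidate:
    system: str            # name of the TTS system that produced the waveform
    waveform: List[float]  # candidate waveform samples
    score: float           # score reported for this candidate


def select_waveform(text: str,
                    systems: Dict[str, Synthesizer],
                    criterion=max) -> Candidate:
    """Synthesize `text` with every system and pick one candidate by score.

    `criterion` is an assumption: `max` keeps the highest score,
    `min` would keep the lowest (e.g. if the score is a cost).
    """
    candidates = [Candidate(name, *synth(text)) for name, synth in systems.items()]
    return criterion(candidates, key=lambda c: c.score)


# Stub engines standing in for three real TTS systems.
engines = {
    "system_a": lambda text: ([0.0] * len(text), 0.2),
    "system_b": lambda text: ([0.1] * len(text), 0.9),
    "system_c": lambda text: ([0.2] * len(text), 0.5),
}
best = select_waveform("hello", engines)
```

Passing `criterion=min` instead would select the lowest-scoring candidate, which is the natural choice when the score represents a cost rather than a quality estimate.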

    SYSTEM AND METHOD FOR DYNAMICALLY SELECTING AMONG TTS SYSTEMS
    2.
    Patent application
    Status: in force

    Publication No.: US20080172234A1

    Publication date: 2008-07-17

    Application No.: US11622683

    Filing date: 2007-01-12

    IPC classification: G10L13/02

    CPC classification: G10L13/047

    Abstract: Systems and methods for dynamically selecting among text-to-speech (TTS) systems. Exemplary embodiments of the systems and methods include identifying text for converting into a speech waveform, synthesizing said text by three TTS systems, generating a candidate waveform from each of the three systems, generating a score from each of the three systems, comparing each of the three scores, selecting a score based on a criterion and selecting one of the three waveforms based on the selected one of the three scores.

    Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis
    3.
    Granted patent
    Status: in force

    Publication No.: US07716052B2

    Publication date: 2010-05-11

    Application No.: US11101223

    Filing date: 2005-04-07

    IPC classification: G10L13/00 G10L13/08 G10L13/06

    CPC classification: G10L13/07 G10L2021/0135

    Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.

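As a rough illustration of cost-based segment selection over a multi-speaker database, the Python sketch below runs a small dynamic-programming search. The `Segment` fields, the pitch-based join cost, and the speaker-switch penalty are assumptions chosen for the example; the abstract only states that a cost function is used and that each segment's attribute vector identifies its speaker.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Segment:
    unit: str     # unit label the segment realizes (e.g. a phone)
    speaker: str  # attribute-vector element identifying the source speaker
    pitch: float  # one further illustrative attribute, in Hz (assumed)


def join_cost(a: Segment, b: Segment, speaker_switch_penalty: float = 1.0) -> float:
    """Assumed cost: pitch mismatch plus a penalty for changing speaker."""
    cost = abs(a.pitch - b.pitch) / 100.0
    if a.speaker != b.speaker:
        cost += speaker_switch_penalty
    return cost


def select_segments(units: Sequence[str], database: Sequence[Segment]) -> List[Segment]:
    """Pick one segment per unit so that the summed join cost is minimal."""
    layers = [[s for s in database if s.unit == u] for u in units]
    if any(not layer for layer in layers):
        raise ValueError("no segment available for at least one unit")
    # best holds (accumulated cost, path) for every segment in the current layer.
    best = [(0.0, [s]) for s in layers[0]]
    for layer in layers[1:]:
        new_best = []
        for s in layer:
            cost, path = min(
                ((c + join_cost(p[-1], s), p) for c, p in best),
                key=lambda t: t[0],
            )
            new_best.append((cost, path + [s]))
        best = new_best
    return min(best, key=lambda t: t[0])[1]


database = [
    Segment("h", "A", 120.0),
    Segment("h", "B", 200.0),
    Segment("i", "B", 205.0),
    Segment("i", "A", 180.0),
]
chosen = select_segments(["h", "i"], database)
```

With this toy database the search keeps both segments from speaker B, because their pitch values are close and no speaker switch is needed.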

    Systems and methods for text-to-speech synthesis using spoken example
    4.
    Granted patent
    Status: in force

    Publication No.: US08886538B2

    Publication date: 2014-11-11

    Application No.: US10672374

    Filing date: 2003-09-26

    IPC classification: G10L13/08 G10L13/10

    CPC classification: G10L13/10

    Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.

    METHODS AND COMPUTER PROGRAM PRODUCTS FOR PROVIDING PARAPHRASING IN A TEXT-TO-SPEECH SYSTEM
    5.
    Patent application
    Status: under examination (published)

    Publication No.: US20080167876A1

    Publication date: 2008-07-10

    Application No.: US11619682

    Filing date: 2007-01-04

    IPC classification: G10L21/06

    Abstract: A method and computer program product for providing paraphrasing in a text-to-speech (TTS) system is provided. The method includes receiving an input text, parsing the input text, and determining a paraphrase of the input text. The method also includes synthesizing the paraphrase into synthesized speech. The method further includes selecting synthesized speech to output, which includes: assigning a score to each synthesized speech associated with each paraphrase, comparing the score of each synthesized speech associated with each paraphrase, and selecting the top-scoring synthesized speech to output. Furthermore, the method includes outputting the selected synthesized speech.

    Application of emotion-based intonation and prosody to speech in text-to-speech systems
    6.
    Granted patent
    Status: in force

    Publication No.: US07401020B2

    Publication date: 2008-07-15

    Application No.: US10306950

    Filing date: 2002-11-29

    Applicant: Ellen M. Eide

    Inventor: Ellen M. Eide

    IPC classification: G10L19/00

    CPC classification: G10L13/10 Y10S715/977

    Abstract: A text-to-speech system that includes an arrangement for accepting text input, an arrangement for providing synthetic speech output, and an arrangement for imparting emotion-based features to synthetic speech output. The arrangement for imparting emotion-based features includes an arrangement for accepting instruction for imparting at least one emotion-based paradigm to synthetic speech output, as well as an arrangement for applying at least one emotion-based paradigm to synthetic speech output.

    Training of text-to-speech systems

    Publication No.: US06535852B2

    Publication date: 2003-03-18

    Application No.: US09821399

    Filing date: 2001-03-29

    Applicant: Ellen M. Eide

    Inventor: Ellen M. Eide

    IPC classification: G10L1308

    CPC classification: G10L13/02 G10L13/04

    Abstract: Building a data-driven text-to-speech system involves collecting a database of natural speech from which to train models or select segments for concatenation. Typically the speech in that database is produced by a single speaker. In this invention we include in our database speech from a multiplicity of speakers.

    ON DEMAND TTS VOCABULARY FOR A TELEMATICS SYSTEM
    8.
    Patent application
    Status: in force

    Publication No.: US20120095676A1

    Publication date: 2012-04-19

    Application No.: US13279626

    Filing date: 2011-10-24

    IPC classification: G01C21/36

    CPC classification: G10L13/04 G01C21/3629

    Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system ("GPS") receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.

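The radius-limited loading and eviction described in this abstract might look roughly like the Python sketch below. The storage layout (name mapped to latitude, longitude and an audio clip), the function names, the 5-mile default radius and the use of the haversine distance are all assumptions for illustration; the abstract does not say how distance is computed or how the memory is organized.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Hypothetical mass-storage layout: name -> (latitude, longitude, audio clip).
Storage = Dict[str, Tuple[float, float, bytes]]


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def refresh_cache(cache: Dict[str, bytes], storage: Storage,
                  position: Tuple[float, float],
                  radius_miles: float = 5.0) -> Dict[str, bytes]:
    """Keep only utterances for names within `radius_miles` of `position`."""
    lat, lon = position
    in_range = {
        name for name, (nlat, nlon, _clip) in storage.items()
        if haversine_miles(lat, lon, nlat, nlon) <= radius_miles
    }
    for name in list(cache):          # evict names that are now out of range
        if name not in in_range:
            del cache[name]
    for name in in_range:             # load names that came into range
        cache.setdefault(name, storage[name][2])
    return cache


storage: Storage = {
    "Main St": (40.0, -75.0, b"main-st-audio"),
    "Far Ave": (41.0, -75.0, b"far-ave-audio"),
}
cache: Dict[str, bytes] = {}
refresh_cache(cache, storage, (40.0, -75.0))
names_at_start = set(cache)
refresh_cache(cache, storage, (41.0, -75.0))
names_after_move = set(cache)
```

Calling `refresh_cache` again after each position update keeps the in-memory vocabulary small while the vehicle moves.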

    Apparatus and method for speaker normalization based on biometrics
    9.
    Granted patent
    Status: in force

    Publication No.: US06823305B2

    Publication date: 2004-11-23

    Application No.: US09745115

    Filing date: 2000-12-21

    Applicant: Ellen M. Eide

    Inventor: Ellen M. Eide

    IPC classification: G10L1506

    CPC classification: G10L15/24

    Abstract: Speaker normalization is carried out based on biometric information available about a speaker, such as his height, or a dimension of a bodily member or article of clothing. The chosen biometric parameter correlates with the vocal tract length. Speech can be normalized based on the biometric parameter, which thus indirectly normalizes the speech based on the vocal tract length of the speaker. The inventive normalization can be used in model formation, or in actual speech recognition usage, or both. Substantial improvements in accuracy have been noted at little cost. The preferred biometric parameter is height, and the preferred form of scaling is linear scaling with the scale factor proportional to the height of the speaker.

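The linear, height-proportional scaling mentioned at the end of this abstract can be written out as a short Python sketch. The reference height of 170 cm, the function names, and the choice to apply the factor directly to a list of frequencies are assumptions; the abstract states only that the scale factor is proportional to the speaker's height.

```python
from typing import List


def scale_factor(height_cm: float, reference_height_cm: float = 170.0) -> float:
    """Scale factor proportional to height (reference height is an assumption)."""
    if height_cm <= 0 or reference_height_cm <= 0:
        raise ValueError("heights must be positive")
    return height_cm / reference_height_cm


def warp_frequencies(freqs_hz: List[float], height_cm: float,
                     reference_height_cm: float = 170.0) -> List[float]:
    """Linearly rescale the frequency axis: f' = alpha * f."""
    alpha = scale_factor(height_cm, reference_height_cm)
    return [alpha * f for f in freqs_hz]


# A taller speaker tends to have a longer vocal tract and lower formants,
# so alpha > 1 moves those formants up toward the reference speaker.
warped = warp_frequencies([500.0, 1500.0], height_cm=187.0)
```

In a recognizer the same warp would typically be applied to the filterbank or spectrum before feature extraction, during training, recognition, or both, as the abstract notes.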

    System for tuning synthesized speech
    10.
    Granted patent
    Status: in force

    Publication No.: US08438032B2

    Publication date: 2013-05-07

    Application No.: US11621347

    Filing date: 2007-01-09

    IPC classification: G10L13/08

    CPC classification: G10L13/08 G10L13/033

    Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.
