System for tuning synthesized speech
    1.
    Granted patent
    System for tuning synthesized speech (in force)

    Publication No.: US08438032B2

    Publication date: 2013-05-07

    Application No.: US11621347

    Filing date: 2007-01-09

    IPC classification: G10L13/08

    CPC classification: G10L13/08 G10L13/033

    Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.


    SYSTEM FOR TUNING SYNTHESIZED SPEECH
    2.
    Patent application
    SYSTEM FOR TUNING SYNTHESIZED SPEECH (in force)

    Publication No.: US20080167875A1

    Publication date: 2008-07-10

    Application No.: US11621347

    Filing date: 2007-01-09

    IPC classification: G10L13/00

    CPC classification: G10L13/08 G10L13/033

    Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.


    ON DEMAND TTS VOCABULARY FOR A TELEMATICS SYSTEM
    3.
    Patent application
    ON DEMAND TTS VOCABULARY FOR A TELEMATICS SYSTEM (in force)

    Publication No.: US20120095676A1

    Publication date: 2012-04-19

    Application No.: US13279626

    Filing date: 2011-10-24

    IPC classification: G01C21/36

    CPC classification: G10L13/04 G01C21/3629

    Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.
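
The radius-limited loading described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the `UtteranceCache` class, the `storage` layout (name mapped to a location and audio bytes), the 5-mile default radius, and the use of the haversine formula are all hypothetical choices made for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


class UtteranceCache:
    """Keeps in memory only the utterances whose location is within the radius."""

    def __init__(self, storage, radius_miles=5.0):
        # storage stands in for the mass storage: name -> ((lat, lon), audio_bytes)
        self.storage = storage
        self.radius = radius_miles
        self.memory = {}  # run-time memory: name -> audio_bytes

    def update(self, position):
        """Load names now in range, drop names now out of range, return loaded names."""
        in_range = {
            name for name, (loc, _audio) in self.storage.items()
            if haversine_miles(position, loc) <= self.radius
        }
        for name in list(self.memory):
            if name not in in_range:
                del self.memory[name]  # geographically out of range
        for name in in_range:
            if name not in self.memory:
                self.memory[name] = self.storage[name][1]
        return sorted(self.memory)


storage = {
    "Main St": ((40.0, -75.0), b"main-audio"),
    "Far Ave": ((41.0, -75.0), b"far-audio"),
}
cache = UtteranceCache(storage)
near_start = cache.update((40.0, -75.0))
near_end = cache.update((41.0, -75.0))
```

Calling `update` each time a new position fix arrives keeps the in-memory set small; a real system would likely use a spatial index rather than scanning every stored name.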


    On demand TTS vocabulary for a telematics system
    4.
    Granted patent
    On demand TTS vocabulary for a telematics system (in force)

    Publication No.: US08046213B2

    Publication date: 2011-10-25

    Application No.: US10913004

    Filing date: 2004-08-06

    IPC classification: G06F17/20 G10L21/00 G01C21/30

    CPC classification: G10L13/04 G01C21/3629

    Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.


    System and method for rescoring N-best hypotheses of an automatic speech recognition system
    5.
    Granted patent
    System and method for rescoring N-best hypotheses of an automatic speech recognition system (lapsed)

    Publication No.: US07761296B1

    Publication date: 2010-07-20

    Application No.: US09286099

    Filing date: 1999-04-02

    IPC classification: G10L17/00 G10L15/00

    CPC classification: G10L15/08 G10L13/02 G10L15/10

    Abstract: A system and method for rescoring the N-best hypotheses from an automatic speech recognition system by comparing an original speech waveform to synthetic speech waveforms that are generated for each text sequence of the N-best hypotheses. A distance is calculated from the original speech waveform to each of the synthesized waveforms, and the text associated with the synthesized waveform that is determined to be closest to the original waveform is selected as the final hypothesis. The original waveform and each synthesized waveform are aligned to a corresponding text sequence on a phoneme level. The mean of the feature vectors which align to each phoneme is computed for the original waveform as well as for each of the synthesized hypotheses. The distance of a synthesized hypothesis to the original speech signal is then computed as the sum over all phonemes in the hypothesis of the Euclidean distance between the means of the feature vectors of the frames aligning to that phoneme for the original and the synthesized signals. The text of the hypothesis which is closest under the above metric to the original waveform is chosen as the final system output.
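
The distance metric in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming that feature extraction, synthesis, and phoneme alignment have already been done elsewhere; the function names and the dictionary layout of each hypothesis (`text`, `orig_alignment`, `synth_frames`, `synth_alignment`) are hypothetical. An alignment is taken to be a list giving, for each frame, the index of the phoneme it aligns to, and every phoneme of a hypothesis is assumed to have at least one frame in both signals.

```python
import math


def phoneme_means(frames, alignment):
    """Mean feature vector of the frames aligned to each phoneme index."""
    sums, counts = {}, {}
    for vec, ph in zip(frames, alignment):
        acc = sums.setdefault(ph, [0.0] * len(vec))
        for k, value in enumerate(vec):
            acc[k] += value
        counts[ph] = counts.get(ph, 0) + 1
    return {ph: [s / counts[ph] for s in acc] for ph, acc in sums.items()}


def hypothesis_distance(orig_frames, hyp):
    """Sum over phonemes of the Euclidean distance between per-phoneme means."""
    orig = phoneme_means(orig_frames, hyp["orig_alignment"])
    synth = phoneme_means(hyp["synth_frames"], hyp["synth_alignment"])
    return sum(math.dist(orig[ph], synth[ph]) for ph in synth)


def rescore(orig_frames, hypotheses):
    """Return the text of the hypothesis closest to the original signal."""
    best = min(hypotheses, key=lambda hyp: hypothesis_distance(orig_frames, hyp))
    return best["text"]


# Toy 2-dimensional features: three original frames, two competing hypotheses.
orig_frames = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
hypotheses = [
    {"text": "A", "orig_alignment": [0, 0, 1],
     "synth_frames": [[0.0, 0.0], [1.0, 1.0]], "synth_alignment": [0, 1]},
    {"text": "B", "orig_alignment": [0, 1, 1],
     "synth_frames": [[3.0, 3.0], [3.0, 3.0]], "synth_alignment": [0, 1]},
]
chosen = rescore(orig_frames, hypotheses)
```

Note that the original frames are re-aligned for each hypothesis, since each hypothesis has its own phoneme sequence.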


    Generating paralinguistic phenomena via markup in text-to-speech synthesis
    6.
    Granted patent
    Generating paralinguistic phenomena via markup in text-to-speech synthesis (in force)

    Publication No.: US07472065B2

    Publication date: 2008-12-30

    Application No.: US10861055

    Filing date: 2004-06-04

    IPC classification: G10L13/00 G10L13/08

    CPC classification: G10L13/08

    Abstract: Converting marked-up text into a synthesized stream includes providing marked-up text to a processor-based system, converting the marked-up text into a text stream including vocabulary items, retrieving audio segments corresponding to the vocabulary items, concatenating the audio segments to form a synthesized stream, and audibly outputting the synthesized stream, wherein the marked-up text includes a normal text and a paralinguistic text; and wherein the normal text is differentiated from the paralinguistic text by using a grammar constraint, and wherein the paralinguistic text is associated with more than one audio segment, wherein the retrieving of the plurality of audio segments includes selecting one audio segment associated with the paralinguistic text.


    Systems and methods for text-to-speech synthesis using spoken example
    7.
    Granted patent
    Systems and methods for text-to-speech synthesis using spoken example (in force)

    Publication No.: US08886538B2

    Publication date: 2014-11-11

    Application No.: US10672374

    Filing date: 2003-09-26

    IPC classification: G10L13/08 G10L13/10

    CPC classification: G10L13/10

    Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.


    METHODS AND COMPUTER PROGRAM PRODUCTS FOR PROVIDING PARAPHRASING IN A TEXT-TO-SPEECH SYSTEM
    8.
    Patent application
    METHODS AND COMPUTER PROGRAM PRODUCTS FOR PROVIDING PARAPHRASING IN A TEXT-TO-SPEECH SYSTEM (under examination, published)

    Publication No.: US20080167876A1

    Publication date: 2008-07-10

    Application No.: US11619682

    Filing date: 2007-01-04

    IPC classification: G10L21/06

    Abstract: A method and computer program product for providing paraphrasing in a text-to-speech (TTS) system is provided. The method includes receiving an input text, parsing the input text, and determining a paraphrase of the input text. The method also includes synthesizing the paraphrase into synthesized speech. The method further includes selecting synthesized speech to output, which includes: assigning a score to each synthesized speech associated with each paraphrase, comparing the score of each synthesized speech associated with each paraphrase, and selecting the top-scoring synthesized speech to output. Furthermore, the method includes outputting the selected synthesized speech.
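
The score-compare-select step in the abstract above can be sketched as follows. The abstract does not say how paraphrases are synthesized or scored, so both are passed in as caller-supplied functions; `select_synthesized_speech`, `synthesize`, and `score` are hypothetical names, and the toy stand-ins at the bottom exist only to make the example runnable.

```python
def select_synthesized_speech(paraphrases, synthesize, score):
    """Synthesize every paraphrase, score each result, and return the best one.

    Returns a (paraphrase, synthesized_speech, score) tuple for the top score.
    """
    scored = []
    for text in paraphrases:
        speech = synthesize(text)                     # synthesize the paraphrase
        scored.append((text, speech, score(speech)))  # assign a score to it
    # Compare the scores and keep the top-scoring synthesized speech.
    return max(scored, key=lambda item: item[2])


# Toy stand-ins: "synthesis" upper-cases the text, and shorter output scores higher.
def toy_synthesize(text):
    return text.upper()


def toy_score(speech):
    return -len(speech)


best_text, best_speech, best_score = select_synthesized_speech(
    ["please turn left now", "turn left"], toy_synthesize, toy_score
)
```

In a real system the score might reflect something like the synthesizer's own quality estimate for each candidate; that choice is outside what the abstract specifies.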


    Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis
    9.
    Granted patent
    Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis (in force)

    Publication No.: US07716052B2

    Publication date: 2010-05-11

    Application No.: US11101223

    Filing date: 2005-04-07

    IPC classification: G10L13/00 G10L13/08 G10L13/06

    CPC classification: G10L13/07 G10L2021/0135

    Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.
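
Cost-based selection from a multi-speaker segment database, as described in the abstract above, can be sketched with a small dynamic program. Everything specific here is an assumption made for illustration: the `Segment` fields, the join cost (a pitch mismatch term plus a fixed penalty for switching speakers), and the function names. The abstract only states that at least one cost function is used and that each segment carries an attribute identifying its speaker. Each requested unit is assumed to have at least one candidate in the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    unit: str      # the sound unit this segment realizes
    speaker: str   # attribute identifying the source speaker
    pitch: float   # hypothetical acoustic attribute, in Hz


def join_cost(prev, cur, speaker_switch_penalty=1.0):
    """Hypothetical cost of concatenating cur after prev."""
    cost = abs(prev.pitch - cur.pitch) / 100.0
    if prev.speaker != cur.speaker:
        cost += speaker_switch_penalty
    return cost


def select_segments(units, database):
    """Pick one segment per unit so that the total join cost is minimal."""
    candidates = [[s for s in database if s.unit == u] for u in units]
    # layers[i][segment] = (best cumulative cost, best predecessor segment)
    layers = [{s: (0.0, None) for s in candidates[0]}]
    for i in range(1, len(units)):
        layer = {}
        for cur in candidates[i]:
            prev, cost = min(
                ((p, c + join_cost(p, cur)) for p, (c, _) in layers[-1].items()),
                key=lambda pair: pair[1],
            )
            layer[cur] = (cost, prev)
        layers.append(layer)
    # Trace back from the cheapest final segment.
    seg = min(layers[-1], key=lambda s: layers[-1][s][0])
    path = [seg]
    for i in range(len(units) - 1, 0, -1):
        seg = layers[i][seg][1]
        path.append(seg)
    return path[::-1]


database = [
    Segment("h", "A", 100.0), Segment("h", "B", 100.0),
    Segment("i", "B", 100.0), Segment("i", "A", 180.0),
]
chosen_path = select_segments(["h", "i"], database)
```

With this toy database the cheapest path stays with speaker B, because switching speakers or jumping in pitch both add cost.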


    Application of emotion-based intonation and prosody to speech in text-to-speech systems
    10.
    Granted patent
    Application of emotion-based intonation and prosody to speech in text-to-speech systems (in force)

    Publication No.: US07401020B2

    Publication date: 2008-07-15

    Application No.: US10306950

    Filing date: 2002-11-29

    Applicant: Ellen M. Eide

    Inventor: Ellen M. Eide

    IPC classification: G10L19/00

    CPC classification: G10L13/10 Y10S715/977

    Abstract: A text-to-speech system that includes an arrangement for accepting text input, an arrangement for providing synthetic speech output, and an arrangement for imparting emotion-based features to synthetic speech output. The arrangement for imparting emotion-based features includes an arrangement for accepting instruction for imparting at least one emotion-based paradigm to synthetic speech output, as well as an arrangement for applying at least one emotion-based paradigm to synthetic speech output.
