On demand TTS vocabulary for a telematics system
    1.
    Granted invention patent
    Status: in force

    Publication number: US08046213B2

    Publication date: 2011-10-25

    Application number: US10913004

    Filing date: 2004-08-06

    IPC classification: G06F17/20 G10L21/00 G01C21/30

    CPC classification: G10L13/04 G01C21/3629

    Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be entered manually into the driving directions system by the driver, or determined automatically using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads new names from the mass media storage into memory and, if necessary, overwrites those that are now geographically out of range. Based on the driver's current location, the driving directions system can audibly output geographic names from the run-time memory.
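
    The radius-based loading and overwriting described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `PlaceName` record, the in-memory list standing in for the mass media storage, the `capacity` limit on run-time memory, and the great-circle (haversine) distance test are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceName:
    """Hypothetical record: a geographic name, its location, and its recording."""
    name: str
    lat: float
    lon: float
    audio: bytes  # stand-in for the prerecorded spoken utterance


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


class OnDemandVocabulary:
    def __init__(self, mass_storage, radius_miles=5.0, capacity=100):
        self.mass_storage = mass_storage  # list of PlaceName (assumed source)
        self.radius = radius_miles
        self.capacity = capacity          # assumed size of run-time memory
        self.runtime = {}                 # name -> audio currently loaded

    def update(self, lat, lon):
        """Load names near (lat, lon); overwrite out-of-range names only when full."""
        in_range = {p.name: p.audio for p in self.mass_storage
                    if haversine_miles(lat, lon, p.lat, p.lon) <= self.radius}
        for name, audio in in_range.items():
            if name in self.runtime:
                continue
            if len(self.runtime) >= self.capacity:
                stale = next((n for n in self.runtime if n not in in_range), None)
                if stale is None:
                    break  # memory is full of in-range names; stop loading
                del self.runtime[stale]
            self.runtime[name] = audio

    def speak(self, name):
        """Return the loaded utterance for a name, or None if not in memory."""
        return self.runtime.get(name)


storage = [
    PlaceName("Elm St", 40.0, -75.0, b"elm"),
    PlaceName("Oak Ave", 40.0, -74.99, b"oak"),   # about half a mile away
    PlaceName("Far Rd", 41.0, -75.0, b"far"),     # about 69 miles away
]
vocab = OnDemandVocabulary(storage, radius_miles=5.0, capacity=2)
vocab.update(40.0, -75.0)   # loads Elm St and Oak Ave
vocab.update(41.0, -75.0)   # memory full: overwrites Elm St with Far Rd
```

    Because an entry is overwritten only when the run-time memory is full, a name that has just gone out of range (Oak Ave above) may stay cached until its slot is needed, which mirrors the "if necessary" wording of the abstract.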

    System and method for rescoring N-best hypotheses of an automatic speech recognition system
    2.
    Granted invention patent
    Status: lapsed

    Publication number: US07761296B1

    Publication date: 2010-07-20

    Application number: US09286099

    Filing date: 1999-04-02

    IPC classification: G10L17/00 G10L15/00

    CPC classification: G10L15/08 G10L13/02 G10L15/10

    Abstract: A system and method for rescoring the N-best hypotheses of an automatic speech recognition system by comparing the original speech waveform with synthetic speech waveforms generated for the text sequence of each of the N-best hypotheses. A distance is calculated from the original speech waveform to each synthesized waveform, and the text associated with the synthesized waveform closest to the original waveform is selected as the final hypothesis. The original waveform and each synthesized waveform are aligned to the corresponding text sequence at the phoneme level. For the original waveform and for each synthesized hypothesis, the mean of the feature vectors aligned to each phoneme is computed. The distance of a synthesized hypothesis from the original speech signal is then the sum, over all phonemes in the hypothesis, of the Euclidean distance between the mean feature vector of the frames aligned to that phoneme in the original signal and the corresponding mean in the synthesized signal. The hypothesis text that is closest to the original waveform under this metric is chosen as the final system output.
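
    The phoneme-level distance in the abstract can be written as D(h) = Σ_p ‖μ_orig(p) - μ_h(p)‖, where μ(p) is the mean feature vector of the frames aligned to phoneme p. The sketch below assumes the feature vectors and phoneme alignments (frame ranges, end exclusive) have already been produced by a front end and a forced aligner; the function names and data layout are hypothetical.

```python
import math


def phoneme_means(features, alignment):
    """Mean feature vector per aligned phoneme.

    features: list of frame vectors (lists of floats).
    alignment: list of (phoneme, start_frame, end_frame), end exclusive.
    """
    means = []
    for phone, start, end in alignment:
        frames = features[start:end]
        dim = len(frames[0])
        mean = [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]
        means.append((phone, mean))
    return means


def hypothesis_distance(orig_feats, orig_align, synth_feats, synth_align):
    """Sum over phonemes of the Euclidean distance between per-phoneme means."""
    orig = phoneme_means(orig_feats, orig_align)
    synth = phoneme_means(synth_feats, synth_align)
    if [p for p, _ in orig] != [p for p, _ in synth]:
        raise ValueError("alignments must cover the same phoneme sequence")
    return sum(math.dist(a, b) for (_, a), (_, b) in zip(orig, synth))


def rescore(orig_feats, hypotheses):
    """Return the text of the hypothesis whose synthetic speech is closest.

    hypotheses: list of (text, orig_align, synth_feats, synth_align); the
    original waveform is aligned separately to each hypothesis text.
    """
    best = min(hypotheses,
               key=lambda h: hypothesis_distance(orig_feats, h[1], h[2], h[3]))
    return best[0]


# Toy one-dimensional features for a four-frame utterance.
orig = [[0.0], [0.0], [1.0], [1.0]]
hyp_ab = ("a b", [("a", 0, 2), ("b", 2, 4)], [[0.0], [1.0]], [("a", 0, 1), ("b", 1, 2)])
hyp_ac = ("a c", [("a", 0, 2), ("c", 2, 4)], [[0.0], [3.0]], [("a", 0, 1), ("c", 1, 2)])
winner = rescore(orig, [hyp_ac, hyp_ab])
```

    Here "a b" has distance 0 and "a c" has distance 2, so `rescore` picks "a b".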

    ON DEMAND TTS VOCABULARY FOR A TELEMATICS SYSTEM
    3.
    Invention patent application
    Status: in force

    Publication number: US20120095676A1

    Publication date: 2012-04-19

    Application number: US13279626

    Filing date: 2011-10-24

    IPC classification: G01C21/36

    CPC classification: G10L13/04 G01C21/3629

    Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be entered manually into the driving directions system by the driver, or determined automatically using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads new names from the mass media storage into memory and, if necessary, overwrites those that are now geographically out of range. Based on the driver's current location, the driving directions system can audibly output geographic names from the run-time memory.

    Systems and methods for text-to-speech synthesis using spoken example
    4.
    Granted invention patent
    Status: in force

    Publication number: US08886538B2

    Publication date: 2014-11-11

    Application number: US10672374

    Filing date: 2003-09-26

    IPC classification: G10L13/08 G10L13/10

    CPC classification: G10L13/10

    Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing the prosodic and phonetic content of a spoken example of the text input, so as to accurately mimic the input speech style and pronunciation. The systems and methods provide an interface to a TTS system that allows a user to input a text string and a spoken utterance of that text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input, enabling more natural-sounding synthesized speech.
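
    One way to picture the step that turns prosodic parameters into markup is the sketch below, which converts per-word pitch and duration measurements into W3C SSML `<prosody>` elements. It assumes the measurements (mean F0 in Hz and duration in ms for each word) have already been extracted from the spoken example, and it maps each word's pitch deviation from the utterance mean onto a relative SSML pitch change; both the input format and that mapping are illustrative choices, not the claimed method.

```python
from xml.sax.saxutils import escape


def prosody_markup(words, lang="en-US"):
    """Build SSML from (word, mean_f0_hz, duration_ms) tuples.

    Each word's pitch is expressed as a percentage relative to the mean F0
    of the whole spoken example (an assumed mapping onto the voice baseline).
    """
    mean_f0 = sum(f0 for _, f0, _ in words) / len(words)
    parts = []
    for text, f0, dur_ms in words:
        pct = round(100 * (f0 - mean_f0) / mean_f0)
        parts.append(
            f'<prosody pitch="{pct:+d}%" duration="{dur_ms}ms">{escape(text)}</prosody>'
        )
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">' + " ".join(parts) + "</speak>"
    )


# Hypothetical measurements from a spoken example of "hello world".
ssml = prosody_markup([("hello", 110.0, 300), ("world", 90.0, 450)])
```

    With a mean F0 of 100 Hz, "hello" is marked `pitch="+10%"` and "world" `pitch="-10%"`, each with its measured duration.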

    METHODS AND COMPUTER PROGRAM PRODUCTS FOR PROVIDING PARAPHRASING IN A TEXT-TO-SPEECH SYSTEM
    5.
    Invention patent application
    Status: under examination (published)

    Publication number: US20080167876A1

    Publication date: 2008-07-10

    Application number: US11619682

    Filing date: 2007-01-04

    IPC classification: G10L21/06

    Abstract: A method and computer program product for providing paraphrasing in a text-to-speech (TTS) system is provided. The method includes receiving an input text, parsing the input text, and determining a paraphrase of the input text. The method also includes synthesizing the paraphrase into synthesized speech. The method further includes selecting the synthesized speech to output, which includes assigning a score to the synthesized speech associated with each paraphrase, comparing those scores, and selecting the top-scoring synthesized speech for output. Finally, the method includes outputting the selected synthesized speech.
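
    The scoring-and-selection step can be sketched as a small higher-order function. The `synthesize` and `score` callables are hypothetical placeholders for a TTS back end and a quality measure (the abstract does not say how scores are computed); the toy scorer in the example simply prefers outputs with fewer units.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Audio = TypeVar("Audio")


def select_paraphrase(
    paraphrases: Sequence[str],
    synthesize: Callable[[str], Audio],
    score: Callable[[Audio], float],
) -> Tuple[str, Audio]:
    """Synthesize every paraphrase, score each result, and return the top scorer."""
    scored: List[Tuple[float, str, Audio]] = []
    for text in paraphrases:
        audio = synthesize(text)
        scored.append((score(audio), text, audio))
    # max() keeps the first candidate on ties, so input order breaks ties.
    _, best_text, best_audio = max(scored, key=lambda c: c[0])
    return best_text, best_audio


# Toy example: "audio" is a list of word units and shorter output scores higher.
text, audio = select_paraphrase(
    ["please turn left at the next light", "turn left at the light"],
    synthesize=lambda t: t.split(),
    score=lambda units: -len(units),
)
# text == "turn left at the light"
```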

    SYSTEM FOR TUNING SYNTHESIZED SPEECH
    6.
    Invention patent application
    Status: in force

    Publication number: US20080167875A1

    Publication date: 2008-07-10

    Application number: US11621347

    Filing date: 2007-01-09

    IPC classification: G10L13/00

    CPC classification: G10L13/08 G10L13/033

    Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Facilities are provided to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users interact with the software tool through a graphical user interface (GUI). The software tool can output synthesized audio files in many formats.

    System for tuning synthesized speech
    7.
    Granted invention patent
    Status: in force

    Publication number: US08438032B2

    Publication date: 2013-05-07

    Application number: US11621347

    Filing date: 2007-01-09

    IPC classification: G10L13/08

    CPC classification: G10L13/08 G10L13/033

    Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Facilities are provided to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users interact with the software tool through a graphical user interface (GUI). The software tool can output synthesized audio files in many formats.

    Generating paralinguistic phenomena via markup in text-to-speech synthesis
    8.
    Granted invention patent
    Status: in force

    Publication number: US07472065B2

    Publication date: 2008-12-30

    Application number: US10861055

    Filing date: 2004-06-04

    IPC classification: G10L13/00 G10L13/08

    CPC classification: G10L13/08

    Abstract: Converting marked-up text into a synthesized stream includes providing marked-up text to a processor-based system, converting the marked-up text into a text stream including vocabulary items, retrieving audio segments corresponding to the vocabulary items, concatenating the audio segments to form a synthesized stream, and audibly outputting the synthesized stream. The marked-up text includes normal text and paralinguistic text; the normal text is differentiated from the paralinguistic text by a grammar constraint; and the paralinguistic text is associated with more than one audio segment, so that retrieving the plurality of audio segments includes selecting one audio segment associated with the paralinguistic text.
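
    The retrieval step, in which an ordinary word maps to one audio segment while a paralinguistic item maps to several alternatives of which one is chosen, can be sketched as below. The `<laugh/>`-style tag syntax, the regular expression standing in for the grammar constraint, the byte strings standing in for audio, and the random choice among alternatives are all assumptions for illustration; the abstract does not specify any of them.

```python
import random
import re

# Hypothetical grammar: paralinguistic events are written as <name/>;
# everything else matching \w+ is treated as a normal vocabulary word.
TOKEN = re.compile(r"<(\w+)/>|(\w+)")


def synthesize_stream(marked_up, word_segments, paralinguistic_segments, rng=None):
    """Concatenate audio segments for a marked-up text.

    word_segments: word -> single audio segment (bytes).
    paralinguistic_segments: event name -> list of alternative segments.
    """
    rng = rng or random.Random()
    stream = []
    for match in TOKEN.finditer(marked_up):
        event, word = match.group(1), match.group(2)
        if event is not None:
            # Several recordings exist for one event; pick one of them.
            stream.append(rng.choice(paralinguistic_segments[event]))
        else:
            stream.append(word_segments[word.lower()])
    return b"".join(stream)


words = {"that": b"T", "is": b"I", "funny": b"F"}
events = {"laugh": [b"h1", b"h2"]}
out = synthesize_stream("That is <laugh/> funny", words, events, random.Random(0))
# out is b"TI" + (b"h1" or b"h2") + b"F"
```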

Speech coding apparatus and method for generating acoustic feature vector component values by combining values of the same features for multiple time intervals
    9.
    Granted invention patent
    Status: lapsed

    Publication number: US5544277A

    Publication date: 1996-08-06

    Application number: US98682

    Filing date: 1993-07-28

    CPC classification: G10L15/02 G10L15/20

    Abstract: A speech coding apparatus and method measures the values of at least first and second different features of an utterance during each of a series of successive time intervals. For each time interval, a feature vector signal has a first component value equal to a first weighted combination of the values of only one feature of the utterance over at least two time intervals. The feature vector signal has a second component value equal to a second weighted combination, different from the first, of the values of only one feature of the utterance over at least two time intervals. The resulting feature vector signals for a series of successive time intervals form a coded representation of the utterance. In one embodiment, a first weighted mixture signal has a value equal to a first weighted mixture of the values of the features of the utterance during a single time interval, and a second weighted mixture signal has a value equal to a second weighted mixture, different from the first, of those values during a single time interval. The first component value of each feature vector signal is then equal to a first weighted combination of the values of only the first weighted mixture signals over at least two time intervals, and the second component value is equal to a second weighted combination, different from the first, of the values of only the second weighted mixture signals over at least two time intervals.
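
    The embodiment in the second half of the abstract can be sketched as follows: per-interval weighted mixtures of the measured features are formed first, and each output component is then a weighted combination of one mixture signal over the current and preceding intervals. The specific weights, the use of preceding (rather than following) intervals, and clamping at the start of the utterance are assumptions made for the example.

```python
def mix(frame, weights):
    """Weighted mixture of the feature values measured in one time interval."""
    return sum(w * v for w, v in zip(weights, frame))


def code_utterance(frames, mix1, mix2, lag1, lag2):
    """Return one (first, second) component pair per time interval.

    frames[t]: measured feature values for interval t.
    mix1, mix2: different per-feature weights for the two mixture signals.
    lag1, lag2: different weights over intervals t, t-1, ... (two or more each).
    """
    m1 = [mix(f, mix1) for f in frames]
    m2 = [mix(f, mix2) for f in frames]

    def combine(signal, lags, t):
        # Intervals before the start of the utterance reuse the first interval.
        return sum(w * signal[max(t - k, 0)] for k, w in enumerate(lags))

    return [(combine(m1, lag1, t), combine(m2, lag2, t)) for t in range(len(frames))]


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
coded = code_utterance(frames, mix1=[1, 0], mix2=[0, 1],
                       lag1=[0.5, 0.5], lag2=[1.0, -1.0])
# coded == [(1.0, 0.0), (0.5, 1.0), (0.5, 0.0)]
```

    With one-hot mixture weights, as in the example, each component reduces to the first form in the abstract: a weighted combination of the values of a single feature over two intervals. The lag weights [1, -1] give a simple delta feature.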

    Systems and methods for text-to-speech synthesis using spoken example
    10.
    Invention patent application
    Status: in force

    Publication number: US20050071163A1

    Publication date: 2005-03-31

    Application number: US10672374

    Filing date: 2003-09-26

    IPC classification: G10L13/00

    CPC classification: G10L13/10

    Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing the prosodic and phonetic content of a spoken example of the text input, so as to accurately mimic the input speech style and pronunciation. The systems and methods provide an interface to a TTS system that allows a user to input a text string and a spoken utterance of that text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input, enabling more natural-sounding synthesized speech.