SPEECH RECOGNITION ASSISTED EVALUATION ON TEXT-TO-SPEECH PRONUNCIATION ISSUE DETECTION
    Document type: Published patent application
    Legal status: Granted

    Publication No.: EP2965313A1

    Publication date: 2016-01-13

    Application No.: EP14710178.6

    Filing date: 2014-02-27

    IPC classification: G10L13/08

    CPC classification: G10L13/086 G10L13/08

    Abstract: Pronunciation issues in synthesized speech are automatically detected using human recordings as a reference within a Speech Recognition Assisted Evaluation (SRAE) framework that includes a Text-To-Speech (TTS) flow and a Speech Recognition (SR) flow. A pronunciation issue detector evaluates results obtained at multiple levels of the TTS flow and the SR flow (e.g., phone, word, and signal level), using the corresponding human recordings as the reference for the synthesized speech, and outputs possible pronunciation issues. A signal-level comparison may be used to determine similarities and differences between the recordings and the TTS output. A model level checker may provide results to the pronunciation issue detector to check the similarity of the TTS and SR phone sets, including their mapping relations. Results from a comparison of the SR output and the recordings may also be evaluated by the pronunciation issue detector. The pronunciation issue detector outputs a list of potential pronunciation issue candidates.

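The phone-level part of the comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the application. It assumes that a speech recognizer has already been run on both the human recording and the TTS output, and that each run produced a phone sequence. The dictionary keys (`recording_phones`, `tts_phones`), the function names, and the sample data are hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for whatever alignment the real framework uses.

```python
from difflib import SequenceMatcher


def phone_mismatches(reference_phones, tts_phones):
    """Align two phone sequences and return every non-matching span.

    Each result is (operation, reference_segment, tts_segment), where
    operation is 'replace', 'delete', or 'insert'.
    """
    matcher = SequenceMatcher(a=reference_phones, b=tts_phones, autojunk=False)
    issues = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            issues.append((op, reference_phones[i1:i2], tts_phones[j1:j2]))
    return issues


def detect_candidates(utterances):
    """Build a list of potential pronunciation issue candidates.

    The phones recognized from the human recording serve as the reference;
    an utterance becomes a candidate when the TTS phones differ from it.
    """
    candidates = []
    for utt in utterances:
        issues = phone_mismatches(utt["recording_phones"], utt["tts_phones"])
        if issues:
            candidates.append({"text": utt["text"], "issues": issues})
    return candidates


# Hypothetical recognizer output for two utterances (illustrative only).
utterances = [
    {"text": "read", "recording_phones": ["r", "iy", "d"], "tts_phones": ["r", "eh", "d"]},
    {"text": "cat", "recording_phones": ["k", "ae", "t"], "tts_phones": ["k", "ae", "t"]},
]
candidates = detect_candidates(utterances)
for candidate in candidates:
    print(candidate["text"], candidate["issues"])
```

Only the first utterance is reported, because its TTS phone sequence differs from the one recognized in the recording. A fuller detector in the spirit of the abstract would also combine word-level and signal-level evidence before emitting a candidate.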

    VOICE FONT SPEAKER AND PROSODY INTERPOLATION
    Document type: Published patent application
    Legal status: Under examination (published)

    Publication No.: EP3111442A1

    Publication date: 2017-01-04

    Application No.: EP15707242.2

    Filing date: 2015-02-23

    IPC classification: G10L13/08 G10L13/033

    Abstract: Multi-voice font interpolation is provided. A multi-voice font interpolation engine allows the production of computer-generated speech with a wide variety of speaker characteristics and/or prosody by interpolating speaker characteristics and prosody from existing fonts. Using prediction models from multiple voice fonts, the multi-voice font interpolation engine predicts values for the parameters that influence speaker characteristics and/or prosody for the phoneme sequence obtained from the text to be spoken. For each parameter, additional parameter values are generated by a weighted interpolation of the predicted values. Modifying an existing voice font with the interpolated parameters changes the style and/or emotion of the speech while retaining the base sound qualities of the original voice. The multi-voice font interpolation engine allows speaker characteristics and/or prosody to be transplanted from one voice font to another, or entirely new speaker characteristics and/or prosody to be generated for an existing voice font.

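The weighted interpolation step in the abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine described in the application: each voice font's prediction model is assumed to have already produced one value per phoneme for each parameter, the font names (`neutral`, `excited`), the parameter names (`f0_hz`, `duration_ms`), and all numbers are hypothetical, and dividing by the total weight is an added convenience so that the weights need not sum to 1.

```python
def interpolate_parameters(predictions, weights):
    """Blend per-phoneme parameter predictions from several voice fonts.

    predictions: {font_name: {parameter_name: [value per phoneme]}}
    weights:     {font_name: non-negative weight}
    Returns {parameter_name: [interpolated value per phoneme]}.
    """
    total = sum(weights[font] for font in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fonts = list(predictions)
    parameter_names = predictions[fonts[0]].keys()
    result = {}
    for name in parameter_names:
        # Group the values predicted by each font for the same phoneme.
        per_phoneme = zip(*(predictions[font][name] for font in fonts))
        result[name] = [
            sum(weights[font] * value for font, value in zip(fonts, values)) / total
            for values in per_phoneme
        ]
    return result


# Hypothetical predictions for a two-phoneme sequence (illustrative only).
predictions = {
    "neutral": {"f0_hz": [100.0, 120.0], "duration_ms": [80.0, 100.0]},
    "excited": {"f0_hz": [200.0, 220.0], "duration_ms": [60.0, 80.0]},
}
weights = {"neutral": 0.75, "excited": 0.25}

blended = interpolate_parameters(predictions, weights)
print(blended)
```

With these weights, each blended value lies a quarter of the way from the `neutral` prediction toward the `excited` one. Setting one weight to 1 and the others to 0 reproduces a single font's predictions, which corresponds to the transplant case mentioned in the abstract.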