CONTROLLING EXPRESSIVITY IN END-TO-END SPEECH SYNTHESIS SYSTEMS

Invention publication

EP4345815A3 CONTROLLING EXPRESSIVITY IN END-TO-END SPEECH SYNTHESIS SYSTEMS Status: pending, under substantive examination

Patent title: CONTROLLING EXPRESSIVITY IN END-TO-END SPEECH SYNTHESIS SYSTEMS
Application number: EP24156282.6

Filing date: 2020-07-16
Publication number: EP4345815A3

Publication date: 2024-06-12
Inventors: STANTON, Daisy; BATTENBERG, Eric Dean; SKERRY-RYAN, Russell, John Wyatt; MARIOORYAD, Soroosh; KAO, David Teh-hwa; BAGBY, Thomas Edward; SHANNON, Sean Matthew
Applicant: GOOGLE LLC
Applicant address: US Mountain View CA 94043 1600 Amphitheatre Parkway
Patentee: GOOGLE LLC
Current patentee: GOOGLE LLC
Current patentee address: US Mountain View CA 94043 1600 Amphitheatre Parkway
Representative: Lewis, Stefanie Janneke
Priority: US 1962882511P 2019.08.03
Parent application of divisional: 20754849.6 2020.07.16
Main classification: G10L13/10
IPC classifications: G10L13/10 ; G10L13/033 ; G10L13/047

Abstract:

A system (900) includes a context encoder (610), a text-prediction network (520), and a text-to-speech (TTS) model (650). The context encoder is configured to receive one or more context features (602) associated with current input text (502) and process the one or more context features to generate a context embedding (612) associated with the current input text. The text-prediction network is configured to process the current input text and the context embedding to predict, as output, a style embedding (650) for the current input text. The style embedding specifies a specific prosody and/or style for synthesizing the current input text into expressive speech (680). The TTS model is configured to process the current input text and the style embedding to generate an output audio signal (670) of expressive speech of the current input text. The output audio signal has the specific prosody and/or style specified by the style embedding.
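The data flow described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the patented implementation: the class names, dimensions, random placeholder weights, character-bucket text encoding, and sine-tone "TTS" stand-in are all assumptions introduced here to show how a context embedding feeds a text-prediction step, whose style embedding then conditions synthesis.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def make_weights(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    """Random placeholder weights; a real system would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dense_tanh(weights: List[Vector], x: Sequence[float]) -> Vector:
    """One fully connected layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def encode_text(text: str, dim: int) -> Vector:
    """Toy text encoding (assumption): normalized character-bucket counts."""
    counts = [0.0] * dim
    for ch in text:
        counts[ord(ch) % dim] += 1.0
    total = max(len(text), 1)
    return [c / total for c in counts]


class ContextEncoder:
    """Maps one or more context feature vectors to a context embedding."""

    def __init__(self, feature_dim: int, embed_dim: int, rng: random.Random):
        self.weights = make_weights(embed_dim, feature_dim, rng)

    def __call__(self, context_features: Sequence[Sequence[float]]) -> Vector:
        n = len(context_features)
        pooled = [sum(col) / n for col in zip(*context_features)]  # mean-pool
        return dense_tanh(self.weights, pooled)


class TextPredictionNetwork:
    """Predicts a style embedding from the input text and context embedding."""

    def __init__(self, text_dim: int, context_dim: int, style_dim: int,
                 rng: random.Random):
        self.text_dim = text_dim
        self.weights = make_weights(style_dim, text_dim + context_dim, rng)

    def __call__(self, text: str, context_embedding: Sequence[float]) -> Vector:
        joint = encode_text(text, self.text_dim) + list(context_embedding)
        return dense_tanh(self.weights, joint)


class ToyTTSModel:
    """Stand-in for the TTS model: emits one short sine tone per character."""

    def __init__(self, sample_rate: int = 8000, samples_per_char: int = 80):
        self.sample_rate = sample_rate
        self.samples_per_char = samples_per_char

    def __call__(self, text: str, style_embedding: Sequence[float]) -> Vector:
        pitch_scale = 1.0 + 0.5 * style_embedding[0]  # style shifts pitch
        gain = 0.5 + 0.4 * style_embedding[1]         # style shifts loudness
        audio: Vector = []
        for ch in text:
            freq = (100.0 + ord(ch) % 64) * pitch_scale
            for n in range(self.samples_per_char):
                t = n / self.sample_rate
                audio.append(gain * math.sin(2.0 * math.pi * freq * t))
        return audio


def synthesize(text: str, context_features: Sequence[Sequence[float]],
               context_encoder: ContextEncoder,
               text_predictor: TextPredictionNetwork,
               tts: ToyTTSModel) -> Tuple[Vector, Vector]:
    """Context features -> context embedding -> style embedding -> audio."""
    context_embedding = context_encoder(context_features)
    style_embedding = text_predictor(text, context_embedding)
    return tts(text, style_embedding), style_embedding


rng = random.Random(0)
context_encoder = ContextEncoder(feature_dim=4, embed_dim=3, rng=rng)
text_predictor = TextPredictionNetwork(text_dim=8, context_dim=3, style_dim=2, rng=rng)
tts = ToyTTSModel()

features = [[0.1, 0.9, 0.0, 0.3], [0.4, 0.2, 0.7, 0.1]]
audio, style = synthesize("Hello there", features, context_encoder, text_predictor, tts)
print(len(style), len(audio))
```

In this sketch the style embedding is never supplied by the caller; it is predicted from the text and its context, which mirrors the abstract's point that the predicted embedding determines the prosody and/or style of the output audio.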

Publication/grant documents

EP4345815A2 CONTROLLING EXPRESSIVITY IN END-TO-END SPEECH SYNTHESIS SYSTEMS Publication/grant date: 2024-04-03

IPC classification:

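G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L13/00	Speech synthesis; text-to-speech systems
G10L13/08	.Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation, or stress or intonation determination
G10L13/10	..Prosody rules derived from text; stress or intonation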