Patent search: ap:("Google LLC") AND inv:"KAO, David Teh-hwa" (page 1)

1.

Published invention application
VARIATIONAL EMBEDDING CAPACITY IN EXPRESSIVE END-TO-END SPEECH SYNTHESIS (Status: pending, published)

Publication number: EP3966803A1

Publication date: 2022-03-16

Application number: EP20730949.3

Filing date: 2020-05-20

Applicant: Google LLC

Inventors: BATTENBERG, Eric Dean , STANTON, Daisy , SKERRY-RYAN, Russell John Wyatt , MARIOORYAD, Soroosh , KAO, David Teh-hwa , BAGBY, Thomas Edward , SHANNON, Sean Matthew

IPC classification: G10L13/033 , G06N3/04 , G06N3/08 , G06N7/00 , G10L13/10 , G10L25/30

2.

Published invention application
CONTROLLING EXPRESSIVITY IN END-TO-END SPEECH SYNTHESIS SYSTEMS (Status: pending, under substantive examination)

Publication number: EP4345815A3

Publication date: 2024-06-12

Application number: EP24156282.6

Filing date: 2020-07-16

Applicant: GOOGLE LLC

Inventors: STANTON, Daisy , BATTENBERG, Eric Dean , SKERRY-RYAN, Russell John Wyatt , MARIOORYAD, Soroosh , KAO, David Teh-hwa , BAGBY, Thomas Edward , SHANNON, Sean Matthew

IPC classification: G10L13/10 , G10L13/033 , G10L13/047

CPC classification: G10L13/033 , G10L13/10 , G10L13/047

Abstract: A system (900) includes a context encoder (610), a text-prediction network (520), and a text-to-speech (TTS) model (650). The context encoder is configured to receive one or more context features (602) associated with current input text (502) and process the one or more context features to generate a context embedding (612) associated with the current input text. The text-prediction network is configured to process the current input text and the context embedding to predict, as output, a style embedding (650) for the current input text. The style embedding specifies a specific prosody and/or style for synthesizing the current input text into expressive speech (680). The TTS model is configured to process the current input text and the style embedding to generate an output audio signal (670) of expressive speech of the current input text. The output audio signal has the specific prosody and/or style specified by the style embedding.
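Illustrative note (not part of the patent record): the data flow described in the abstract above can be pictured with a toy Python sketch. The three functions are hypothetical stand-ins named after the abstract's components. Their bodies (mean pooling, word-count scaling, per-character samples) are placeholder assumptions chosen only to make the three-stage wiring runnable; they do not reflect the neural networks the application actually claims.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def context_encoder(context_features: Sequence[Vector]) -> Vector:
    """Hypothetical stand-in for the context encoder (610).

    Assumption: mean-pools the context feature vectors into one embedding.
    """
    n = len(context_features)
    dim = len(context_features[0])
    return [sum(f[i] for f in context_features) / n for i in range(dim)]


def text_prediction_network(text: str, context_embedding: Vector) -> Vector:
    """Hypothetical stand-in for the text-prediction network (520).

    Assumption: scales the context embedding by the word count of the text,
    so the predicted style embedding depends on both inputs.
    """
    scale = len(text.split())
    return [scale * c for c in context_embedding]


def tts_model(text: str, style_embedding: Vector) -> List[float]:
    """Hypothetical stand-in for the TTS model.

    Assumption: emits one fake audio sample per character, with a gain
    derived from the style embedding.
    """
    gain = sum(style_embedding) / len(style_embedding)
    return [gain * (ord(ch) % 7) for ch in text]


def synthesize(
    text: str, context_features: Sequence[Vector]
) -> Tuple[Vector, Vector, List[float]]:
    """Wire the stages in the order the abstract describes."""
    context_embedding = context_encoder(context_features)
    style_embedding = text_prediction_network(text, context_embedding)
    audio = tts_model(text, style_embedding)
    return context_embedding, style_embedding, audio


if __name__ == "__main__":
    ctx, style, audio = synthesize("hi there", [[1.0, 3.0], [3.0, 5.0]])
    print(ctx, style, len(audio))
```

The point of the sketch is only the ordering: the style embedding is predicted from the text plus context rather than supplied by the caller, and the TTS stage consumes that predicted embedding together with the same text.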

3.

Published invention application
CONTROLLING EXPRESSIVITY IN END-TO-END SPEECH SYNTHESIS SYSTEMS (Status: pending, under substantive examination)

Publication number: EP4345815A2

Publication date: 2024-04-03

Application number: EP24156282.6

Filing date: 2020-07-16

Applicant: GOOGLE LLC

Inventors: STANTON, Daisy , BATTENBERG, Eric Dean , SKERRY-RYAN, Russell John Wyatt , MARIOORYAD, Soroosh , KAO, David Teh-hwa , BAGBY, Thomas Edward , SHANNON, Sean Matthew

IPC classification: G10L13/033

Abstract: A system (900) includes a context encoder (610), a text-prediction network (520), and a text-to-speech (TTS) model (650). The context encoder is configured to receive one or more context features (602) associated with current input text (502) and process the one or more context features to generate a context embedding (612) associated with the current input text. The text-prediction network is configured to process the current input text and the context embedding to predict, as output, a style embedding (650) for the current input text. The style embedding specifies a specific prosody and/or style for synthesizing the current input text into expressive speech (680). The TTS model is configured to process the current input text and the style embedding to generate an output audio signal (670) of expressive speech of the current input text. The output audio signal has the specific prosody and/or style specified by the style embedding.
