Publication No.: US20240274122A1
Publication Date: 2024-08-15
Application No.: US18193349
Application Date: 2023-03-30
Applicant: Amazon Technologies, Inc.
Inventors: Duo Wang, Vincent Laurent J. Pollet, Mikolaj Wojciech Babianski, Jakub Bartlomiej Swiatkowski
CPC classification numbers: G10L13/086, G10L15/16, G10L25/63, H04N21/8106
Abstract: An expressive speech translation system may process source speech in a source language and output synthesized speech in a target language while retaining vocal performance characteristics such as intonation, emphasis, rhythm, style, and/or emotion. The system may receive a transcript of the source speech, translate it, and generate transcript data. To generate the synthesized speech, the system may process the transcript data with a language embedding representing language-dependent speech characteristics of the target language, a speaker embedding representing speaker-dependent voice identity characteristics of a speaker, and a performance embedding representing the vocal performance characteristics of the source speech. The system may control the duration of segments of the synthesized speech to better align with corresponding segments of the source speech for the purpose of dubbing multimedia content with synthesized speech in a language different from that of the original audio.
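
The two mechanisms named in the abstract, conditioning on three embeddings and controlling segment duration, can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the abstract does not say how the embeddings are combined or how durations are adjusted. Concatenating the embeddings onto each token encoding and uniformly rescaling predicted durations are assumptions chosen for simplicity, and the names `Conditioning`, `condition_tokens`, and `fit_durations` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Conditioning:
    """The three embeddings named in the abstract (vector sizes are arbitrary here)."""
    language: List[float]     # language-dependent characteristics of the target language
    speaker: List[float]      # speaker-dependent voice identity characteristics
    performance: List[float]  # vocal performance characteristics of the source speech


def condition_tokens(token_encodings: Sequence[Sequence[float]],
                     cond: Conditioning) -> List[List[float]]:
    """Append all three embeddings to every token encoding.

    Concatenation is an assumption; the abstract only says the transcript
    data is processed "with" the embeddings.
    """
    suffix = cond.language + cond.speaker + cond.performance
    return [list(enc) + suffix for enc in token_encodings]


def fit_durations(predicted: Sequence[float], target_total: float) -> List[float]:
    """Rescale per-token durations (seconds) so they sum to target_total.

    Uniform rescaling is an assumed stand-in for the duration control that
    aligns a synthesized segment with its source segment for dubbing.
    """
    total = sum(predicted)
    if total <= 0:
        raise ValueError("predicted durations must sum to a positive value")
    scale = target_total / total
    return [d * scale for d in predicted]
```

For example, a translated segment whose predicted token durations sum to 1.0 s can be stretched to match a 2.0 s source segment with `fit_durations([0.2, 0.3, 0.5], 2.0)`, which preserves the relative timing of the tokens while matching the overall length.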