SPEECH TRANSLATION WITH PERFORMANCE CHARACTERISTICS

    Publication number: US20240274122A1

    Publication date: 2024-08-15

    Application number: US18193349

    Application date: 2023-03-30

    CPC classification number: G10L13/086 G10L15/16 G10L25/63 H04N21/8106

    Abstract: An expressive speech translation system may process source speech in a source language and output synthesized speech in a target language while retaining vocal performance characteristics such as intonation, emphasis, rhythm, style, and/or emotion. The system may receive a transcript of the source speech, translate it, and generate transcript data. To generate the synthesized speech, the system may process the transcript data with a language embedding representing language-dependent speech characteristics of the target language, a speaker embedding representing speaker-dependent voice identity characteristics of a speaker, and a performance embedding representing the vocal performance characteristics of the source speech. The system may control the duration of segments of the synthesized speech to better align with corresponding segments of the source speech for the purpose of dubbing multimedia content with synthesized speech in a language different from that of the original audio.
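
The conditioning and duration-control ideas in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the names `Segment`, `build_conditioning`, and `duration_scale` are hypothetical, concatenation is only one assumed way of combining the three embeddings, and the clamping bounds are illustrative values chosen to keep stretched speech sounding natural.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    """One aligned segment of the dub (hypothetical structure)."""
    text: str                  # translated transcript text for this segment
    source_duration: float     # seconds, measured from the source speech
    predicted_duration: float  # seconds, assumed estimate for the synthesized speech


def build_conditioning(
    language_emb: Sequence[float],
    speaker_emb: Sequence[float],
    performance_emb: Sequence[float],
) -> List[float]:
    """Combine the three embeddings into one conditioning vector.

    Assumption: simple concatenation. The abstract only says the transcript
    data is processed "with" these embeddings, not how they are combined.
    """
    return list(language_emb) + list(speaker_emb) + list(performance_emb)


def duration_scale(
    segment: Segment, min_scale: float = 0.8, max_scale: float = 1.25
) -> float:
    """Factor by which to stretch or compress the synthesized segment so it
    better matches the source segment's duration.

    Assumption: the factor is clamped to [min_scale, max_scale]; both bounds
    are illustrative, not taken from the patent.
    """
    if segment.predicted_duration <= 0:
        raise ValueError("predicted_duration must be positive")
    raw = segment.source_duration / segment.predicted_duration
    return max(min_scale, min(max_scale, raw))


if __name__ == "__main__":
    cond = build_conditioning([0.1], [0.2, 0.3], [0.4])
    seg = Segment(text="hola", source_duration=1.5, predicted_duration=2.0)
    print(len(cond), duration_scale(seg))
```

Clamping the factor reflects the abstract's wording that durations are controlled to "better align" with the source rather than to match it exactly; a real system would presumably adjust duration inside the synthesis model rather than as a post-hoc ratio.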
