-
公开(公告)号:US11545134B1
公开(公告)日:2023-01-03
申请号:US16709792
申请日:2019-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Marcello Federico , Robert Enyedi , Yaser Al-Onaizan , Roberto Barra-Chicote , Andrew Paul Breen , Ritwik Giri , Mehmet Umut Isik , Arvindh Krishnaswamy , Hassan Sawaf
IPC: G10L13/08 , G10L15/22 , G11B20/10 , G06F3/16 , G10L13/10 , G06F40/47 , G10L25/90 , G10L15/06 , G10L13/00 , G10L15/26 , G06V40/16
Abstract: Techniques for the generation of dubbed audio for an audio/video are described. An exemplary approach is to receive a request to generate dubbed speech for an audio/visual file; and in response to the request to: extract speech segments from an audio track of the audio/visual file associated with identified speakers; translate the extracted speech segments into a target language; determine a machine learning model per identified speaker, the trained machine learning models to be used to generate a spoken version of the translated, extracted speech segments based on the identified speaker; generate, per translated, extracted speech segment, a spoken version of the translated, extracted speech segments using a trained machine learning model that corresponds to the identified speaker of the translated, extracted speech segment and prosody information for the extracted speech segments; and replace the extracted speech segments from the audio track of the audio/visual file with the spoken versions spoken version of the translated, extracted speech segments to generate a modified audio track.