-
公开(公告)号:US11545134B1
公开(公告)日:2023-01-03
申请号:US16709792
申请日:2019-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Marcello Federico , Robert Enyedi , Yaser Al-Onaizan , Roberto Barra-Chicote , Andrew Paul Breen , Ritwik Giri , Mehmet Umut Isik , Arvindh Krishnaswamy , Hassan Sawaf
IPC: G10L13/08 , G10L15/22 , G11B20/10 , G06F3/16 , G10L13/10 , G06F40/47 , G10L25/90 , G10L15/06 , G10L13/00 , G10L15/26 , G06V40/16
Abstract: Techniques for the generation of dubbed audio for an audio/video are described. An exemplary approach is to receive a request to generate dubbed speech for an audio/visual file; and in response to the request to: extract speech segments from an audio track of the audio/visual file associated with identified speakers; translate the extracted speech segments into a target language; determine a machine learning model per identified speaker, the trained machine learning models to be used to generate a spoken version of the translated, extracted speech segments based on the identified speaker; generate, per translated, extracted speech segment, a spoken version of the translated, extracted speech segments using a trained machine learning model that corresponds to the identified speaker of the translated, extracted speech segment and prosody information for the extracted speech segments; and replace the extracted speech segments from the audio track of the audio/visual file with the spoken versions spoken version of the translated, extracted speech segments to generate a modified audio track.
-
公开(公告)号:US12272350B2
公开(公告)日:2025-04-08
申请号:US18664461
申请日:2024-05-15
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba , Thomas Renaud Drugman , Viacheslav Klimkov , Srikanth Ronanki , Thomas Edward Merritt , Andrew Paul Breen , Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
公开(公告)号:US11990118B2
公开(公告)日:2024-05-21
申请号:US18206301
申请日:2023-06-06
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba , Thomas Renaud Drugman , Viacheslav Klimkov , Srikanth Ronanki , Thomas Edward Merritt , Andrew Paul Breen , Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
公开(公告)号:US20240296827A1
公开(公告)日:2024-09-05
申请号:US18664461
申请日:2024-05-15
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba , Thomas Renaud Drugman , Viacheslav Klimkov , Srikanth Ronanki , Thomas Edward Merritt , Andrew Paul Breen , Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
公开(公告)号:US11735162B2
公开(公告)日:2023-08-22
申请号:US17882691
申请日:2022-08-08
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba , Thomas Renaud Drugman , Viacheslav Klimkov , Srikanth Ronanki , Thomas Edward Merritt , Andrew Paul Breen , Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
公开(公告)号:US20240013770A1
公开(公告)日:2024-01-11
申请号:US18206301
申请日:2023-06-06
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba , Thomas Renaud Drugman , Viacheslav Klimkov , Srikanth Ronanki , Thomas Edward Merritt , Andrew Paul Breen , Roberto Barra-Chicote
IPC: G10L13/047
CPC classification number: G10L13/047
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
公开(公告)号:US20230058658A1
公开(公告)日:2023-02-23
申请号:US17882691
申请日:2022-08-08
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba , Thomas Renaud Drugman , Viacheslav Klimkov , Srikanth Ronanki , Thomas Edward Merritt , Andrew Paul Breen , Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
公开(公告)号:US11410639B2
公开(公告)日:2022-08-09
申请号:US16922590
申请日:2020-07-07
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba , Thomas Renaud Drugman , Viacheslav Klimkov , Srikanth Ronanki , Thomas Edward Merritt , Andrew Paul Breen , Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
公开(公告)号:US10741169B1
公开(公告)日:2020-08-11
申请号:US16141241
申请日:2018-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba , Thomas Renaud Drugman , Viacheslav Klimkov , Srikanth Ronanki , Thomas Edward Merritt , Andrew Paul Breen , Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
公开(公告)号:US10510358B1
公开(公告)日:2019-12-17
申请号:US15719950
申请日:2017-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Roberto Barra-Chicote , Alexis Moinet , Nikko Strom
IPC: G10L21/038 , G10L19/07 , G10L21/02 , G10L13/08 , G10L25/30 , G10L13/047
Abstract: An approach to speech synthesis uses two phases in which a relatively low quality waveform is computed, and that waveform is passed through an enhancement phase which generates the waveform that is ultimately used to produce the acoustic signal provided to the user. For example, the first phase and the second phase are each implemented using a separate artificial neural network. The two phases may be computationally preferable to using a direct approach to yield a synthesized waveform of comparable quality.
-
-
-
-
-
-
-
-
-