-
Publication No.: US11443733B2
Publication Date: 2022-09-13
Application No.: US16665886
Filing Date: 2019-10-28
Applicant: Amazon Technologies, Inc.
Inventor: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt
IPC: G10L13/10, G10L13/033, G10L13/047
Abstract: A text-to-speech (TTS) system that is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.
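Illustration: the abstract above describes building a feature vector per text portion and letting TTS processing of one portion consider the vectors of other portions. The following is a minimal sketch of that idea only, not the patented method. The function names (`portion_features`, `conditioning_vector`), the three toy features, and the choice to average the other portions' vectors are all assumptions made for illustration.

```python
from statistics import mean


def portion_features(text):
    """Toy feature vector for one text portion (hypothetical features)."""
    words = text.split()
    n_words = len(words)
    avg_word_len = mean(len(w) for w in words) if words else 0.0
    question_ratio = text.count("?") / max(n_words, 1)
    return [float(n_words), avg_word_len, question_ratio]


def conditioning_vector(vectors, index):
    """Concatenate a portion's own vector with the mean of the other portions' vectors.

    Averaging is an assumed way to summarize "other portions"; the source
    does not say how the vectors are combined.
    """
    others = [v for i, v in enumerate(vectors) if i != index]
    if others:
        context = [mean(col) for col in zip(*others)]
    else:
        context = [0.0] * len(vectors[index])
    return vectors[index] + context


portions = ["It was late.", "Who is there?", "Nobody answered him."]
vectors = [portion_features(p) for p in portions]

# Input a TTS component could receive alongside the text of portion 0.
cond = conditioning_vector(vectors, 0)
print(cond)
```

In this sketch the first half of `cond` describes the portion being synthesized and the second half summarizes the rest of the work, which is one simple way to expose cross-portion information to a synthesizer.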
-
Publication No.: US11410639B2
Publication Date: 2022-08-09
Application No.: US16922590
Filing Date: 2020-07-07
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
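Illustration: the abstract above describes encoding phoneme-, syllable-, and word-level features separately into context vectors that a spectrogram estimator then consumes. The following is a toy sketch of that data flow under stated assumptions, not the patented model: the encoders are fixed random linear maps over mean-pooled features rather than trained networks, and `make_encoder`, `estimate_frame`, the dimensions, and the concatenation step are hypothetical.

```python
import random


def make_encoder(in_dim, out_dim, seed):
    """Return a toy encoder: mean-pool the sequence, then apply a fixed random linear map."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(sequence):
        pooled = [sum(col) / len(sequence) for col in zip(*sequence)]
        return [sum(w * x for w, x in zip(row, pooled)) for row in weights]

    return encode


# One separate encoder per linguistic level, each with its own weights.
encode_phonemes = make_encoder(in_dim=4, out_dim=3, seed=0)
encode_syllables = make_encoder(in_dim=2, out_dim=3, seed=1)
encode_words = make_encoder(in_dim=2, out_dim=3, seed=2)


def estimate_frame(context_vectors, n_bins=5, seed=3):
    """Toy stand-in for a spectrogram estimator: join the context vectors, project to one frame."""
    joined = [x for vec in context_vectors for x in vec]
    rng = random.Random(seed)
    proj = [[rng.uniform(-1, 1) for _ in joined] for _ in range(n_bins)]
    return [sum(w * x for w, x in zip(row, joined)) for row in proj]


# Made-up feature sequences: 3 phonemes, 2 syllables, 1 word.
phoneme_feats = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
syllable_feats = [[1.0, 0.0], [0.0, 1.0]]
word_feats = [[0.5, 2.0]]

contexts = [
    encode_phonemes(phoneme_feats),
    encode_syllables(syllable_feats),
    encode_words(word_feats),
]
frame = estimate_frame(contexts)
print(frame)
```

The point of the sketch is the structure: each level keeps its own encoder and its own context vector, and only the estimator sees them together. A real system would produce many frames and would pass the resulting spectrogram data on to condition the speech model.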
-
Publication No.: US10741169B1
Publication Date: 2020-08-11
Application No.: US16141241
Filing Date: 2018-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
-