Publication No.: US12243511B1
Publication Date: 2025-03-04
Application No.: US17709788
Filing Date: 2022-03-31
Applicant: Amazon Technologies, Inc.
Inventors: Arnaud Vincent Pierre Yves Joly, Marco Nicolis, Elena Sergeevna Sokolova, Jedrzej Sobanski, Mateusz Aleksander Lajszczak, Arent van Korlaar, Ruizhe Li
IPC: G10L13/10, G10L13/033, G10L13/04, G10L13/06, G10L15/26
Abstract: A neural text-to-speech system may be configured to emphasize words. Applying emphasis where appropriate enables the TTS system to better reproduce prosodic characteristics of human speech. Emphasis may make the resulting synthesized speech more understandable and engaging than synthesized speech lacking emphasis. Emphasis may be manually annotated to, and/or predicted from, a source text (e.g., a book). In some implementations, the system may use a generative model such as a variational autoencoder to generate word acoustic embeddings indicating how emphasis is to be reflected in the synthesized speech. A phoneme encoder of the TTS system may process phonemes to generate phoneme embeddings. A decoder may process the word acoustic embeddings and the phoneme embeddings to generate spectrogram data representing the synthesized speech.
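The data flow named in the abstract (word acoustic embeddings from a generative model, phoneme embeddings from a phoneme encoder, and a decoder that combines both into spectrogram frames) can be illustrated with a toy sketch. This is not the patented implementation: every function name, dimension, and constant below is hypothetical, the "VAE" is reduced to a single reparameterized sample whose mean is assumed to shift with the emphasis flag, the encoder is a random lookup table, and the decoder is one linear projection that emits exactly one frame per phoneme.

```python
import random

EMB_DIM = 4   # size of each embedding (illustrative assumption)
N_MELS = 8    # mel bins per spectrogram frame (illustrative assumption)

def sample_word_acoustic_embedding(emphasized, rng):
    """Stand-in for sampling from a VAE latent: z = mu + sigma * eps."""
    mu = 1.0 if emphasized else 0.0  # assumption: emphasis shifts the latent mean
    sigma = 0.1
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]

def encode_phonemes(phonemes, table, rng):
    """Toy phoneme encoder: a lookup table of random vectors."""
    out = []
    for p in phonemes:
        if p not in table:
            table[p] = [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]
        out.append(table[p])
    return out

def decode(phoneme_embs, word_embs, word_index, weights):
    """Toy decoder: join each phoneme embedding with the acoustic embedding
    of the word it belongs to, then project linearly to one mel frame."""
    frames = []
    for emb, w in zip(phoneme_embs, word_index):
        x = emb + word_embs[w]  # list concatenation, length 2 * EMB_DIM
        frames.append([sum(wi * xi for wi, xi in zip(row, x)) for row in weights])
    return frames

def synthesize(words, seed=0):
    """Run the three stages and return a list of mel frames."""
    rng = random.Random(seed)
    table = {}
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * EMB_DIM)]
               for _ in range(N_MELS)]
    word_embs = [sample_word_acoustic_embedding(w["emphasized"], rng) for w in words]
    phonemes = [p for w in words for p in w["phonemes"]]
    # Map every phoneme position back to the index of its word.
    word_index = [i for i, w in enumerate(words) for _ in w["phonemes"]]
    phoneme_embs = encode_phonemes(phonemes, table, rng)
    return decode(phoneme_embs, word_embs, word_index, weights)

words = [
    {"text": "really", "phonemes": ["R", "IH", "L", "IY"], "emphasized": True},
    {"text": "good", "phonemes": ["G", "UH", "D"], "emphasized": False},
]
spectrogram = synthesize(words)
print(len(spectrogram), len(spectrogram[0]))  # 7 8
```

The sketch conditions the decoder at the word level while the phoneme encoder works at the phoneme level, so each word's acoustic embedding is repeated for every phoneme of that word. A real system would use trained networks and would typically emit many frames per phoneme.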