Attention-based clockwork hierarchical variational encoder

    Publication No.: US12272349B2

    Publication Date: 2025-04-08

    Application No.: US18487227

    Application Date: 2023-10-16

    Applicant: Google LLC

    Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by decoding a prosodic syllable embedding for the syllable based on attention by an attention mechanism to linguistic features of each phoneme of the syllable and generating a plurality of fixed-length predicted frames based on the predicted duration for the syllable.
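    The per-syllable step in this abstract (attend over the syllable's phoneme features, predict a duration, then emit fixed-length frames) can be illustrated with a small, dependency-free sketch. It is not the patented implementation. The scaled dot-product attention, the linear duration head with softplus, the query vector, and the 5 ms frame length (`FRAME_MS`) are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: a syllable attends over the linguistic feature vectors
# of its phonemes, a linear head maps the attended context to a duration, and
# the duration is divided into fixed-length frames.

FRAME_MS = 5.0  # assumed frame length; the abstract does not specify one


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Sequence[float],
                      phoneme_feats: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product scores of the syllable query against each phoneme."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, feat)) / scale
              for feat in phoneme_feats]
    return softmax(scores)


def attend(query: Sequence[float],
           phoneme_feats: Sequence[Sequence[float]]) -> List[float]:
    """Attention context: weighted sum of the phoneme feature vectors."""
    weights = attention_weights(query, phoneme_feats)
    dim = len(phoneme_feats[0])
    return [sum(w * feat[i] for w, feat in zip(weights, phoneme_feats))
            for i in range(dim)]


def softplus(x: float) -> float:
    """log(1 + e^x) computed without overflow; keeps durations non-negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def predict_duration_ms(context: Sequence[float],
                        head_weights: Sequence[float],
                        head_bias: float) -> float:
    """Hypothetical linear duration head applied to the attention context."""
    x = sum(w * c for w, c in zip(head_weights, context)) + head_bias
    return softplus(x)


def frames_for_duration(duration_ms: float, frame_ms: float = FRAME_MS) -> int:
    """Number of fixed-length frames covering the predicted duration (at least 1)."""
    return max(1, round(duration_ms / frame_ms))


if __name__ == "__main__":
    query = [0.2, -0.1, 0.4]                       # made-up syllable decoder state
    phonemes = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.2]]  # made-up per-phoneme features
    context = attend(query, phonemes)
    duration = predict_duration_ms(context, [40.0, 30.0, 20.0], 60.0)
    print(duration, frames_for_duration(duration))
```

    Dividing the duration by a constant frame length is what makes every output frame the same size: a longer predicted syllable simply receives more frames rather than longer ones.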

    Two-level speech prosody transfer

    Publication No.: US11514888B2

    Publication Date: 2022-11-29

    Application No.: US16992410

    Application Date: 2020-08-13

    Applicant: Google LLC

    Abstract: A method includes receiving an input text utterance to be synthesized into expressive speech having an intended prosody and a target voice and generating, using a first text-to-speech (TTS) model, an intermediate synthesized speech representation for the input text utterance. The intermediate synthesized speech representation possesses the intended prosody. The method also includes providing the intermediate synthesized speech representation to a second TTS model that includes an encoder portion and a decoder portion. The encoder portion is configured to encode the intermediate synthesized speech representation into an utterance embedding that specifies the intended prosody. The decoder portion is configured to process the input text utterance and the utterance embedding to generate an output audio signal of expressive speech that has the intended prosody specified by the utterance embedding and speaker characteristics of the target voice.
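    The data flow between the two levels in this abstract can be traced with a toy sketch. It only models the hand-off: a first-level model produces a variable-length intermediate representation, and the second model's encoder turns it into a fixed-length utterance embedding that is passed to the decoder together with the text and the target voice. The mean and standard-deviation pooling encoder, the toy first-level model, and the names `encode_utterance`, `DecoderInput`, `two_level_transfer`, and `target_voice` are hypothetical stand-ins, not the neural networks described in the patent.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one frame of the intermediate speech representation


def encode_utterance(frames: Sequence[Frame]) -> List[float]:
    """Hypothetical stand-in for the second model's encoder portion.

    Mean and standard-deviation pooling over time maps any number of frames
    to a fixed-length utterance embedding (2 * feature_dim values).
    """
    dims = len(frames[0])
    means = [fmean([f[i] for f in frames]) for i in range(dims)]
    stds = [pstdev([f[i] for f in frames]) for i in range(dims)]
    return means + stds


@dataclass
class DecoderInput:
    """What the second model's decoder portion conditions on."""
    text: str
    utterance_embedding: List[float]  # carries the intended prosody
    target_voice: str                 # hypothetical speaker identifier


def two_level_transfer(text: str,
                       first_tts: Callable[[str], Sequence[Frame]],
                       target_voice: str) -> DecoderInput:
    # Level 1: the first TTS model renders the text with the intended prosody.
    intermediate = first_tts(text)
    # Level 2, encoder: compress that rendering into an utterance embedding.
    embedding = encode_utterance(intermediate)
    # Level 2, decoder: would consume this bundle to produce the final audio.
    return DecoderInput(text, embedding, target_voice)


def toy_first_tts(text: str) -> List[List[float]]:
    """Toy first-level model: one 2-D "frame" per word (word length, 1.0)."""
    return [[float(len(word)), 1.0] for word in text.split()]


if __name__ == "__main__":
    bundle = two_level_transfer("prosody transfer demo", toy_first_tts, "voice_b")
    print(len(bundle.utterance_embedding))  # 4
```

    The pooling step is the point of the sketch: the utterance embedding has the same size whether the intermediate rendering is short or long, so the decoder can be conditioned on it independently of the utterance length.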
