SYNTHESIZING SPEECH IN MULTIPLE LANGUAGES IN CONVERSATIONAL AI SYSTEMS AND APPLICATIONS

    Publication No.: US20250118286A1

    Publication Date: 2025-04-10

    Application No.: US18483342

    Filing Date: 2023-10-09

    Abstract: In various examples, synthesizing speech in multiple languages in conversational AI systems and applications is described herein. Systems and methods are disclosed that use one or more models to synthesize speech from a first language spoken by a speaker to a second, target language selected by the speaker. In some examples, to perform the translation, the model(s) may disentangle one or more attributes associated with speech from speakers, such as speakers' identities, speakers' accents, and text associated with the speech. Additionally, the model(s) may allow for fine-grained control of additional attributes associated with output speech, such as one or more frequencies, one or more energies, and one or more phoneme durations. Furthermore, the model(s) may be configured to use the accent associated with the target language when generating text, such as when aligning text encodings with one or more phonemes.
