VOICE ADAPTATION USING SYNTHETIC SPEECH PROCESSING

    Publication number: US20230260502A1

    Publication date: 2023-08-17

    Application number: US17671006

    Application date: 2022-02-14

    CPC classification number: G10L13/047 G10L13/08 G10L13/027 G06N3/0454 G10L19/16

    Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.

    Voice adaptation using synthetic speech processing

    Publication number: US11915683B2

    Publication date: 2024-02-27

    Application number: US17671006

    Application date: 2022-02-14

    CPC classification number: G10L13/047 G06N3/045 G10L13/027 G10L13/08 G10L19/16

    Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.
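
The data flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: `Utterance`, `ParallelPair`, `build_parallel_dataset`, and `fake_tts` are hypothetical names, audio is represented as a plain list of floats, and `fake_tts` is a toy stand-in for the TTS component trained on the large single-speaker dataset.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Audio = List[float]  # assumption: placeholder for waveform samples or spectrogram frames


@dataclass
class Utterance:
    speaker_id: str
    transcript: str
    audio: Audio  # recorded speech for this transcript


@dataclass
class ParallelPair:
    predicted: Audio  # synthetic speech generated from the transcript (model input)
    target: Audio     # recorded speech for the same transcript (training target)
    speaker_id: str


def build_parallel_dataset(
    utterances: Sequence[Utterance],
    tts: Callable[[str], Audio],
) -> List[ParallelPair]:
    """Pair each recording with TTS output generated from its transcript."""
    return [
        ParallelPair(predicted=tts(u.transcript), target=u.audio, speaker_id=u.speaker_id)
        for u in utterances
    ]


def fake_tts(text: str) -> Audio:
    # Hypothetical stand-in for the single-speaker TTS component:
    # emits one deterministic value per character of the transcript.
    return [float(ord(ch) % 7) for ch in text]


# Pre-training data: built from a multi-speaker dataset.
multi_speaker = [
    Utterance("spk_a", "hello", [0.1, 0.2, 0.3]),
    Utterance("spk_b", "world", [0.4, 0.5]),
]
pretrain_pairs = build_parallel_dataset(multi_speaker, fake_tts)

# Fine-tuning data: built the same way from a small set of target-voice examples.
target_voice = [Utterance("target", "hi", [0.9, 0.8])]
finetune_pairs = build_parallel_dataset(target_voice, fake_tts)
```

In this sketch the same pairing function serves both stages; only the source dataset changes. A model trained on such pairs would learn to map the TTS component's output toward the recorded speech, which matches the abstract's description of modifying synthetic speech to approximate the target voice.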
