Patent search ap:("Amazon Technologies, Inc.") AND inv:"Adam Marek Gabrys"

1.

Published patent application
VOICE ADAPTATION USING SYNTHETIC SPEECH PROCESSING (Status: pending, published)

Publication number: US20230260502A1

Publication date: 2023-08-17

Application number: US17671006

Application date: 2022-02-14

Applicant: Amazon Technologies, Inc.

Inventors: Adam Marek Gabrys, Jaime Lorenzo Trueba, Goeric Sydney Huybrechts

IPC: G10L13/047, G10L13/08, G10L13/027, G06N3/04, G10L19/16

CPC classification number: G10L13/047, G10L13/08, G10L13/027, G06N3/0454, G10L19/16

Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.
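A minimal sketch of the two-stage recipe the abstract describes, under heavy simplifying assumptions. Audio is reduced to a list of per-frame floats. `toy_tts` is a hypothetical stand-in for the single-speaker TTS component. `LinearVoiceMapper` is a toy linear model standing in for the machine learning model, whose architecture the abstract does not specify. The synthetic parallel dataset pairs the TTS prediction for each transcript with the recorded speech. The model is pre-trained on those pairs and then fine-tuned, starting from the pre-trained weights, on a small target-voice set.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: "audio" is a list of per-frame float features, not a waveform.
Audio = List[float]
Utterance = Tuple[Audio, str]  # (recorded audio, transcript)
Pair = Tuple[Audio, Audio]  # (model input: TTS prediction, target: recorded speech)


def build_synthetic_parallel_dataset(
    corpus: Sequence[Utterance], tts: Callable[[str], Audio]
) -> List[Pair]:
    """Pair the TTS prediction for each transcript with the recorded speech."""
    return [(tts(text), audio) for audio, text in corpus]


@dataclass
class LinearVoiceMapper:
    """Hypothetical toy model: maps each predicted frame x to a * x + b."""

    a: float = 1.0
    b: float = 0.0

    def __call__(self, frames: Audio) -> Audio:
        return [self.a * x + self.b for x in frames]

    def train(self, pairs: Sequence[Pair], lr: float, steps: int) -> None:
        """Full-batch gradient descent on mean squared error, starting from current weights."""
        frames = [(x, y) for src, tgt in pairs for x, y in zip(src, tgt)]
        n = len(frames)
        for _ in range(steps):
            # Both gradients use the weights from before this step's update.
            grad_a = sum(2 * (self.a * x + self.b - y) * x for x, y in frames) / n
            grad_b = sum(2 * (self.a * x + self.b - y) for x, y in frames) / n
            self.a -= lr * grad_a
            self.b -= lr * grad_b


def mse(model: Callable[[Audio], Audio], pairs: Sequence[Pair]) -> float:
    errs = [(p - y) ** 2 for src, tgt in pairs for p, y in zip(model(src), tgt)]
    return sum(errs) / len(errs)


def toy_tts(text: str) -> Audio:
    """Hypothetical TTS stub: one normalized 'frame' per word, derived from word length."""
    return [(len(word) - 4) / 4 for word in text.split()]


# Toy "multi-speaker" recordings: a fixed transform of the TTS frames (illustration only).
multi_speaker = [
    ([2 * x + 1 for x in toy_tts(t)], t)
    for t in ("the quick brown fox", "a voice is heard", "synthetic speech data")
]
# Toy "target voice" recordings: a single utterance with a different offset.
target_voice = [([2 * x + 3 for x in toy_tts(t)], t) for t in ("hello target voice",)]

model = LinearVoiceMapper()
model.train(build_synthetic_parallel_dataset(multi_speaker, toy_tts), lr=0.3, steps=500)
pretrained = (model.a, model.b)  # learned from the synthetic parallel dataset

target_pairs = build_synthetic_parallel_dataset(target_voice, toy_tts)
loss_before = mse(model, target_pairs)
model.train(target_pairs, lr=0.3, steps=200)  # fine-tune from the pre-trained weights
loss_after = mse(model, target_pairs)
```

Fine-tuning reuses the pre-trained parameters instead of starting from scratch. This reflects the abstract's premise: the large synthetic parallel set presumably teaches the general mapping from TTS output to natural speech, so the limited target-voice data only has to adjust it.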

2.

Granted patent
Augmenting datasets for training audio generation models (Status: in force)

Publication number: US12254864B1

Publication date: 2025-03-18

Application number: US17854439

Application date: 2022-06-30

Applicant: Amazon Technologies, Inc.

Inventors: Mateusz Aleksander Lajszczak, Adam Marek Gabrys, Arent van Korlaar, Ruizhe Li, Elena Sergeevna Sokolova, Jaime Lorenzo Trueba, Arnaud Vincent Pierre Yves Joly, Marco Nicolis, Ekaterina Petrova

IPC: G10L13/047, G10L15/02, G10L15/16, G10L15/18, G10L15/22, G10L25/18

Abstract: A target voice dataset may be augmented using speech prediction. Encoder and decoder models may be trained to encode audio data into encoded speech data, and convert it back to audio. The encoded units may include semantic information (e.g., phonemes and/or words) as well as feature data indicating prosody, timbre, speaker identity, speech style, emotion, etc. of speech. An acoustic/semantic language model (ASLM) may be configured to predict encoded speech data in a manner analogous to a language model predicting words; for example, based on preceding encoded speech data. The models may be used to generate synthesized speech samples having voice characteristics (e.g., feature data) similar to those of the target voice dataset. The augmented dataset may be used to train a text-to-speech (TTS) model to reproduce the target voice characteristics, and may improve performance of the TTS model over training with only the original target voice dataset.
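The abstract compares the ASLM to a language model that predicts the next word from the preceding ones. Below is a deliberately tiny sketch of that idea. It assumes the encoder emits discrete integer unit IDs; the abstract says the encoded units also carry prosody, timbre and other feature data, which this toy ignores. `BigramUnitLM` is a hypothetical count-based model conditioned only on the previous unit, whereas a real ASLM would presumably use a longer context and a neural network. Decoding the generated units back to audio is not shown.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Sequence

# Assumption: an encoder (not shown) has already mapped each target-voice
# utterance to a sequence of discrete speech-unit IDs.
Units = List[int]


class BigramUnitLM:
    """Hypothetical toy ASLM: predicts the next speech unit from the preceding one."""

    def __init__(self) -> None:
        self.counts: DefaultDict[int, Counter] = defaultdict(Counter)

    def fit(self, sequences: Sequence[Units]) -> "BigramUnitLM":
        # Count how often each unit follows each other unit.
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1
        return self

    def predict_next(self, prev: int) -> int:
        # Greedy choice: most frequent successor, ties broken by smallest unit ID.
        successors = self.counts.get(prev)
        if not successors:
            raise KeyError(f"unit {prev} was never followed by another unit")
        best = max(successors.values())
        return min(unit for unit, count in successors.items() if count == best)

    def continue_sequence(self, prompt: Units, length: int) -> Units:
        # Autoregressive generation: each prediction becomes the next context.
        out = list(prompt)
        for _ in range(length):
            out.append(self.predict_next(out[-1]))
        return out


# Toy encoded target-voice dataset (made-up unit IDs for illustration).
target_voice_units = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 3, 4, 1]]
lm = BigramUnitLM().fit(target_voice_units)
generated = lm.continue_sequence([1], length=5)  # [1, 2, 3, 4, 1, 2]
```

In the augmentation scheme the abstract outlines, sequences like `generated` would be decoded to audio and added to the target-voice dataset before TTS training. That decoder step lies outside this sketch.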

3.

Granted patent
Voice adaptation using synthetic speech processing (Status: in force)

Publication number: US11915683B2

Publication date: 2024-02-27

Application number: US17671006

Application date: 2022-02-14

Applicant: Amazon Technologies, Inc.

Inventors: Adam Marek Gabrys, Jaime Lorenzo Trueba, Goeric Sydney Huybrechts

IPC: G10L19/16, G10L13/08, G10L13/047, G06N3/045, G10L13/027

CPC classification number: G10L13/047, G06N3/045, G10L13/027, G10L13/08, G10L19/16

Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.
