-
Publication No.: US10937413B2
Publication Date: 2021-03-02
Application No.: US16139891
Filing Date: 2018-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Jonathan B. Feinstein, Alok Verma, Amina Shabbeer, Brandon Scott Durham, Catherine Breslin, Edward Bueche, Fabian Moerchen, Fabian Triefenbach, Klaus Reiter, Toby R. Latin-Stoermer, Panagiota Karanasou, Judith Gaspers
Abstract: Techniques are provided for training a target language model based at least in part on data associated with a reference language model. For example, language data utilized to train an English language model may be translated and provided as training data to train a German language model to recognize utterances provided in German. By utilizing the techniques herein, the efficiency of training a new language model may be improved due at least in part to replacing labor-intensive operations conventionally performed by specialized personnel with machine-generated data. Additionally, techniques discussed herein provide for reducing the time required for training a new language model by leveraging information associated with utterances of one language to train the new language model associated with a different language.
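The bootstrapping idea in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `translate` stub and the intent labels are hypothetical placeholders for a real machine-translation step and a real annotated corpus.

```python
# Sketch: reuse a reference-language (English) corpus to build training data
# for a target-language (German) model. Intent labels carry over unchanged;
# only the surface text is machine-translated.

def translate(utterance, target_lang):
    # Hypothetical stand-in for a machine-translation model.
    lexicon = {"turn on the lights": "mach das Licht an",
               "play some music": "spiel etwas Musik"}
    return lexicon.get(utterance, utterance)

def build_target_training_data(reference_examples, target_lang="de"):
    """Machine-generate target-language training pairs from labeled
    reference-language examples, avoiding manual re-annotation."""
    return [(translate(text, target_lang), intent)
            for text, intent in reference_examples]

english_corpus = [("turn on the lights", "SmartHome.On"),
                  ("play some music", "Music.Play")]
german_corpus = build_target_training_data(english_corpus)
```

The point of the sketch is that the expensive artifact, the intent annotation, is produced once in the reference language and then reused.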
-
Publication No.: US20200098352A1
Publication Date: 2020-03-26
Application No.: US16139891
Filing Date: 2018-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Jonathan B. Feinstein, Alok Verma, Amina Shabbeer, Brandon Scott Durham, Catherine Breslin, Edward Bueche, Fabian Moerchen, Fabian Triefenbach, Klaus Reiter, Toby R. Latin-Stoermer, Panagiota Karanasou, Judith Gaspers
Abstract: Techniques are provided for training a target language model based at least in part on data associated with a reference language model. For example, language data utilized to train an English language model may be translated and provided as training data to train a German language model to recognize utterances provided in German. By utilizing the techniques herein, the efficiency of training a new language model may be improved due at least in part to replacing labor-intensive operations conventionally performed by specialized personnel with machine-generated data. Additionally, techniques discussed herein provide for reducing the time required for training a new language model by leveraging information associated with utterances of one language to train the new language model associated with a different language.
-
Publication No.: US11978431B1
Publication Date: 2024-05-07
Application No.: US17326886
Filing Date: 2021-05-21
Applicant: Amazon Technologies, Inc.
Inventor: Arnaud Joly, Simon Slangen, Alexis Pierre Moinet, Thomas Renaud Drugman, Panagiota Karanasou, Syed Ammar Abbas, Sri Vishnu Kumar Karlapati
IPC: G10L13/027, G10L13/06, G10L13/07, G10L13/08, G10L15/32
CPC classification number: G10L13/027, G10L13/06, G10L13/07, G10L13/08, G10L15/32
Abstract: A speech-processing system receives input data representing text. One or more encoders trained to predict audio properties corresponding to the text process the text to predict those properties. A speech decoder processes phoneme embeddings as well as the predicted properties to create data representing synthesized speech.
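The data flow described here, property encoders feeding a speech decoder alongside phoneme embeddings, can be sketched with toy linear layers. This is an assumption-laden illustration: the property dimension, weights, and shapes are invented, and real encoders/decoders would be learned neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def property_encoder(phoneme_embeddings, w_enc):
    # Predicts audio properties (e.g. a pitch-like value) per phoneme.
    return phoneme_embeddings @ w_enc

def speech_decoder(phoneme_embeddings, properties, w_dec):
    # Conditions synthesis on both the phoneme embeddings and the
    # predicted properties by concatenating them feature-wise.
    features = np.concatenate([phoneme_embeddings, properties], axis=-1)
    return features @ w_dec  # stand-in for synthesized speech features

T, d_emb, d_prop, d_out = 5, 8, 2, 80
emb = rng.normal(size=(T, d_emb))        # phoneme embeddings for 5 phonemes
w_enc = rng.normal(size=(d_emb, d_prop))
w_dec = rng.normal(size=(d_emb + d_prop, d_out))

props = property_encoder(emb, w_enc)
speech = speech_decoder(emb, props, w_dec)
```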
-
Publication No.: US11830476B1
Publication Date: 2023-11-28
Application No.: US17342206
Filing Date: 2021-06-08
Applicant: Amazon Technologies, Inc.
Inventor: Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Alexis Pierre Moinet, Arnaud Vincent Pierre Yves Joly, Syed Ammar Abbas, Thomas Renaud Drugman, Jaime Lorenzo Trueba
CPC classification number: G10L13/10, G06N3/08, G10L13/07, G10L13/086, G10L25/30
Abstract: Devices and techniques are generally described for learned condition text-to-speech synthesis. In some examples, first data representing a selection of a type of prosodic expressivity may be received. In some further examples, a selection of content comprising text data may be received. First audio data may be determined that includes an audio representation of the text data. The first audio data may be generated based at least in part on sampling from a first latent distribution generated using a conditional primary variational autoencoder (VAE). The sampling from the first latent distribution may be conditioned on a first learned distribution associated with the type of prosodic expressivity. In various examples, the first audio data may be sent to a first computing device.
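The conditioning step, sampling a latent vector from a distribution tied to a chosen prosodic style, can be sketched with the usual VAE reparameterization trick. The style names, latent size, and decoder here are hypothetical; in the patented system the per-style distributions are learned, not hand-set.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical learned per-style latent distributions (mean, log-variance).
style_priors = {
    "neutral": (np.zeros(4), np.zeros(4)),           # standard normal
    "excited": (np.full(4, 0.5), np.full(4, -1.0)),  # shifted, tighter
}

def sample_latent(style):
    # Reparameterized sample z = mu + sigma * eps, conditioned on the style.
    mu, logvar = style_priors[style]
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, w_dec):
    # Stand-in for the audio decoder of the primary VAE.
    return z @ w_dec

w_dec = rng.normal(size=(4, 16))
frame = decode(sample_latent("excited"), w_dec)
```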
-
Publication No.: US10854189B2
Publication Date: 2020-12-01
Application No.: US16139984
Filing Date: 2018-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Jonathan B. Feinstein, Alok Verma, Amina Shabbeer, Brandon Scott Durham, Catherine Breslin, Edward Bueche, Fabian Moerchen, Fabian Triefenbach, Klaus Reiter, Toby R. Latin-Stoermer, Panagiota Karanasou, Judith Gaspers
Abstract: Techniques are provided for training a language recognition model. For example, a language recognition model may be maintained and associated with a reference language (e.g., English). The language recognition model may be configured to accept as input an utterance in the reference language and to identify a feature to be executed in response to receiving the utterance. New language data (e.g., other utterances) provided in a different language (e.g., German) may be obtained. This new language data may be translated to English and utilized to retrain the model to recognize reference language data as well as language data translated to the reference language. Subsequent utterances (e.g., English utterances, or German utterances translated to English) may be provided to the updated model and a feature may be identified. One or more instructions may be sent to a user device to execute a set of instructions associated with the feature.
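The runtime path in this abstract, translate into the reference language, then classify with the single retrained model, can be sketched as a small pipeline. The toy `translate_to_english` lexicon and keyword classifier are hypothetical stand-ins for the machine-translation component and the retrained recognition model.

```python
# Sketch: route utterances in any supported language through one
# reference-language (English) recognition model.

def translate_to_english(utterance, source_lang):
    if source_lang == "en":
        return utterance
    lexicon = {"spiel musik": "play music"}  # hypothetical MT stand-in
    return lexicon.get(utterance, utterance)

def classify_feature(english_utterance):
    # Stand-in for the retrained language recognition model, which maps
    # an utterance to the feature that should be executed.
    if "music" in english_utterance:
        return "Music.Play"
    return "Unknown"

def handle_utterance(utterance, lang):
    feature = classify_feature(translate_to_english(utterance, lang))
    # A real system would then send the feature's instructions to the device.
    return feature
```

Both English and translated German utterances reach the same model, which is what lets one set of feature mappings serve both languages.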
-
Publication No.: US20200098351A1
Publication Date: 2020-03-26
Application No.: US16139984
Filing Date: 2018-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Jonathan B. Feinstein, Alok Verma, Amina Shabbeer, Brandon Scott Durham, Catherine Breslin, Edward Bueche, Fabian Moerchen, Fabian Triefenbach, Klaus Reiter, Toby R. Latin-Stoermer, Panagiota Karanasou, Judith Gaspers
Abstract: Techniques are provided for training a language recognition model. For example, a language recognition model may be maintained and associated with a reference language (e.g., English). The language recognition model may be configured to accept as input an utterance in the reference language and to identify a feature to be executed in response to receiving the utterance. New language data (e.g., other utterances) provided in a different language (e.g., German) may be obtained. This new language data may be translated to English and utilized to retrain the model to recognize reference language data as well as language data translated to the reference language. Subsequent utterances (e.g., English utterances, or German utterances translated to English) may be provided to the updated model and a feature may be identified. One or more instructions may be sent to a user device to execute a set of instructions associated with the feature.
-
Publication No.: US11694674B1
Publication Date: 2023-07-04
Application No.: US17331427
Filing Date: 2021-05-26
Applicant: Amazon Technologies, Inc.
Inventor: Syed Ammar Abbas, Bajibabu Bollepalli, Alexis Pierre Moinet, Thomas Renaud Drugman, Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Simon Slangen, Petr Makarov
Abstract: Techniques for performing text-to-speech are described. An exemplary method includes receiving a request to generate audio from input text; generating audio from the input text by: generating a first number of vectors from phoneme embeddings representing the input text, predicting one or more spectrograms having the first number of frames using multiple scales wherein a coarser scale influences a finer scale, concatenating the first number of vectors and the predicted one or more spectrograms, generating at least one mel spectrogram from the concatenated vectors and the predicted one or more spectrograms, and converting, with a vocoder, the at least one mel spectrogram to audio; and outputting the generated audio according to the request.
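The multi-scale step, where a coarse spectrogram prediction conditions a finer one before being concatenated with the phoneme-derived vectors, can be sketched numerically. Block size, dimensions, and the random "predictions" here are invented for illustration; a real system would use learned predictors at each scale.

```python
import numpy as np

rng = np.random.default_rng(3)

def predict_multiscale(n_frames, n_mels=80, block=4):
    # Coarse scale: one prediction per block of frames.
    coarse = rng.normal(size=(n_frames // block, n_mels))
    # Finer scale is conditioned on the upsampled coarse prediction,
    # so the coarse scale influences the fine one.
    upsampled = np.repeat(coarse, block, axis=0)
    fine = upsampled + 0.1 * rng.normal(size=(n_frames, n_mels))
    return fine

T = 8
phoneme_vectors = rng.normal(size=(T, 16))   # first number of vectors
spectrogram = predict_multiscale(T)          # same first number of frames
# Concatenate the vectors with the multi-scale prediction, per the abstract;
# a mel spectrogram would then be generated from this and sent to a vocoder.
features = np.concatenate([phoneme_vectors, spectrogram], axis=-1)
```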
-
Publication No.: US11574624B1
Publication Date: 2023-02-07
Application No.: US17218466
Filing Date: 2021-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Alexis Pierre Jean-Baptiste Moinet, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati, Syed Ammar Abbas, Simon Slangen
Abstract: A speech-processing system receives input data representing text. An input encoder processes the input data to determine first embedding data representing the text. A local attention encoder processes a subset of the first embedding data in accordance with a predicted size to determine second embedding data. An attention encoder processes the second embedding data to determine third embedding data. A decoder processes the third embedding data to determine audio data corresponding to the text.
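The local-attention stage, attending only within subsets of the embedding sequence whose size is predicted, can be sketched as windowed self-attention. The fixed window size, dimensions, and single-head formulation are illustrative assumptions; in the described system the subset size is predicted rather than fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(x, window):
    # Each position attends only to positions inside its own window,
    # a stand-in for processing "a subset of the first embedding data".
    out = np.empty_like(x)
    for start in range(0, len(x), window):
        block = x[start:start + window]
        scores = softmax(block @ block.T / np.sqrt(block.shape[-1]))
        out[start:start + window] = scores @ block
    return out

x = rng.normal(size=(6, 4))      # first embedding data from the input encoder
y = local_self_attention(x, 3)   # second embedding data (window = 3, assumed)
```

A full attention encoder and decoder would then process `y` into audio, as the abstract describes.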
-