-
Publication No.: US10937413B2
Publication Date: 2021-03-02
Application No.: US16139891
Filing Date: 2018-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Jonathan B. Feinstein, Alok Verma, Amina Shabbeer, Brandon Scott Durham, Catherine Breslin, Edward Bueche, Fabian Moerchen, Fabian Triefenbach, Klaus Reiter, Toby R. Latin-Stoermer, Panagiota Karanasou, Judith Gaspers
Abstract: Techniques are provided for training a target language model based at least in part on data associated with a reference language model. For example, language data utilized to train an English language model may be translated and provided as training data to train a German language model to recognize utterances provided in German. By utilizing the techniques herein, the efficiency of training a new language model may be improved due at least in part to replacing labor-intensive operations conventionally performed by specialized personnel with machine-generated data. Additionally, techniques discussed herein provide for reducing the time required for training a new language model by leveraging information associated with utterances of one language to train the new language model associated with a different language.
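The bootstrapping idea in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `translate` stub and the intent labels are hypothetical placeholders for a real machine-translation step and a real annotated corpus.

```python
# Sketch: reuse a reference-language (English) corpus to build training data
# for a target-language (German) model. Intent labels carry over unchanged;
# only the surface text is machine-translated.

def translate(utterance, target_lang):
    # Hypothetical stand-in for a machine-translation model.
    lexicon = {"turn on the lights": "mach das Licht an",
               "play some music": "spiel etwas Musik"}
    return lexicon.get(utterance, utterance)

def build_target_training_data(reference_examples, target_lang="de"):
    """Machine-generate target-language training pairs from labeled
    reference-language examples, avoiding manual re-annotation."""
    return [(translate(text, target_lang), intent)
            for text, intent in reference_examples]

english_corpus = [("turn on the lights", "SmartHome.On"),
                  ("play some music", "Music.Play")]
german_corpus = build_target_training_data(english_corpus)
```

The point of the sketch is that the expensive artifact, the intent annotation, is produced once in the reference language and then reused.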
-
Publication No.: US20200098352A1
Publication Date: 2020-03-26
Application No.: US16139891
Filing Date: 2018-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Jonathan B. Feinstein, Alok Verma, Amina Shabbeer, Brandon Scott Durham, Catherine Breslin, Edward Bueche, Fabian Moerchen, Fabian Triefenbach, Klaus Reiter, Toby R. Latin-Stoermer, Panagiota Karanasou, Judith Gaspers
Abstract: Techniques are provided for training a target language model based at least in part on data associated with a reference language model. For example, language data utilized to train an English language model may be translated and provided as training data to train a German language model to recognize utterances provided in German. By utilizing the techniques herein, the efficiency of training a new language model may be improved due at least in part to replacing labor-intensive operations conventionally performed by specialized personnel with machine-generated data. Additionally, techniques discussed herein provide for reducing the time required for training a new language model by leveraging information associated with utterances of one language to train the new language model associated with a different language.
-
Publication No.: US11978431B1
Publication Date: 2024-05-07
Application No.: US17326886
Filing Date: 2021-05-21
Applicant: Amazon Technologies, Inc.
Inventor: Arnaud Joly, Simon Slangen, Alexis Pierre Moinet, Thomas Renaud Drugman, Panagiota Karanasou, Syed Ammar Abbas, Sri Vishnu Kumar Karlapati
IPC: G10L13/027, G10L13/06, G10L13/07, G10L13/08, G10L15/32
CPC classification number: G10L13/027, G10L13/06, G10L13/07, G10L13/08, G10L15/32
Abstract: A speech-processing system receives input data representing text. One or more encoders trained to predict audio properties corresponding to the text process the text to predict those properties. A speech decoder processes phoneme embeddings as well as the predicted properties to create data representing synthesized speech.
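The data flow described here, property encoders feeding a speech decoder alongside phoneme embeddings, can be sketched with toy linear layers. This is an assumption-laden illustration: the property dimension, weights, and shapes are invented, and real encoders/decoders would be learned neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def property_encoder(phoneme_embeddings, w_enc):
    # Predicts audio properties (e.g. a pitch-like value) per phoneme.
    return phoneme_embeddings @ w_enc

def speech_decoder(phoneme_embeddings, properties, w_dec):
    # Conditions synthesis on both the phoneme embeddings and the
    # predicted properties by concatenating them feature-wise.
    features = np.concatenate([phoneme_embeddings, properties], axis=-1)
    return features @ w_dec  # stand-in for synthesized speech features

T, d_emb, d_prop, d_out = 5, 8, 2, 80
emb = rng.normal(size=(T, d_emb))        # phoneme embeddings for 5 phonemes
w_enc = rng.normal(size=(d_emb, d_prop))
w_dec = rng.normal(size=(d_emb + d_prop, d_out))

props = property_encoder(emb, w_enc)
speech = speech_decoder(emb, props, w_dec)
```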
-
Publication No.: US11830476B1
Publication Date: 2023-11-28
Application No.: US17342206
Filing Date: 2021-06-08
Applicant: Amazon Technologies, Inc.
Inventor: Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Alexis Pierre Moinet, Arnaud Vincent Pierre Yves Joly, Syed Ammar Abbas, Thomas Renaud Drugman, Jaime Lorenzo Trueba
CPC classification number: G10L13/10, G06N3/08, G10L13/07, G10L13/086, G10L25/30
Abstract: Devices and techniques are generally described for learned condition text-to-speech synthesis. In some examples, first data representing a selection of a type of prosodic expressivity may be received. In some further examples, a selection of content comprising text data may be received. First audio data may be determined that includes an audio representation of the text data. The first audio data may be generated based at least in part on sampling from a first latent distribution generated using a conditional primary variational autoencoder (VAE). The sampling from the first latent distribution may be conditioned on a first learned distribution associated with the type of prosodic expressivity. In various examples, the first audio data may be sent to a first computing device.
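The conditioning step, sampling a latent vector from a distribution tied to a chosen prosodic style, can be sketched with the usual VAE reparameterization trick. The style names, latent size, and decoder here are hypothetical; in the patented system the per-style distributions are learned, not hand-set.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical learned per-style latent distributions (mean, log-variance).
style_priors = {
    "neutral": (np.zeros(4), np.zeros(4)),           # standard normal
    "excited": (np.full(4, 0.5), np.full(4, -1.0)),  # shifted, tighter
}

def sample_latent(style):
    # Reparameterized sample z = mu + sigma * eps, conditioned on the style.
    mu, logvar = style_priors[style]
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, w_dec):
    # Stand-in for the audio decoder of the primary VAE.
    return z @ w_dec

w_dec = rng.normal(size=(4, 16))
frame = decode(sample_latent("excited"), w_dec)
```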
-
Publication No.: US10854189B2
Publication Date: 2020-12-01
Application No.: US16139984
Filing Date: 2018-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Jonathan B. Feinstein, Alok Verma, Amina Shabbeer, Brandon Scott Durham, Catherine Breslin, Edward Bueche, Fabian Moerchen, Fabian Triefenbach, Klaus Reiter, Toby R. Latin-Stoermer, Panagiota Karanasou, Judith Gaspers
Abstract: Techniques are provided for training a language recognition model. For example, a language recognition model may be maintained and associated with a reference language (e.g., English). The language recognition model may be configured to accept as input an utterance in the reference language and to identify a feature to be executed in response to receiving the utterance. New language data (e.g., other utterances) provided in a different language (e.g., German) may be obtained. This new language data may be translated to English and utilized to retrain the model to recognize reference language data as well as language data translated to the reference language. Subsequent utterances (e.g., English utterances, or German utterances translated to English) may be provided to the updated model and a feature may be identified. One or more instructions may be sent to a user device to execute a set of instructions associated with the feature.
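The runtime path in this abstract, translate into the reference language, then classify with the single retrained model, can be sketched as a small pipeline. The toy `translate_to_english` lexicon and keyword classifier are hypothetical stand-ins for the machine-translation component and the retrained recognition model.

```python
# Sketch: route utterances in any supported language through one
# reference-language (English) recognition model.

def translate_to_english(utterance, source_lang):
    if source_lang == "en":
        return utterance
    lexicon = {"spiel musik": "play music"}  # hypothetical MT stand-in
    return lexicon.get(utterance, utterance)

def classify_feature(english_utterance):
    # Stand-in for the retrained language recognition model, which maps
    # an utterance to the feature that should be executed.
    if "music" in english_utterance:
        return "Music.Play"
    return "Unknown"

def handle_utterance(utterance, lang):
    feature = classify_feature(translate_to_english(utterance, lang))
    # A real system would then send the feature's instructions to the device.
    return feature
```

Both English and translated German utterances reach the same model, which is what lets one set of feature mappings serve both languages.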
-
Publication No.: US20200098351A1
Publication Date: 2020-03-26
Application No.: US16139984
Filing Date: 2018-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Jonathan B. Feinstein, Alok Verma, Amina Shabbeer, Brandon Scott Durham, Catherine Breslin, Edward Bueche, Fabian Moerchen, Fabian Triefenbach, Klaus Reiter, Toby R. Latin-Stoermer, Panagiota Karanasou, Judith Gaspers
Abstract: Techniques are provided for training a language recognition model. For example, a language recognition model may be maintained and associated with a reference language (e.g., English). The language recognition model may be configured to accept as input an utterance in the reference language and to identify a feature to be executed in response to receiving the utterance. New language data (e.g., other utterances) provided in a different language (e.g., German) may be obtained. This new language data may be translated to English and utilized to retrain the model to recognize reference language data as well as language data translated to the reference language. Subsequent utterances (e.g., English utterances, or German utterances translated to English) may be provided to the updated model and a feature may be identified. One or more instructions may be sent to a user device to execute a set of instructions associated with the feature.
-
Publication No.: US11694674B1
Publication Date: 2023-07-04
Application No.: US17331427
Filing Date: 2021-05-26
Applicant: Amazon Technologies, Inc.
Inventor: Syed Ammar Abbas, Bajibabu Bollepalli, Alexis Pierre Moinet, Thomas Renaud Drugman, Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Simon Slangen, Petr Makarov
Abstract: Techniques for performing text-to-speech are described. An exemplary method includes receiving a request to generate audio from input text; generating audio from the input text by: generating a first number of vectors from phoneme embeddings representing the input text, predicting one or more spectrograms having the first number of frames using multiple scales wherein a coarser scale influences a finer scale, concatenating the first number of vectors and the predicted one or more spectrograms, generating at least one mel spectrogram from the concatenated vectors and the predicted one or more spectrograms, and converting, with a vocoder, the at least one mel spectrogram to audio; and outputting the generated audio according to the request.
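The multi-scale step, where a coarse spectrogram prediction conditions a finer one before being concatenated with the phoneme-derived vectors, can be sketched numerically. Block size, dimensions, and the random "predictions" here are invented for illustration; a real system would use learned predictors at each scale.

```python
import numpy as np

rng = np.random.default_rng(3)

def predict_multiscale(n_frames, n_mels=80, block=4):
    # Coarse scale: one prediction per block of frames.
    coarse = rng.normal(size=(n_frames // block, n_mels))
    # Finer scale is conditioned on the upsampled coarse prediction,
    # so the coarse scale influences the fine one.
    upsampled = np.repeat(coarse, block, axis=0)
    fine = upsampled + 0.1 * rng.normal(size=(n_frames, n_mels))
    return fine

T = 8
phoneme_vectors = rng.normal(size=(T, 16))   # first number of vectors
spectrogram = predict_multiscale(T)          # same first number of frames
# Concatenate the vectors with the multi-scale prediction, per the abstract;
# a mel spectrogram would then be generated from this and sent to a vocoder.
features = np.concatenate([phoneme_vectors, spectrogram], axis=-1)
```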
-
Publication No.: US11574624B1
Publication Date: 2023-02-07
Application No.: US17218466
Filing Date: 2021-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Alexis Pierre Jean-Baptiste Moinet, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati, Syed Ammar Abbas, Simon Slangen
Abstract: A speech-processing system receives input data representing text. An input encoder processes the input data to determine first embedding data representing the text. A local attention encoder processes a subset of the first embedding data in accordance with a predicted size to determine second embedding data. An attention encoder processes the second embedding data to determine third embedding data. A decoder processes the third embedding data to determine audio data corresponding to the text.
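The local-attention stage, attending only within subsets of the embedding sequence whose size is predicted, can be sketched as windowed self-attention. The fixed window size, dimensions, and single-head formulation are illustrative assumptions; in the described system the subset size is predicted rather than fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(x, window):
    # Each position attends only to positions inside its own window,
    # a stand-in for processing "a subset of the first embedding data".
    out = np.empty_like(x)
    for start in range(0, len(x), window):
        block = x[start:start + window]
        scores = softmax(block @ block.T / np.sqrt(block.shape[-1]))
        out[start:start + window] = scores @ block
    return out

x = rng.normal(size=(6, 4))      # first embedding data from the input encoder
y = local_self_attention(x, 3)   # second embedding data (window = 3, assumed)
```

A full attention encoder and decoder would then process `y` into audio, as the abstract describes.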
-