-
Publication No.: US10741169B1
Publication Date: 2020-08-11
Application No.: US16141241
Application Date: 2018-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
Publication No.: US20240296827A1
Publication Date: 2024-09-05
Application No.: US18664461
Application Date: 2024-05-15
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
Publication No.: US11735162B2
Publication Date: 2023-08-22
Application No.: US17882691
Application Date: 2022-08-08
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
Publication No.: US20230113297A1
Publication Date: 2023-04-13
Application No.: US17836330
Application Date: 2022-06-09
Applicant: Amazon Technologies, Inc.
Inventor: Antonio Bonafonte, Panagiotis Agis Oikonomou Filandras, Bartosz Perz, Arent van Korlaar, Ioannis Douratsos, Jonas Felix Ananda Rohnke, Elena Sokolova, Andrew Paul Breen, Nikhil Sharma
IPC: G10L13/10, G10L13/047, G10L15/16, G10L15/18, G10L15/22
Abstract: A speech-processing system receives both text data and natural-understanding data (e.g., a domain, intent, and/or entity) related to a command represented in the text data. The system uses the natural-understanding data to vary vocal characteristics in determining spectrogram data corresponding to the text data based on the natural-understanding data.
-
Publication No.: US11373633B2
Publication Date: 2022-06-28
Application No.: US16586007
Application Date: 2019-09-27
Applicant: Amazon Technologies, Inc.
Inventor: Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen, Javier Gonzalez Hernandez, Nishant Prateek
IPC: G10L13/033, G10L13/047, G10L15/18, G10L13/10, G06F40/30
Abstract: During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.
-
Publication No.: US20210097976A1
Publication Date: 2021-04-01
Application No.: US16586007
Application Date: 2019-09-27
Applicant: Amazon Technologies, Inc.
Inventor: Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen, Javier Gonzalez Hernandez, Nishant Prateek
IPC: G10L13/10, G10L13/047, G06F17/27, G10L13/033
Abstract: During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.
-
Publication No.: US12272350B2
Publication Date: 2025-04-08
Application No.: US18664461
Application Date: 2024-05-15
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
Publication No.: US11990118B2
Publication Date: 2024-05-21
Application No.: US18206301
Application Date: 2023-06-06
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
Publication No.: US11763797B2
Publication Date: 2023-09-19
Application No.: US16908882
Application Date: 2020-06-23
Applicant: Amazon Technologies, Inc.
Inventor: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen
IPC: G10L13/10, G10L13/033, G10L13/00
CPC classification number: G10L13/033, G10L13/00, G10L13/10
Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.
-
Publication No.: US11367431B2
Publication Date: 2022-06-21
Application No.: US16818542
Application Date: 2020-03-13
Applicant: Amazon Technologies, Inc.
Inventor: Antonio Bonafonte, Panagiotis Agis Oikonomou Filandras, Bartosz Perz, Arent van Korlaar, Ioannis Douratsos, Jonas Felix Ananda Rohnke, Elena Sokolova, Andrew Paul Breen, Nikhil Sharma
IPC: G10L13/10, G10L13/047, G10L15/16, G10L15/18, G10L15/22
Abstract: A speech-processing system receives both text data and natural-understanding data (e.g., a domain, intent, and/or entity) related to a command represented in the text data. The system uses the natural-understanding data to vary vocal characteristics in determining spectrogram data corresponding to the text data based on the natural-understanding data.