-
Publication No.: US10706837B1
Publication date: 2020-07-07
Application No.: US16007811
Filing date: 2018-06-13
Applicant: Amazon Technologies, Inc.
Inventor: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen
IPC: G10L13/033, G10L13/04, G10L13/10
Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.
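The swap described in this abstract can be pictured with a minimal sketch. It is not the patented implementation: the class name `SpeechModel`, the attribute keys, and the use of plain callables as stand-ins for trained sub-models are all assumptions made for illustration. The conditioning step maps text metadata to a prosody value, and the sample step consumes the text together with that prosody.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in: a sub-model maps text metadata to a prosody label.
SubModel = Callable[[dict], str]


class SpeechModel:
    """Toy speech model with one swappable sub-model per vocal attribute."""

    def __init__(self, sub_models: Dict[str, SubModel], active: str) -> None:
        if active not in sub_models:
            raise KeyError(f"unknown vocal attribute: {active}")
        self.sub_models = sub_models
        self.active = active

    def switch(self, attribute: str) -> None:
        # Switch to the sub-model for a different vocal attribute.
        if attribute not in self.sub_models:
            raise KeyError(f"unknown vocal attribute: {attribute}")
        self.active = attribute

    def synthesize(self, text: str, metadata: dict) -> Tuple[str, str]:
        # Conditioning step: metadata -> prosody, via the active sub-model.
        prosody = self.sub_models[self.active](metadata)
        # Sample step (placeholder): a real system would emit a waveform here.
        return (text, prosody)


model = SpeechModel(
    {"whisper": lambda meta: "soft", "shout": lambda meta: "loud"},
    active="whisper",
)
before = model.synthesize("hi", {})
model.switch("shout")
after = model.synthesize("hi", {})
```

Only the sub-model changes between the two calls; the rest of the model is untouched, which is the point of isolating the vocal attribute in its own component.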
-
Publication No.: US09734817B1
Publication date: 2017-08-15
Application No.: US14221985
Filing date: 2014-03-21
Applicant: Amazon Technologies, Inc.
Inventor: Bartosz Putrycz
Abstract: To prioritize the processing of text-to-speech (TTS) tasks, a TTS system may determine, for each task, an amount of time prior to the task reaching underrun, that is, the time before the synthesized speech output to a user catches up to the time since a TTS task was originated. The TTS system may also prioritize tasks to reduce the amount of time between when a user submits a TTS request and when results are delivered to the user. When prioritizing tasks, such as allocating resources to existing tasks or accepting new tasks, the TTS system may prioritize tasks with the lowest amount of time prior to underrun and/or tasks with the longest time prior to delivery of first results.
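The two ranking criteria named in this abstract can be sketched as follows. This is an illustrative reading, not the claimed method: the `TtsTask` fields, the function names, and the choice to measure the underrun margin as "seconds of audio synthesized minus seconds of playback elapsed" are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TtsTask:
    task_id: str
    requested_at_s: float                    # when the user submitted the request
    playback_started_at_s: Optional[float]   # None until first results are delivered
    audio_synthesized_s: float               # seconds of audio produced so far

    def time_to_underrun(self, now_s: float) -> float:
        # Assumed definition: audio already buffered minus audio already played.
        played_s = now_s - self.playback_started_at_s
        return self.audio_synthesized_s - played_s


def most_urgent_streaming(tasks: List[TtsTask], now_s: float) -> Optional[TtsTask]:
    """Task that is already playing and has the least time before underrun."""
    streaming = [t for t in tasks if t.playback_started_at_s is not None]
    return min(streaming, key=lambda t: t.time_to_underrun(now_s), default=None)


def longest_waiting(tasks: List[TtsTask], now_s: float) -> Optional[TtsTask]:
    """Task that has waited longest without receiving any first results."""
    waiting = [t for t in tasks if t.playback_started_at_s is None]
    return max(waiting, key=lambda t: now_s - t.requested_at_s, default=None)


tasks = [
    TtsTask("a", 0.0, 1.0, 5.0),   # 4.0 s of margin at now = 2.0
    TtsTask("b", 0.0, 1.0, 2.0),   # 1.0 s of margin at now = 2.0
    TtsTask("c", 0.5, None, 0.0),  # has waited 1.5 s
    TtsTask("d", 0.2, None, 0.0),  # has waited 1.8 s
]
urgent = most_urgent_streaming(tasks, now_s=2.0)
starved = longest_waiting(tasks, now_s=2.0)
```

How a scheduler weighs the two rankings against each other is left open here, as the abstract's "and/or" leaves it open too.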
-
Publication No.: US11763797B2
Publication date: 2023-09-19
Application No.: US16908882
Filing date: 2020-06-23
Applicant: Amazon Technologies, Inc.
Inventor: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen
IPC: G10L13/10, G10L13/033, G10L13/00
CPC classification number: G10L13/033, G10L13/00, G10L13/10
Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.
-
Publication No.: US10546573B1
Publication date: 2020-01-28
Application No.: US15673838
Filing date: 2017-08-10
Applicant: Amazon Technologies, Inc.
Inventor: Bartosz Putrycz
Abstract: To prioritize the processing of text-to-speech (TTS) tasks, a TTS system may determine, for each task, an amount of time prior to the task reaching underrun, that is, the time before the synthesized speech output to a user catches up to the time since a TTS task was originated. The TTS system may also prioritize tasks to reduce the amount of time between when a user submits a TTS request and when results are delivered to the user. When prioritizing tasks, such as allocating resources to existing tasks or accepting new tasks, the TTS system may prioritize tasks with the lowest amount of time prior to underrun and/or tasks with the longest time prior to delivery of first results.
-
Publication No.: US10699695B1
Publication date: 2020-06-30
Application No.: US16023370
Filing date: 2018-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Adam Franciszek Nadolski, Daniel Korzekwa, Thomas Edward Merritt, Marco Nicolis, Bartosz Putrycz, Roberto Barra Chicote, Rafal Kuklinski, Wiktor Dolecki
IPC: G10L13/10, G10L13/06, G10L13/047
Abstract: During text-to-speech processing, audio data corresponding to a word part, word, or group of words is generated using a trained model and used by a unit selection engine to create output audio. The audio data is generated at least when an input word is unrecognized or when a cost of a unit selection is too high.
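The fallback condition in this abstract (generate audio when a word is unrecognized or when the unit-selection cost is too high) can be sketched as below. The function name `choose_audio`, the dictionary-based unit database, the numeric cost threshold, and the `synthesize` callable are hypothetical stand-ins; a real unit selection engine computes target and join costs over candidate units rather than looking up one stored cost per word.

```python
from typing import Callable, Dict, Tuple

# Hypothetical unit database: word -> (recorded audio, selection cost).
UnitDb = Dict[str, Tuple[bytes, float]]


def choose_audio(
    word: str,
    unit_db: UnitDb,
    cost_threshold: float,
    synthesize: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (audio, source), where source is 'unit' or 'model'."""
    entry = unit_db.get(word)
    if entry is None:
        # Unrecognized word: fall back to the trained model.
        return synthesize(word), "model"
    audio, cost = entry
    if cost > cost_threshold:
        # A unit exists, but selecting it is too costly: use the model instead.
        return synthesize(word), "model"
    return audio, "unit"


unit_db = {"hello": (b"H", 0.2), "world": (b"W", 0.9)}
fake_model = lambda w: w.encode()  # placeholder for a trained model

good_unit = choose_audio("hello", unit_db, 0.5, fake_model)
costly_unit = choose_audio("world", unit_db, 0.5, fake_model)
unknown_word = choose_audio("zzz", unit_db, 0.5, fake_model)
```

In the abstract, the generated audio is handed back to the unit selection engine to build the output; the sketch stops at the decision point and returns the chosen audio together with its source.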
-
Publication No.: US10692484B1
Publication date: 2020-06-23
Application No.: US16007757
Filing date: 2018-06-13
Applicant: Amazon Technologies, Inc.
Inventor: Thomas Edward Merritt, Adam Franciszek Nadolski, Nishant Prateek, Bartosz Putrycz, Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen
IPC: G10L13/04, G10L13/08, G10L25/24, G10L25/60, G10L13/047
Abstract: A speech model is trained using multi-task learning. A first task may correspond to how well predicted audio matches training audio; a second task may correspond to a metric of perceived audio quality. The speech model may include, during training, layers related to the second task that are discarded at runtime.
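A minimal sketch of the multi-task setup in this abstract follows. The squared-error losses, the `quality_weight` parameter, and the `aux_` naming convention for training-only layers are assumptions chosen for illustration; the abstract does not specify the loss functions or how the auxiliary layers are identified.

```python
from typing import Dict, List


def combined_loss(
    predicted: List[float],
    target: List[float],
    quality_pred: float,
    quality_target: float,
    quality_weight: float = 0.5,
) -> float:
    """Weighted sum of the two training tasks (assumed squared-error losses)."""
    # Task 1: how well the predicted audio matches the training audio.
    match_loss = sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
    # Task 2: how well the model predicts a perceived-quality metric.
    quality_loss = (quality_pred - quality_target) ** 2
    return match_loss + quality_weight * quality_loss


def strip_for_runtime(layers: Dict[str, object]) -> Dict[str, object]:
    """Drop layers that only served the second task during training."""
    return {name: layer for name, layer in layers.items() if not name.startswith("aux_")}


loss = combined_loss([1.0, 2.0], [1.0, 4.0], quality_pred=3.0, quality_target=4.0)

training_layers = {"shared": "...", "audio_head": "...", "aux_quality_head": "..."}
runtime_layers = strip_for_runtime(training_layers)
```

The auxiliary head shapes the shared layers through its gradient during training, so it can be removed at runtime without changing the audio path.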
-