Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Thomas Edward Merritt"

11.

Granted invention patent
User-customized synthetic voice (Active)

Publication (announcement) number: US12087270B1

Publication (announcement) date: 2024-09-10

Application number: US17955961

Application date: 2022-09-29

Applicant: Amazon Technologies, Inc.

Inventor: Sebastian Dariusz Cygert, Daniel Korzekwa, Kamil Pokora, Piotr Tadeusz Bilinski, Kayoko Yanagisawa, Abdelhamid Ezzerg, Thomas Edward Merritt, Raghu Ram Sreepada Srinivas, Nikhil Sharma

IPC: G10L15/16, G10L13/033, G10L13/047, G10L13/10, G10L15/06, G10L25/30

CPC classification number: G10L13/033, G10L13/047, G10L13/10

Abstract: Techniques for generating customized synthetic voices personalized to a user, based on user-provided feedback, are described. A system may determine embedding data representing a user-provided description of a desired synthetic voice and profile data associated with the user, and generate synthetic voice embedding data using synthetic voice embedding data corresponding to a profile associated with a user determined to be similar to the current user. Based on user-provided feedback with respect to a customized synthetic voice, generated using synthetic voice characteristics corresponding to the synthetic voice embedding data and presented to the user, and the synthetic voice embedding data, the system may generate new synthetic voice embedding data, corresponding to a new customized synthetic voice. The system may be configured to assign the customized synthetic voice to the user, such that a subsequent user may not be presented with the same customized synthetic voice.

12.

Granted invention patent
Text-to-speech (TTS) processing (Active)

Publication (announcement) number: US10706837B1

Publication (announcement) date: 2020-07-07

Application number: US16007811

Application date: 2018-06-13

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen

IPC: G10L13/033, G10L13/04, G10L13/10

Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.

13.

Granted invention patent
Contextual text-to-speech processing (Active)

Publication (announcement) number: US10475438B1

Publication (announcement) date: 2019-11-12

Application number: US15447919

Application date: 2017-03-02

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt

IPC: G10L13/10, G10L13/033, G10L13/047

Abstract: A text-to-speech (TTS) system that is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.

14.

Invention application
USER-CUSTOMIZED SYNTHETIC VOICE (Active)

Publication (announcement) number: US20240428775A1

Publication (announcement) date: 2024-12-26

Application number: US18823176

Application date: 2024-09-03

Applicant: Amazon Technologies, Inc.

Inventor: Sebastian Dariusz Cygert, Daniel Korzekwa, Kamil Pokora, Piotr Tadeusz Bilinski, Kayoko Yanagisawa, Abdelhamid Ezzerg, Thomas Edward Merritt, Raghu Ram Sreepada Srinivas, Nikhil Sharma

IPC: G10L13/033, G10L13/047, G10L13/10

Abstract: Techniques for generating customized synthetic voices personalized to a user, based on user-provided feedback, are described. A system may determine embedding data representing a user-provided description of a desired synthetic voice and profile data associated with the user, and generate synthetic voice embedding data using synthetic voice embedding data corresponding to a profile associated with a user determined to be similar to the current user. Based on user-provided feedback with respect to a customized synthetic voice, generated using synthetic voice characteristics corresponding to the synthetic voice embedding data and presented to the user, and the synthetic voice embedding data, the system may generate new synthetic voice embedding data, corresponding to a new customized synthetic voice. The system may be configured to assign the customized synthetic voice to the user, such that a subsequent user may not be presented with the same customized synthetic voice.

15.

Invention publication
TEXT-TO-SPEECH (TTS) PROCESSING (Under examination, published)

Publication (announcement) number: US20240013770A1

Publication (announcement) date: 2024-01-11

Application number: US18206301

Application date: 2023-06-06

Applicant: Amazon Technologies, Inc.

Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

IPC: G10L13/047

CPC classification number: G10L13/047

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

16.

Invention application
TEXT-TO-SPEECH (TTS) PROCESSING (Active)

Publication (announcement) number: US20230058658A1

Publication (announcement) date: 2023-02-23

Application number: US17882691

Application date: 2022-08-08

Applicant: Amazon Technologies, Inc.

Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

IPC: G10L13/10, G10L25/18, G10L13/06

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

17.

Granted invention patent
Contextual text-to-speech processing (Active)

Publication (announcement) number: US11443733B2

Publication (announcement) date: 2022-09-13

Application number: US16665886

Application date: 2019-10-28

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt

IPC: G10L13/10, G10L13/033, G10L13/047

Abstract: A text-to-speech (TTS) system that is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.

18.

Granted invention patent
Text-to-speech (TTS) processing (Active)

Publication (announcement) number: US11410639B2

Publication (announcement) date: 2022-08-09

Application number: US16922590

Application date: 2020-07-07

Applicant: Amazon Technologies, Inc.

Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

IPC: G10L13/10, G10L25/18, G10L13/06

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

19.

Granted invention patent
Text-to-speech (TTS) processing (Active)

Publication (announcement) number: US10741169B1

Publication (announcement) date: 2020-08-11

Application number: US16141241

Application date: 2018-09-25

Applicant: Amazon Technologies, Inc.

Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

IPC: G10L13/08, G10L13/10, G10L25/18, G10L13/06

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
