Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Vatsal Aggarwal"

1.

Granted patent
Synthetic speech processing (Active)

Publication No.: US11017763B1

Publication date: 2021-05-25

Application No.: US16712466

Filing date: 2019-12-12

Applicant: Amazon Technologies, Inc.

Inventor: Vatsal Aggarwal, Nishant Prateek, Roberto Barra Chicote, Andrew Paul Breen

IPC: G10L15/22, G10L15/26, G10L13/08, G10L13/047, G10L13/033

Abstract: During text-to-speech processing, a sequence-to-sequence neural network model may process text data and determine corresponding spectrogram data. A normalizing flow component may then process this spectrogram data to predict corresponding phase data. An inverse Fourier transform may then be performed on the spectrogram and phase data to create an audio waveform that includes speech corresponding to the text.
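The final step in this abstract, combining spectrogram and phase data and applying an inverse Fourier transform, can be sketched for a single frame as below. This is only an illustration in NumPy under stated assumptions, not the patented method: the phase is taken from a forward FFT of a test tone as a stand-in for the normalizing-flow prediction, and the names `frame_to_waveform`, `magnitude`, and `phase` are hypothetical.

```python
import numpy as np

def frame_to_waveform(magnitude, phase, n_samples):
    """Rebuild one time-domain frame from magnitude and phase spectra."""
    # Polar form: complex spectrum = magnitude * exp(j * phase)
    spectrum = magnitude * np.exp(1j * phase)
    # Inverse real FFT returns the real-valued frame
    return np.fft.irfft(spectrum, n=n_samples)

# Stand-in data (assumed values): a 440 Hz tone sampled at 16 kHz
n = 512
t = np.arange(n) / 16000.0
frame = np.sin(2 * np.pi * 440.0 * t)

spec = np.fft.rfft(frame)
# In a real system the phase would be predicted by a model, not measured
magnitude, phase = np.abs(spec), np.angle(spec)
rebuilt = frame_to_waveform(magnitude, phase, n)
```

With exact phase the frame is recovered up to floating-point error; a full synthesizer would repeat this per frame and overlap-add the results.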

2.

Patent application
Text-to-speech processing using input voice characteristic data (Active)

Publication No.: US20230043916A1

Publication date: 2023-02-09

Application No.: US17848831

Filing date: 2022-06-24

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen, Javier Gonzalez Hernandez, Nishant Prateek

IPC: G10L13/10, G06F40/30, G10L13/033, G10L13/047

Abstract: During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.
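The idea of a voice decoder that turns vocal characteristic data into weights for the speech decoder can be sketched as follows. This is a toy illustration, not the architecture claimed in the patent: the dimensions, the single linear map `voice_proj`, the `tanh` output, and every function name are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
VOICE_DIM, CONTEXT_DIM, SPEC_DIM = 4, 8, 16  # assumed sizes

# Hypothetical voice-decoder parameters: one linear projection
voice_proj = rng.normal(size=(VOICE_DIM, CONTEXT_DIM * SPEC_DIM))

def voice_decoder(vocal_characteristics):
    """Map vocal characteristic data to a weight matrix for the speech decoder."""
    flat = vocal_characteristics @ voice_proj
    return flat.reshape(CONTEXT_DIM, SPEC_DIM)

def speech_decoder(context_vector, weights):
    """Decode a context vector into one spectrogram frame using supplied weights."""
    return np.tanh(context_vector @ weights)

context = rng.normal(size=CONTEXT_DIM)    # stand-in for the encoder output
voice_a = np.array([1.0, 0.0, 0.0, 0.0])  # two made-up voice descriptions
voice_b = np.array([0.0, 1.0, 0.0, 0.0])

# Same context, different voices: the decoder weights differ, so the frames differ
frame_a = speech_decoder(context, voice_decoder(voice_a))
frame_b = speech_decoder(context, voice_decoder(voice_b))
```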

3.

Granted patent
Text-to-speech (TTS) processing (Active)

Publication No.: US10692484B1

Publication date: 2020-06-23

Application No.: US16007757

Filing date: 2018-06-13

Applicant: Amazon Technologies, Inc.

Inventor: Thomas Edward Merritt, Adam Franciszek Nadolski, Nishant Prateek, Bartosz Putrycz, Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen

IPC: G10L13/04, G10L13/08, G10L25/24, G10L25/60, G10L13/047

Abstract: A speech model is trained using multi-task learning. A first task may correspond to how well predicted audio matches training audio; a second task may correspond to a metric of perceived audio quality. The speech model may include, during training, layers related to the second task that are discarded at runtime.
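A multi-task objective of the kind described here is commonly written as a weighted sum of per-task losses. The sketch below assumes mean squared error for both tasks and a hypothetical `quality_weight` parameter; the abstract does not specify the loss functions or the weighting, so none of this should be read as the patented formulation.

```python
import numpy as np

def multitask_loss(pred_audio, target_audio, pred_quality, target_quality,
                   quality_weight=0.5):
    """Combine an audio-matching loss with a perceived-quality loss."""
    # Task 1: how well predicted audio matches training audio
    reconstruction = np.mean((pred_audio - target_audio) ** 2)
    # Task 2: error on a perceived-quality score; the head producing
    # pred_quality would exist only during training
    quality = np.mean((pred_quality - target_quality) ** 2)
    return reconstruction + quality_weight * quality

# Made-up values for illustration
loss = multitask_loss(
    pred_audio=np.array([0.0, 1.0]),
    target_audio=np.array([0.0, 0.0]),
    pred_quality=np.array([3.0]),
    target_quality=np.array([4.0]),
)
```

At runtime only the audio prediction path would be evaluated, which matches the abstract's note that the second-task layers are discarded.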

4.

Granted patent
Text-to-speech processing using input voice characteristic data (Active)

Publication No.: US11373633B2

Publication date: 2022-06-28

Application No.: US16586007

Filing date: 2019-09-27

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen, Javier Gonzalez Hernandez, Nishant Prateek

IPC: G10L13/033, G10L13/047, G10L15/18, G10L13/10, G06F40/30

Abstract: During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.

5.

Patent application
Text-to-speech processing (Active)

Publication No.: US20210097976A1

Publication date: 2021-04-01

Application No.: US16586007

Filing date: 2019-09-27

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen, Javier Gonzalez Hernandez, Nishant Prateek

IPC: G10L13/10, G10L13/047, G06F17/27, G10L13/033

Abstract: During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.
