Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Roberto Barra Chicote"

1.

Granted invention patent
Contextual text-to-speech processing (in force)

Publication (announcement) number: US11443733B2

Publication (announcement) date: 2022-09-13

Application number: US16665886

Application date: 2019-10-28

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt

IPC: G10L13/10, G10L13/033, G10L13/047

Abstract: A text-to-speech (TTS) system that is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.

2.

Granted invention patent
Text-to-speech processing using input voice characteristic data (in force)

Publication (announcement) number: US11373633B2

Publication (announcement) date: 2022-06-28

Application number: US16586007

Application date: 2019-09-27

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen, Javier Gonzalez Hernandez, Nishant Prateek

IPC: G10L13/033, G10L13/047, G10L15/18, G10L13/10, G06F40/30

Abstract: During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.

3.

Invention application
TEXT-TO-SPEECH PROCESSING (in force)

Publication (announcement) number: US20210097976A1

Publication (announcement) date: 2021-04-01

Application number: US16586007

Application date: 2019-09-27

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen, Javier Gonzalez Hernandez, Nishant Prateek

IPC: G10L13/10, G10L13/047, G06F17/27, G10L13/033

Abstract: During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.

4.

Invention application
CONTEXTUAL TEXT-TO-SPEECH PROCESSING (under examination, published)

Publication (announcement) number: US20200152169A1

Publication (announcement) date: 2020-05-14

Application number: US16665886

Application date: 2019-10-28

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt

IPC: G10L13/10, G10L13/033, G10L13/047

Abstract: A text-to-speech (TTS) system that is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.

5.

Granted invention patent
Detecting machine-outputted audio (in force)

Publication (announcement) number: US11955122B1

Publication (announcement) date: 2024-04-09

Application number: US17487434

Application date: 2021-09-28

Applicant: Amazon Technologies, Inc.

Inventor: Mansour Ahmadi, Udhgee Murugesan, Roger Hau-Bin Cheng, Roberto Barra Chicote, Kian Jamali Abianeh, Yixiong Meng, Oguz Hasan Elibol, Itay Teller, Kevin Kwanghoon Ha, Andrew Roths

IPC: G10L15/22, G06N3/044, G10L15/02, G10L15/16, G10L15/18, G10L25/21, G10L25/30, G10L25/69, G10L15/08

CPC classification number: G10L15/22, G06N3/044, G10L15/02, G10L15/16, G10L15/18, G10L25/21, G10L25/30, G10L25/69, G10L2015/088

Abstract: Techniques for determining whether audio is machine-outputted or non-machine-outputted are described. A device may receive audio, may process the audio to determine audio data including audio features corresponding to the audio, and may process the audio data to determine audio embedding data. The device may process the audio embedding data to determine whether the audio is machine-outputted or non-machine-outputted. In response to determining that the audio is machine-outputted, then the audio may be discarded or not processed further. Alternatively, in response to determining that the audio is non-machine-outputted (e.g., live speech from a user), then the audio may be processed further (e.g., using ASR processing).

6.

Granted invention patent
Text-to-speech (TTS) processing (in force)

Publication (announcement) number: US11763797B2

Publication (announcement) date: 2023-09-19

Application number: US16908882

Application date: 2020-06-23

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen

IPC: G10L13/10, G10L13/033, G10L13/00

CPC classification number: G10L13/033, G10L13/00, G10L13/10

Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.

7.

Granted invention patent
Synthetic speech processing (in force)

Publication (announcement) number: US11580955B1

Publication (announcement) date: 2023-02-14

Application number: US17218740

Application date: 2021-03-31

Applicant: Amazon Technologies, Inc.

Inventor: Yixiong Meng, Roberto Barra Chicote, Grzegorz Beringer, Zeya Chen, Jie Liang, James Garnet Droppo, Chia-Hao Chang, Oguz Hasan Elibol

IPC: G10L13/08, G10L13/027, G10L15/06, G10L13/033, G10L19/008, G10L13/047

Abstract: A speech-processing system receives input data representing text. A first encoder processes segments of the text to determine embedding data representing the text, and a second encoder processes corresponding audio data to determine prosodic data corresponding to the text. The embedding and prosodic data is processed to create output data including a representation of speech corresponding to the text and prosody.

8.

Granted invention patent
Low latency audio interface (in force)

Publication (announcement) number: US10079021B1

Publication (announcement) date: 2018-09-18

Application number: US14974872

Application date: 2015-12-18

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Adam Franciszek Nadolski

IPC: G10L15/22, G10L15/30, G10L13/04, G10L15/32, G10L13/027, G10L15/18

CPC classification number: G10L15/30, G10L13/00, G10L13/027, G10L13/04, G10L15/18, G10L15/22, G10L15/32

Abstract: Systems and methods for utilizing incremental processing of portions of output data to limit the time required to provide a response to a user request are provided herein. In some embodiments, portions of the user request for information can be analyzed using techniques such as automatic speech recognition (ASR), speech-to-text (STT), and natural language understanding (NLU) to determine the overall topic of the user request. Once the topic has been determined, portions of the anticipated audio output data can be synthesized independently instead of waiting for the complete response. The synthesized portions can then be provided to the electronic device in anticipation of being output through one or more speakers on the electronic device, which speeds up the time that the response can be provided to the user.

9.

Invention application
VOICE CUSTOMIZATION FOR SYNTHETIC SPEECH GENERATION (in force)

Publication (announcement) number: US20250014567A1

Publication (announcement) date: 2025-01-09

Application number: US18887462

Application date: 2024-09-17

Applicant: Amazon Technologies, Inc.

Inventor: Abdelhamid Ezzerg, Piotr Tadeusz Bilinski, Thomas Edward Merritt, Roberto Barra Chicote, Daniel Korzekwa, Kamil Pokora

IPC: G10L13/047, G06N3/045, G10L25/30

Abstract: Voice customization is an application of voice synthesis that involves synthesizing speech having certain voice characteristics, and/or modifying the voice characteristics of human speech. Certain techniques for voice customization may be used in conjunction with compressing speech for storage and/or transmission. For example, speech may be received at a first device and transformed into a latent representation and/or compressed for storage and/or transmission to a second device. The system may use normalizing flows to transform the source audio to a latent representation having a desired variable distribution, and to transform the latent representation back into audio data. A flow model may be conditioned using first speech attributes when transforming the source audio, and an inverse flow model may use second speech attributes when transforming the latent representation back into audio data. The first and/or second speech attributes may be modified to alter voice characteristics of the transmitted speech.

10.

Invention application
TEXT-TO-SPEECH PROCESSING USING INPUT VOICE CHARACTERISTIC DATA (in force)

Publication (announcement) number: US20230043916A1

Publication (announcement) date: 2023-02-09

Application number: US17848831

Application date: 2022-06-24

Applicant: Amazon Technologies, Inc.

Inventor: Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen, Javier Gonzalez Hernandez, Nishant Prateek

IPC: G10L13/10, G06F40/30, G10L13/033, G10L13/047

Abstract: During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.
