Synthesizing speech from text using neural networks

Invention Grant

US10971170B2 Synthesizing speech from text using neural networks (in force)

Patent Title: Synthesizing speech from text using neural networks
Application No.: US16058640

Application Date: 2018-08-08
Publication No.: US10971170B2

Publication Date: 2021-04-06
Inventor: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Fish & Richardson P.C.
Main IPC: G10L25/30
IPC: G10L25/30; G10L13/047; G10L13/08; G06N7/00; G06N3/08; G06N3/04; G06N5/04; G10L25/18

Abstract:

Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
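
The per-time-step loop described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: `toy_decoder` and `toy_vocoder` are hypothetical stand-ins for the decoder and vocoder neural networks, the sizes `NUM_MEL_BINS` and `NUM_LEVELS` are arbitrary, and mapping one character to one time step is a simplifying assumption. Only the control flow (spectrogram frame, then probability distribution, then sampled audio value) follows the abstract.

```python
import math
import random

NUM_MEL_BINS = 4   # assumption: toy number of mel-frequency bins
NUM_LEVELS = 8     # assumption: toy number of possible audio sample values


def toy_decoder(char_repr):
    """Hypothetical stand-in for the decoder network.

    Maps a scalar representation of part of the input text to a
    mel-frequency spectrogram frame (a list of NUM_MEL_BINS floats).
    """
    return [math.sin(char_repr * (k + 1)) for k in range(NUM_MEL_BINS)]


def toy_vocoder(mel_frame):
    """Hypothetical stand-in for the vocoder network.

    Maps a mel frame to a probability distribution over NUM_LEVELS
    possible audio output samples, using a numerically stable softmax.
    """
    logits = [
        sum(m * math.cos(level + k) for k, m in enumerate(mel_frame))
        for level in range(NUM_LEVELS)
    ]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def synthesize(text, seed=0):
    """Run the three steps from the abstract once per time step."""
    rng = random.Random(seed)
    samples = []
    for ch in text:  # toy assumption: one time step per character
        char_repr = ord(ch) / 128.0          # hypothetical text representation
        mel_frame = toy_decoder(char_repr)   # step 1: mel spectrogram frame
        probs = toy_vocoder(mel_frame)       # step 2: distribution over samples
        # Step 3: select a sample in accordance with the distribution.
        sample = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]
        samples.append(sample)
    return samples


audio = synthesize("hello")
print(audio)
```

Sampling from the distribution, rather than always taking the most probable value, is one way to read "in accordance with the probability distribution"; the abstract does not say which selection rule is used.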

Published/granted documents

US20200051583A1 Synthesizing speech from text using neural networks, published 2020-02-13

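IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L25/00	Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 (muting semiconductor-based amplifiers when some special characteristics of a signal are sensed by a speech detector, e.g. sensing when no signal is present, H03G3/34)
G10L25/27	. characterised by the analysis technique
G10L25/30	.. using neural networks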