Synthesis of speech from text in a voice of a target speaker using neural networks

Invention patent (granted)

US11488575B2 Synthesis of speech from text in a voice of a target speaker using neural networks (in force)


Patent Title: Synthesis of speech from text in a voice of a target speaker using neural networks
Application No.: US17055951

Application Date: 2019-05-17
Publication No.: US11488575B2

Publication Date: 2022-11-01
Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Quan Wang, Patrick Nguyen
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Honigman LLP
Agent: Brett A. Krueger
International Application: PCT/US2019/032815 (WO), filed 2019-05-17
International Publication: WO2019/222591 (WO), published 2019-11-21
Main IPC: G10L13/04
IPC: G10L13/04 ; G10L17/04 ; G10L19/00 ; G06N3/08 ; G10L13/02


Abstract:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
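The abstract describes a two-stage data flow: a speaker encoder maps reference audio of the target speaker to a fixed-length speaker vector, and a spectrogram generation engine takes the input text together with that vector and produces an audio representation. The Python below is a toy, untrained sketch of that flow only, not the patented implementation. Every name in it (speaker_encoder, spectrogram_generator, random_matrix, N_MELS, SPEAKER_DIM, CHAR_DIM) is hypothetical. The random weights stand in for networks that, per the abstract, would be trained: the encoder to distinguish speakers, and the generator on voices of reference speakers. Conditioning by concatenating the speaker vector to each character embedding is an assumption chosen for illustration.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def random_matrix(rows: int, cols: int) -> List[Vector]:
    """Placeholder weights in [-1, 1]; a real system would learn these."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def speaker_encoder(frames: List[Vector], weights: List[Vector]) -> Vector:
    """Toy speaker encoder: project each audio frame, average over time, L2-normalize.

    Produces a fixed-length speaker vector regardless of how many frames the
    reference audio has. Untrained here, so it does NOT actually separate speakers.
    """
    summed = [0.0] * len(weights)
    for frame in frames:
        for i, row in enumerate(weights):
            summed[i] += math.tanh(sum(w * x for w, x in zip(row, frame)))
    mean = [s / len(frames) for s in summed]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0  # guard against a zero vector
    return [v / norm for v in mean]


def spectrogram_generator(text: str, speaker_vector: Vector,
                          char_table: Dict[str, Vector],
                          out_weights: List[Vector]) -> List[Vector]:
    """Toy generator: one output frame per input character.

    The same speaker vector is appended to every character embedding, so the
    output depends on both what is said (text) and who says it (speaker vector).
    """
    frames = []
    for ch in text:
        conditioned = char_table[ch] + speaker_vector  # list concatenation, no mutation
        frames.append([sum(w * x for w, x in zip(row, conditioned)) for row in out_weights])
    return frames


random.seed(0)
N_MELS, SPEAKER_DIM, CHAR_DIM = 8, 4, 6  # hypothetical sizes

enc_weights = random_matrix(SPEAKER_DIM, N_MELS)
char_table = {ch: random_matrix(1, CHAR_DIM)[0] for ch in "abcdefghijklmnopqrstuvwxyz ."}
out_weights = random_matrix(N_MELS, CHAR_DIM + SPEAKER_DIM)

reference_audio = random_matrix(50, N_MELS)  # 50 placeholder frames of "target speaker" audio
speaker_vector = speaker_encoder(reference_audio, enc_weights)
mel = spectrogram_generator("hello world.", speaker_vector, char_table, out_weights)
print(len(mel), len(mel[0]))  # 12 frames x 8 bins
```

If you pass a vector computed from different reference audio, the output frames change even though the text stays the same. In the abstract's framing, this separation is what allows a single generator to produce speech in the voice of whichever target speaker the vector was derived from.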

Related publication

US20210217404A1 Synthesis of Speech from Text in a Voice of a Target Speaker Using Neural Networks (publication date: 2021-07-15)


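IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L13/00	Speech synthesis; text-to-speech systems
G10L13/02	.Methods for producing synthetic speech; speech synthesisers
G10L13/04	..Details of speech synthesis systems, e.g. synthesiser structure or memory management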