Method and system for building text-to-speech voice from diverse recordings

Invention Grant

US09542927B2 Method and system for building text-to-speech voice from diverse recordings (in force)

Patent Title: Method and system for building text-to-speech voice from diverse recordings
Application No.: US14540088

Application Date: 2014-11-13
Publication No.: US09542927B2

Publication Date: 2017-01-10
Inventor: Ioannis Agiomyrgiannakis, Alexander Gutkin
Applicant: Google Inc.
Applicant Address: US CA Mountain View
Assignee: Google Inc.
Current Assignee: Google Inc.
Current Assignee Address: US CA Mountain View
Agency: McDonnell Boehnen Hulbert & Berghoff LLP
Main IPC: G10L13/08
IPC: G10L13/08; G10L13/02; G10L13/06; G10L25/03

Abstract:

A method and system are disclosed for building a speech database for a text-to-speech (TTS) synthesis system from multiple speakers recorded under diverse conditions. For a plurality of utterances of a reference speaker, a set of reference-speaker vectors may be extracted, and for each of a plurality of utterances of a colloquial speaker, a respective set of colloquial-speaker vectors may be extracted. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each colloquial-speaker vector to a reference-speaker vector. Each colloquial-speaker vector may then be replaced with its matched reference-speaker vector. The matching and replacing can be carried out separately for each set of colloquial-speaker vectors. A conditioned set of speaker vectors can then be constructed by aggregating all the replaced speaker vectors. The conditioned set of speaker vectors can be used to train the TTS system.

Published/granted documents

US20160140951A1 Method and System for Building Text-to-Speech Voice from Diverse Recordings (publication date: 2016-05-19)

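IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L13/00	Speech synthesis; text-to-speech systems
G10L13/08	. Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation or stress or intonation determination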