Method and apparatus for improved duration modeling of phonemes

Invention Grant

US06366884B1 Method and apparatus for improved duration modeling of phonemes (in force)

Patent Title: Method and apparatus for improved duration modeling of phonemes
Application No.: US09436048

Application Date: 1999-11-08
Publication No.: US06366884B1

Publication Date: 2002-04-02
Inventor: Jerome R. Bellegarda, Kim Silverman
Applicant: Jerome R. Bellegarda, Kim Silverman
Main IPC: G10L13/00
IPC: G10L13/00

Abstract:

A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model. An inverse of the non-exponential functional transformation is applied to duration observations, or training data. Coefficients are generated for use with the generalized additive model. The generalized additive model comprising the coefficients is applied to at least one phoneme of the received text resulting in the generation of at least one phoneme having a duration. An acoustic sequence is generated comprising speech signals that are representative of the received text.
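The abstract does not give the formula for the root sinusoidal transformation, so the following is only a minimal sketch under stated assumptions. It assumes the form F(x) = A + (B - A) * sin((pi/2) * x^alpha)^beta, where A and B are the minimum and maximum phoneme durations observed in training data and alpha and beta are hypothetical shape parameters. It also reduces the sum-of-products model to a plain weighted sum of contextual factor values. The function names, factor names, and coefficient values are illustrative and are not taken from the patent. The sketch shows the inverse transformation applied to an observed duration and the forward transformation applied to the output of the additive model; the estimation of coefficients is not shown.

```python
import math


def root_sin(x, d_min, d_max, alpha=1.0, beta=1.0):
    """Assumed forward transformation: map x in [0, 1] to [d_min, d_max]."""
    x = min(max(x, 0.0), 1.0)  # clamp so the result stays within the bounds
    s = math.sin(0.5 * math.pi * x ** alpha)
    return d_min + (d_max - d_min) * s ** beta


def root_sin_inverse(d, d_min, d_max, alpha=1.0, beta=1.0):
    """Inverse of root_sin: map an observed duration back to [0, 1]."""
    y = (d - d_min) / (d_max - d_min)
    y = min(max(y, 0.0), 1.0)  # guard against durations outside the bounds
    return ((2.0 / math.pi) * math.asin(y ** (1.0 / beta))) ** (1.0 / alpha)


def predict_duration(factors, coeffs, d_min, d_max, alpha=1.0, beta=1.0):
    """Apply a simplified additive model, then the forward transformation.

    factors: hypothetical contextual factor values, keyed by name.
    coeffs:  hypothetical coefficients, keyed by the same names.
    """
    x = sum(coeffs[name] * value for name, value in factors.items())
    return root_sin(x, d_min, d_max, alpha, beta)


# Illustrative values only (durations in milliseconds).
D_MIN, D_MAX = 30.0, 200.0
x_obs = root_sin_inverse(80.0, D_MIN, D_MAX, alpha=0.7, beta=1.5)
d_pred = predict_duration(
    {"stressed": 1.0, "phrase_final": 1.0},
    {"stressed": 0.3, "phrase_final": 0.2},
    D_MIN,
    D_MAX,
)
print(x_obs, d_pred)
```

In this sketch the prediction can never fall outside the observed minimum and maximum durations, because the forward transformation clamps its input and maps it onto that range. Coefficients could be estimated by fitting the additive model to inverse-transformed training durations, for example by least squares.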
