Morphological categories for voice synthesis
    21.
    Invention publication
    Morphological categories for voice synthesis (status: pending, published)
    Morphologische Kategorien für Sprachsynthese

    Publication No.: EP1160764A1

    Publication date: 2001-12-05

    Application No.: EP00401560.8

    Filing date: 2000-06-02

    Applicant: Sony France S.A.

    IPC classification: G10L13/04 G10L13/06

    CPC classification: G10L13/04 G10L13/07

    Abstract: Voice synthesis with improved expressivity is obtained in a voice synthesiser of source-filter type by making use of a library of source sound categories in the source module. Each source sound category corresponds to a particular morphological category and is derived from analysis of real vocal sounds, by inverse filtering so as to subtract the effect of the vocal tract. The library may be parametric, that is, the stored data corresponds not to the inverse-filtered sounds themselves but to coefficients (amplitude spectra and frequency trajectories) for resynthesising the inverse-filtered sounds using an additive sinusoidal technique. The coefficients are derived by STFT analysis.
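As a rough illustration of the parametric library described in this abstract, the sketch below resynthesises a source sound from per-partial amplitude and frequency trajectories by additive sinusoidal synthesis. The function name `additive_resynth`, the frame hop, the linear interpolation between frames, and the sample values are all assumptions made for illustration; the STFT analysis and the patent's actual data layout are not shown.

```python
import math

def additive_resynth(freq_tracks, amp_tracks, hop, sample_rate):
    """Sum sinusoidal partials. Each track holds one frequency (Hz) or
    amplitude value per analysis frame; values are linearly interpolated
    between consecutive frames (an assumed, simple interpolation)."""
    n_frames = len(freq_tracks[0])
    n_samples = (n_frames - 1) * hop
    out = [0.0] * n_samples
    for freqs, amps in zip(freq_tracks, amp_tracks):
        phase = 0.0  # running phase keeps each partial continuous
        for n in range(n_samples):
            frame, offset = divmod(n, hop)
            t = offset / hop
            f = freqs[frame] * (1 - t) + freqs[frame + 1] * t
            a = amps[frame] * (1 - t) + amps[frame + 1] * t
            out[n] += a * math.sin(phase)
            phase += 2 * math.pi * f / sample_rate
    return out

# Hypothetical library entry: two partials described over three frames.
source = additive_resynth(
    freq_tracks=[[100.0, 100.0, 110.0], [200.0, 200.0, 220.0]],
    amp_tracks=[[1.0, 0.8, 0.0], [0.5, 0.4, 0.0]],
    hop=80,
    sample_rate=8000,
)
```

In a source-filter synthesiser the resulting `source` signal would then be passed through the vocal-tract filter, which this sketch leaves out.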

    Speech synthesizing system and redundancy-reduced waveform database therefor
    22.
    Invention publication
    Speech synthesizing system and redundancy-reduced waveform database therefor (status: lapsed)
    Speech synthesis system and reduced-redundancy waveform database

    Publication No.: EP0848372A3

    Publication date: 1999-02-17

    Application No.: EP97117604.5

    Filing date: 1997-10-10

    IPC classification: G10L5/02

    CPC classification: G10L13/07

    Abstract: A speech synthesizing system using a redundancy-reduced waveform database is disclosed. Each waveform of a sample set of voice segments necessary and sufficient for speech synthesis is divided into pitch waveforms, which are classified into groups of pitch waveforms closely similar to one another. One of the pitch waveforms of each group is selected as a representative of the group and is given a pitch waveform ID. The waveform database at least comprises a pitch waveform pointer table each record of which comprises a voice segment ID of each of the voice segments and pitch waveform IDs the pitch waveforms of which, when combined in the listed order, constitute a waveform identified by the voice segment ID and a pitch waveform table of pitch waveform IDs and corresponding pitch waveforms. This enables the waveform database size to be reduced. For each of pitch waveforms the database lacks, one of the pitch waveform IDs adjacent to the lacking pitch waveform ID in the pitch waveform pointer table is used without deforming the pitch waveform.
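The two-table layout in this abstract can be sketched as follows. This is a minimal illustration, not the patented procedure: the greedy first-seen grouping, the mean-squared-difference measure, the `threshold` parameter, and all names (`build_waveform_db`, `reconstruct`) are assumptions.

```python
def build_waveform_db(segments, threshold):
    """segments maps a voice segment ID to its ordered pitch waveforms
    (each a list of samples). Returns (pointer_table, waveform_table)."""
    waveform_table = {}  # pitch waveform ID -> representative waveform
    pointer_table = {}   # voice segment ID -> ordered pitch waveform IDs

    def distance(a, b):
        # Mean squared difference; unequal-length waveforms never match.
        if len(a) != len(b) or not a:
            return float("inf")
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    for seg_id, pitch_waveforms in segments.items():
        ids = []
        for wave in pitch_waveforms:
            for pw_id, representative in waveform_table.items():
                if distance(wave, representative) <= threshold:
                    break  # reuse the existing representative's ID
            else:
                pw_id = len(waveform_table)  # first of a new group
                waveform_table[pw_id] = wave
            ids.append(pw_id)
        pointer_table[seg_id] = ids
    return pointer_table, waveform_table

def reconstruct(seg_id, pointer_table, waveform_table):
    """Concatenate representatives in the order listed for the segment."""
    return [s for pw_id in pointer_table[seg_id] for s in waveform_table[pw_id]]

segments = {
    "ka": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.0]],
    "ki": [[0.0, 1.0, 0.0], [1.0, -1.0, 1.0]],
}
pointers, table = build_waveform_db(segments, threshold=0.01)
```

Here four pitch waveforms collapse to two stored representatives, which is where the size reduction comes from; the reconstructed segment is an approximation whenever a representative stands in for a similar waveform.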

    AUF MIKROSEGMENTEN BASIERENDES SPRACHSYNTHESEVERFAHREN
    23.
    Invention publication
    AUF MIKROSEGMENTEN BASIERENDES SPRACHSYNTHESEVERFAHREN (status: lapsed)
    Speech synthesis method based on microsegments

    Publication No.: EP0886853A1

    Publication date: 1998-12-30

    Application No.: EP97917259

    Filing date: 1997-03-08

    IPC classification: G10L13/04 G10L13/07 G10L5/04

    CPC classification: G10L13/07 G10L13/04

    Abstract: The invention concerns a digital speech-synthesis process whereby utterances in a language are recorded, the recorded utterances are divided into speech segments which are stored so as to allow their allocation to specific phonemes; a text which is to be output as speech is converted to a phoneme chain and the stored segments are output in a sequence defined by the phoneme chain; an analysis of the text to be output as speech is carried out and thus provides information which completes the phoneme chain and modifies the timing sequence signal for the speech segments which are to be strung together for output as speech. The invention is characterised by the use of, as speech segments, microsegments consisting of: segments for vowel halves and semi-vowel halves, vowels standing between consonants being split into two microsegments, a first vowel half beginning shortly before the start of the vowel and extending as far as the vowel middle, and a second vowel half from the vowel middle to just before the vowel end; segments for quasi-stationary vowel components cut from the middle of a vowel; consonant segments beginning shortly before the front phoneme boundary and ending shortly before the rear phoneme boundary; and segments for vowel-vowel sequences cut from the middle of a vowel-vowel transition.

    Method and apparatus for synthesizing speech
    24.
    Invention publication
    Method and apparatus for synthesizing speech (status: lapsed)
    Method and device for synthesizing speech

    Publication No.: EP0821344A3

    Publication date: 1998-11-18

    Application No.: EP97305349.9

    Filing date: 1997-07-17

    CPC classification: G10L13/07 G10L13/04

    Abstract: A speech synthesizing apparatus for deforming and connecting speech pieces to synthesize speech has a speech waveform database for storing data of an accent type of a speech piece of a word or a syllable uttered with type-0 accent and type-1 accent, data of phonemic transcription of the speech piece and data of a position at which the speech piece can be segmented, an input buffer for storing a character string of phonemic transcription and prosody of speech to be synthesized, a synthesis unit selecting unit for retrieving candidates of speech pieces from the speech waveform database on the basis of the character string of phonemic transcription in the input buffer, and a used speech piece selecting unit for determining a speech piece to be practically used among the retrieved candidates according to an accent type of speech to be synthesized and a position in the speech at which the speech piece is used, thereby preventing degradation of a quality of sound when the speech piece is processed.

    GENERATING SPEECH FROM DIGITALLY STORED COARTICULATED SPEECH SEGMENTS
    25.
    Invention publication
    GENERATING SPEECH FROM DIGITALLY STORED COARTICULATED SPEECH SEGMENTS (status: lapsed)
    Generating speech from digitally stored speech segments

    Publication No.: EP0380572A4

    Publication date: 1991-04-17

    Application No.: EP88909070

    Filing date: 1988-10-07

    CPC classification: G10L13/07

    Abstract: A system (87) for generating high quality speech uses coarticulated speech segment data extracted from spoken carrier syllables and digitally compressed for storage using adaptive differential pulse code modulation (ADPCM). The system includes a programmed digital microprocessor (89) with an associated read only memory (91) containing the compressed coarticulated speech segment library, random access memory (93) containing system variables and the sequence of coarticulated speech segments required to generate a desired spoken message, and a text-to-speech chip (95) which provides the sequence of coarticulated speech segments to the RAM (93). The microprocessor (89) operates in accordance with a program stored in ROM (91) to recover the compressed coarticulated speech segment data stored in ROM (91) in a sequence called for by the text-to-speech chip (95), to reconstruct or "blow back" the stored ADPCM data to PCM data, and to concatenate the PCM data into waveforms to produce a real time digital speech waveform. The digital speech waveform is converted to an analog signal via digital to analog converter (97), amplified in amplifier (99) and applied to an audio speaker (101) which generates a high quality spoken message. In the preferred embodiment of the invention, the coarticulated speech segments are diphones.
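The look-up, decode, and concatenate flow in this abstract might look roughly like the sketch below. Everything named here is hypothetical: the diphone naming scheme (`"h-i"`, with `"_"` for silence), the `library` contents, and `delta_decode`, which is a plain non-adaptive delta decoder standing in for the real ADPCM "blow back" step.

```python
def to_diphones(phonemes, silence="_"):
    """Pair each phoneme with its successor, padding both ends with silence."""
    padded = [silence] + list(phonemes) + [silence]
    return [a + "-" + b for a, b in zip(padded, padded[1:])]

def synthesize(phonemes, library, decode):
    """Decode each stored diphone in sequence and concatenate the PCM."""
    pcm = []
    for name in to_diphones(phonemes):
        pcm.extend(decode(library[name]))
    return pcm

def delta_decode(deltas):
    # Toy stand-in for ADPCM decoding: accumulate sample differences.
    # Real ADPCM also adapts the quantiser step size, which is omitted here.
    out, value = [], 0
    for d in deltas:
        value += d
        out.append(value)
    return out

# Hypothetical compressed library, as it might be held in ROM.
library = {"_-h": [0, 1], "h-i": [2, -1], "i-_": [-1, -1]}
pcm = synthesize(["h", "i"], library, delta_decode)
```

Passing the decoder in as a function keeps the sequencing logic independent of the compression scheme, so a real ADPCM decoder could be substituted without touching `synthesize`.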

    SYSTEMS AND METHODS FOR GENERATING SPEECH OF MULTIPLE STYLES FROM TEXT
    26.
    Invention publication
    SYSTEMS AND METHODS FOR GENERATING SPEECH OF MULTIPLE STYLES FROM TEXT (status: pending, published)
    SYSTEME UND VERFAHREN ZUR ERZEUGUNG VON SPRACHE MEHRERER STILE VON TEXT

    Publication No.: EP3152752A1

    Publication date: 2017-04-12

    Application No.: EP14894001.8

    Filing date: 2014-06-05

    IPC classification: G10L13/02

    Abstract: A text-to-speech (TTS) system includes components capable of supporting the generation of speech output in any of multiple styles, and may switch seamlessly from producing speech output in one style to producing speech output in another style. For example, a concatenative TTS system may include a speech base storing speech units associated with multiple speech styles, and a linguistic analysis component to generate a phonetic transcription specifying speech output in any of multiple styles. Text input may include a style indication associated with a particular segment of the input text. The linguistic analysis component may invoke encoded rules and/or components based upon the style indication, and generate a phonetic transcription specifying a speech style, which may be processed to generate output speech.
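To make the idea of a style indication attached to a text segment concrete, the sketch below splits input text into (style, text) pairs. The `<style name="...">` markup, the `neutral` default, and the function name `split_by_style` are invented for illustration; the abstract does not specify how the style indication is encoded.

```python
import re

# Hypothetical inline markup for a style indication.
STYLE_TAG = re.compile(r'<style name="([^"]+)">(.*?)</style>', re.DOTALL)

def split_by_style(text, default_style="neutral"):
    """Return a list of (style, segment_text) pairs in reading order."""
    segments = []
    pos = 0
    for match in STYLE_TAG.finditer(text):
        plain = text[pos:match.start()].strip()
        if plain:
            segments.append((default_style, plain))
        styled = match.group(2).strip()
        if styled:
            segments.append((match.group(1), styled))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((default_style, tail))
    return segments

parts = split_by_style(
    'Headline: <style name="news">Markets rose.</style> Have a nice day!'
)
```

Each pair could then be handed to style-specific analysis rules and unit selection; that downstream processing is outside this sketch.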

    DECODER AND METHOD FOR A GENERALIZED SPATIAL-AUDIO-OBJECT-CODING PARAMETRIC CONCEPT FOR MULTICHANNEL DOWNMIX/UPMIX CASES
    27.
    Invention publication
    DECODER AND METHOD FOR A GENERALIZED SPATIAL-AUDIO-OBJECT-CODING PARAMETRIC CONCEPT FOR MULTICHANNEL DOWNMIX/UPMIX CASES (status: in force)
    VERFAHREN ZUR KODIERUNG RÄUMLICHER AUDIOOBJEKTE (G-SAOC) FÜR DIE MULTIKANALMISCHUNG

    Publication No.: EP2880654A2

    Publication date: 2015-06-10

    Application No.: EP13759676.3

    Filing date: 2013-08-05

    IPC classification: G10L19/008

    Abstract: A decoder for generating an audio output signal having one or more audio output channels from a downmix signal having one or more downmix channels is provided. The downmix signal encodes one or more audio object signals. The decoder has a threshold determiner for determining a threshold value depending on a signal energy and/or a noise energy of at least one of the one or more audio object signals and/or depending on a signal energy and/or a noise energy of at least one of the one or more downmix channels. Moreover, the decoder has a processing unit for generating the one or more audio output channels from the one or more downmix channels depending on the threshold value.

    Voice synthesizing method, voice synthesizing apparatus and computer-readable recording medium
    28.
    Invention publication
    Voice synthesizing method, voice synthesizing apparatus and computer-readable recording medium (status: in force)
    Sprachsyntheseverfahren, Sprachsynthesevorrichtung und computerlesbares Aufzeichnungsmedium

    Publication No.: EP2770499A1

    Publication date: 2014-08-27

    Application No.: EP14155877.5

    Filing date: 2014-02-20

    Inventor: Hisaminato, Yuji

    IPC classification: G10L13/07 G10H7/00 G10H1/14

    Abstract: A voice synthesizing apparatus includes a manipulation determiner configured to determine a manipulation position which is moved according to a manipulation of a user, and a voice synthesizer configured to generate, in response to an instruction to generate a voice in which a second phoneme follows a first phoneme, a voice signal so that vocalization of the first phoneme starts before the manipulation position reaches a reference position and that vocalization from the first phoneme to the second phoneme is made when the manipulation position reaches the reference position.
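The timing behaviour in this abstract can be pictured as a small state machine driven by the manipulation position. The sketch below is only an interpretation: `onset_position` (the point at which the first phoneme starts) is an assumed parameter, since the abstract says only that vocalization starts before the reference position is reached, and positions are assumed to increase toward the reference.

```python
def phoneme_events(positions, onset_position, reference_position):
    """Scan successive manipulation positions and emit (step, event) pairs.
    Assumes onset_position < reference_position."""
    events = []
    state = "idle"
    for step, pos in enumerate(positions):
        if state == "idle" and pos >= onset_position:
            events.append((step, "start_first_phoneme"))
            state = "first"
        if state == "first" and pos >= reference_position:
            events.append((step, "transition_to_second_phoneme"))
            state = "second"
    return events

# Hypothetical positions sampled as the user moves a control.
events = phoneme_events(
    [0.0, 0.2, 0.5, 0.8, 1.0], onset_position=0.2, reference_position=1.0
)
```

Starting the first phoneme early means the transition to the second phoneme can coincide with the moment the position reaches the reference, rather than lagging behind it.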