Abstract:
Disclosed herein are a speech synthesis apparatus and a speech synthesis method that synthesize speech in accordance with input text data and output speech consisting of recorded speech portions and synthesized speech portions. The synthesized speech portions are given reverberation properties identical to those of the recorded speech portions, and the synthesized speech portions with reverberation properties are substantially greater in amplitude than the recorded speech portions, which reduces the feeling of strangeness caused by the difference in sound quality between the recorded speech portions and the synthesized speech portions.
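As a minimal sketch of the idea in this abstract, the following Python applies a reverberation response to the synthesized portions only and then concatenates them with the recorded portions. The abstract does not say how the reverberation properties are obtained or applied, so the impulse response, the `gain` parameter, and all function names here are assumptions made for illustration.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def add_reverb(synth, impulse_response, gain=1.0):
    """Give synthesized samples the (assumed known) reverberation response.

    `gain` is a hypothetical knob for the amplitude relation between the
    synthesized and recorded portions.
    """
    return [gain * x for x in convolve(synth, impulse_response)]


def assemble(portions, impulse_response, gain=1.0):
    """Concatenate portions given as ("recorded" | "synth", samples) pairs.

    Recorded portions pass through unchanged; synthesized ones get reverb.
    """
    out = []
    for kind, samples in portions:
        if kind == "recorded":
            out.extend(samples)
        else:
            out.extend(add_reverb(samples, impulse_response, gain))
    return out
```

Direct convolution is quadratic in length and is used here only to keep the sketch dependency-free; a practical system would more likely use FFT-based convolution.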
Abstract:
A speech synthesis apparatus (10) comprises speech segment disassembling means (101) for disassembling speech segments, each including at least one phoneme, into a plurality of pitch waveforms, phase characteristic transforming means (103) for transforming the phase characteristics of the pitch waveforms into a uniform phase characteristic, pitch waveform classifying means (104) for classifying the pitch waveforms into a plurality of groups, pitch waveform registering means (106) for registering pitch waveforms in the database (111) by extracting one pitch waveform from among the pitch waveforms in each of the groups, and synthesizing means (107) for synthesizing speech with the pitch waveforms registered in the database (111). The speech synthesis apparatus (10) thus constructed can synthesize natural speech using a relatively small database capacity.
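The classification and registration steps can be illustrated with a short Python sketch. It assumes the pitch waveforms have already been extracted, phase-normalized, and brought to equal length. The abstract does not specify the grouping criterion or how the one waveform per group is extracted, so the greedy distance threshold and the "nearest to the group mean" rule below are stand-in assumptions, and all names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length pitch waveforms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(waveforms, threshold):
    """Greedy grouping: a waveform joins the first group whose seed
    (first member) lies within `threshold`, else it starts a new group."""
    groups = []
    for w in waveforms:
        for g in groups:
            if distance(w, g[0]) <= threshold:
                g.append(w)
                break
        else:
            groups.append([w])
    return groups


def representative(group):
    """Extract the member nearest to the element-wise mean of the group."""
    n = len(group)
    mean = [sum(col) / n for col in zip(*group)]
    return min(group, key=lambda w: distance(w, mean))


def build_database(waveforms, threshold):
    """Register one pitch waveform per group."""
    return [representative(g) for g in classify(waveforms, threshold)]
```

With a threshold of 0.5, the three similar waveforms `[0, 1, 0]`, `[0, 1.2, 0]`, `[0, 1.1, 0]` collapse into a single registered entry, which is where the reduction in database size comes from.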
Abstract:
A speech synthesizing system using a redundancy-reduced waveform database is disclosed. Each waveform of a sample set of voice segments necessary and sufficient for speech synthesis is divided into pitch waveforms, which are classified into groups of pitch waveforms closely similar to one another. One of the pitch waveforms of each group is selected as a representative of the group and is given a pitch waveform ID. The waveform database comprises at least a pitch waveform pointer table and a pitch waveform table. Each record of the pitch waveform pointer table comprises the voice segment ID of one of the voice segments and the pitch waveform IDs whose pitch waveforms, when combined in the listed order, constitute the waveform identified by that voice segment ID. The pitch waveform table holds the pitch waveform IDs and the corresponding pitch waveforms. This enables the waveform database size to be reduced. For each pitch waveform the database lacks, one of the pitch waveform IDs adjacent to the lacking pitch waveform ID in the pitch waveform pointer table is used without deforming the pitch waveform.
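The two-table layout can be sketched in Python as a pair of dictionaries. This is only an illustration under assumptions: the IDs and sample values are invented, and the fallback reads "adjacent" as the nearest position in the same pointer-table record whose waveform is present, which is one possible interpretation of the last sentence of the abstract.

```python
def nearest_present(ids, i, waveform_table):
    """Return the ID closest to position i in the record that the table holds."""
    for offset in range(1, len(ids)):
        for j in (i - offset, i + offset):
            if 0 <= j < len(ids) and ids[j] in waveform_table:
                return ids[j]
    raise KeyError(f"no usable neighbour for {ids[i]!r}")


def segment_waveform(segment_id, pointer_table, waveform_table):
    """Rebuild a voice segment by concatenating its pitch waveforms in order.

    A pitch waveform missing from the table is replaced by an adjacent one,
    used as-is (no deformation).
    """
    ids = pointer_table[segment_id]
    samples = []
    for i, pid in enumerate(ids):
        if pid not in waveform_table:
            pid = nearest_present(ids, i, waveform_table)
        samples.extend(waveform_table[pid])
    return samples


# Hypothetical data: two stored pitch waveforms shared by two voice segments.
waveform_table = {"p1": [0.0, 0.5], "p2": [0.3, -0.3]}
pointer_table = {
    "ka": ["p1", "p2", "p1"],
    "ki": ["p2", "p9", "p1"],  # "p9" is not stored
}
```

Because several records point at the same pitch waveform ID, each representative waveform is stored once no matter how many voice segments use it.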
Abstract:
A composite pitch pattern of an artificial waveform of a composite sound indicating characters is produced according to a general pitch pattern producing model, and a pitch pattern of a VCV phoneme-chain waveform is produced from an actual voice sample for each of the VCV phoneme-chains corresponding to the characters. Each VCV phoneme-chain, composed of a preceding vowel, a consonant and a succeeding vowel, has a pitch fine structure and a pitch fluctuation. Thereafter, the overall inclination of the pitch pattern of each VCV phoneme-chain waveform is adjusted to that of the portion of the composite pitch pattern corresponding to the same VCV phoneme-chain, so that the transitional portions of the preceding and succeeding vowels in the changed pitch pattern of each VCV phoneme-chain waveform overlap with those in the corresponding portion of the composite pitch pattern. Therefore, when the changed pitch patterns of the VCV phoneme-chain waveforms are connected with each other, a synthesized sound of the characters is obtained that maintains the pitch fine structure and the pitch fluctuation.
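The inclination adjustment can be sketched as follows. The abstract does not state how the overall inclination is measured, so this sketch assumes a least-squares line fitted over per-frame pitch values, and it assumes the VCV pitch pattern and the corresponding portion of the composite pattern have the same number of frames. The function names are hypothetical.

```python
def fit_line(values):
    """Least-squares slope and intercept over frame indices 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def adjust_inclination(vcv_pitch, model_pitch):
    """Swap the linear trend of the VCV pitch pattern for the model's trend.

    The residual around the VCV trend (fine structure and fluctuation)
    is carried over unchanged.
    """
    s_v, b_v = fit_line(vcv_pitch)
    s_m, b_m = fit_line(model_pitch)
    return [
        y - (s_v * x + b_v) + (s_m * x + b_m)
        for x, y in enumerate(vcv_pitch)
    ]
```

For example, `adjust_inclination([100, 103, 101, 104], [120, 118, 116, 114])` returns a contour whose fitted line follows the falling model pattern while keeping the frame-to-frame wiggle of the recorded sample.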
Abstract:
A speech synthesizing apparatus for deforming and connecting speech pieces to synthesize speech has a speech waveform database, an input buffer, a synthesis unit selecting unit, and a used speech piece selecting unit. The speech waveform database stores, for each speech piece of a word or a syllable uttered with type-0 accent and type-1 accent, data of the accent type, data of the phonemic transcription of the speech piece, and data of the positions at which the speech piece can be segmented. The input buffer stores a character string of phonemic transcription and the prosody of the speech to be synthesized. The synthesis unit selecting unit retrieves candidate speech pieces from the speech waveform database on the basis of the character string of phonemic transcription in the input buffer. The used speech piece selecting unit determines the speech piece to be actually used from among the retrieved candidates according to the accent type of the speech to be synthesized and the position in the speech at which the speech piece is used, thereby preventing degradation of sound quality when the speech piece is processed.
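The two-stage selection (retrieve candidates by transcription, then pick one by accent type and position) might look like the Python sketch below. The abstract does not give the actual selection rule, so the rule in `choose_piece` is purely a placeholder assumption, and the record layout, substring matching, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SpeechPiece:
    transcription: str            # phonemic transcription of the piece
    accent_type: int              # 0 or 1, as uttered
    cut_points: Tuple[int, ...]   # positions where the piece may be segmented


def retrieve_candidates(database, wanted):
    """Stage 1: pieces whose transcription contains the wanted phoneme string."""
    return [p for p in database if wanted in p.transcription]


def choose_piece(candidates, target_accent, position):
    """Stage 2 (placeholder rule, not taken from the abstract):
    at the head of the utterance prefer a piece uttered with the target
    accent type, elsewhere prefer a type-0 piece; fall back to the first
    candidate, or None when there are no candidates."""
    preferred = target_accent if position == 0 else 0
    for piece in candidates:
        if piece.accent_type == preferred:
            return piece
    return candidates[0] if candidates else None
```

The point of the sketch is the separation of concerns: retrieval depends only on the phonemic transcription, while the final choice depends on prosodic context, so the rule in `choose_piece` can be replaced without touching the database lookup.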