Abstract:
A second phoneme is generated for a first phoneme that is the search target, taking its phonemic context into account. Phonemic piece data corresponding to the second phoneme is searched for in a database. Based on the search result, a third phoneme is generated by changing the phonemic context, and the database is searched again for phonemic piece data corresponding to the third phoneme. The result of the search or re-search is registered in a table in correspondence with the second or third phoneme.
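The search and re-search described above can be sketched as a lookup that relaxes the phonemic context after each miss. This is only an illustration under stated assumptions: the abstract does not say how the context is changed, so the key layout `(left, phoneme, right)`, the order in which context is dropped, and the function name `search_with_context_backoff` are all hypothetical.

```python
def search_with_context_backoff(phoneme, left, right, database, table):
    """Search for unit data, relaxing the phonemic context after each miss."""
    # Candidate keys, ordered from the richest context to no context at all.
    # This ordering is an assumption made for the sketch.
    candidates = [
        (left, phoneme, right),   # both neighbours considered
        (left, phoneme, None),    # right context dropped
        (None, phoneme, right),   # left context dropped
        (None, phoneme, None),    # context-free phoneme
    ]
    for key in candidates:
        data = database.get(key)
        if data is not None:
            # Register the result under the key that actually matched.
            table[key] = data
            return key, data
    return None, None


# Toy database: no entry exists for the full context ("k", "a", "t").
database = {("k", "a", None): "unit_ka", (None, "a", None): "unit_a"}
table = {}
found_key, found_data = search_with_context_backoff("a", "k", "t", database, table)
```

Here the first search misses, the context is reduced, and the re-search finds the entry keyed by the left context only, which is then stored in `table`.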
Abstract:
This invention relates to the generation of synthetic speech from conventional texts, and in particular to the step in which a text in graphemes is converted into a text in phonemes. The grapheme text is analysed into rimes and onsets, and each word is analysed from the end, so that earlier occurring segments are at least partially defined by the identification of later occurring segments. It is a particular feature that an internal string of consonants, i.e. a string of consonants preceded and followed by a vowel, is split into two portions: a second portion which is contained in a database of onsets, and an earlier portion which, together with the preceding vowel or vowels, is contained in a database of rimes.
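The splitting of an internal consonant string can be sketched as follows. The onset and rime inventories here are toy sets invented for the example, and preferring the longest possible onset is an assumption; the abstract only requires that the later portion be a known onset and that the vowel plus the earlier portion be a known rime.

```python
def split_internal_cluster(vowels, cluster, onsets, rimes):
    """Split a consonant cluster into (rime, onset) using two inventories.

    vowels  -- the vowel or vowels preceding the cluster
    cluster -- the internal consonant string
    Returns None when no split satisfies both inventories.
    """
    # i is the number of consonants kept with the preceding vowel(s).
    # Starting at 0 tries the longest onset first (an assumption).
    for i in range(len(cluster) + 1):
        earlier, later = cluster[:i], cluster[i:]
        if later in onsets and vowels + earlier in rimes:
            return vowels + earlier, later
    return None


# Hypothetical toy inventories.
onsets = {"t", "tr"}
rimes = {"a", "as", "an"}

split_internal_cluster("a", "st", onsets, rimes)   # ("as", "t")
split_internal_cluster("a", "ntr", onsets, rimes)  # ("an", "tr")
```

In the first call "st" is not a known onset, so one consonant is moved back to the rime, giving the rime "as" and the onset "t".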
Abstract:
Data in the same fundamental frequency (F0) range as the speech segments are used as learning data to prepare a reference codebook CB_M for the spectrum envelope. The same learning data for a range higher than F0 and for a range lower than F0 are subjected to linear stretch matching with respect to the learning data for the F0 range. For each code vector in the reference codebook CB_M, the spectrum envelope is clustered to prepare a high range codebook CB_H and a low range codebook CB_L. The spectrum envelope of the input speech segments is fuzzy vector quantized (S402) with the reference codebook, and depending on the synthesized F0, one of the high, middle and low codebooks is selected. The selected codebook is used to decode the fuzzy vector quantized code, and the decoded output is subjected to the inverse FFT. Alternatively, codebooks CB_MH and CB_ML are prepared, each comprising differential vectors for corresponding code vectors between CB_M and CB_H and between CB_M and CB_L. The quantized code is decoded using either CB_MH or CB_ML, and the decoded differential vector is stretched in accordance with the difference in fundamental frequency between the synthesized speech and the original speech for CB_M. The stretched differential vector is added to the code vector which was used for the fuzzy vector quantization.
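The first variant (select a codebook by the synthesized fundamental frequency, then decode the fuzzy code with it) can be sketched as below. Several details are assumptions not stated in the abstract: the two frequency thresholds, the representation of a fuzzy code as a mapping from code index to membership weight, and decoding as a membership-weighted average of code vectors. The differential-codebook variant is not shown.

```python
def select_codebook(f0, cb_low, cb_mid, cb_high, low_max, high_min):
    """Pick the low, middle or high codebook from the synthesized F0.

    low_max and high_min are hypothetical range boundaries in Hz.
    """
    if f0 < low_max:
        return cb_low
    if f0 >= high_min:
        return cb_high
    return cb_mid


def fuzzy_decode(memberships, codebook):
    """Decode a fuzzy code as a weighted average of code vectors.

    memberships -- dict mapping code index to membership weight
    codebook    -- list of equal-length code vectors (lists of floats)
    """
    dim = len(codebook[0])
    total = sum(memberships.values())
    decoded = [0.0] * dim
    for index, weight in memberships.items():
        for d in range(dim):
            decoded[d] += weight * codebook[index][d]
    return [value / total for value in decoded]


# Toy two-entry codebooks with two-dimensional envelopes.
cb_low = [[0.0, 1.0], [2.0, 3.0]]
cb_mid = [[0.0, 2.0], [4.0, 6.0]]
cb_high = [[1.0, 3.0], [5.0, 7.0]]

code = {0: 0.5, 1: 0.5}  # quantized against the reference codebook
chosen = select_codebook(150.0, cb_low, cb_mid, cb_high, 120.0, 200.0)
envelope = fuzzy_decode(code, chosen)
```

Because the three codebooks share their indexing, a code quantized once against the reference codebook can be decoded with whichever codebook matches the target pitch.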
Abstract:
In a method and apparatus which use actual speech as auxiliary information and synthesize speech by rule, prosodic information for the phoneme sequence of each word of a word sequence obtained by analysis of an input text is set by referring to a word dictionary, and a speech waveform sequence is obtained from the phoneme sequence of each word by referring to a speech waveform dictionary. In addition, prosodic information is extracted from input actual speech. Either the set prosodic information or the extracted prosodic information is selected, and the selected prosodic information is used to control the speech waveform sequence to create synthesized speech.
Abstract:
The present invention relates to a device and method for speech synthesis. Speech is recorded and polyphones are stored. When the polyphones are recorded, the movement pattern in the face is recorded as well, by registering a number of measuring points in the face at the same time as the polyphones. When a person's speech is translated from one language into another, the polyphones and the corresponding movement patterns in the face are linked to a movement model of the face. The face of the real person is then pasted over the model, whereby a movement pattern corresponding to the language is obtained. The invention consequently gives the impression that the person really speaks the language in question.
Abstract:
In a speech synthesis apparatus for outputting synthesized speech on the basis of a parameter sequence of a speech waveform, a parameter generation unit generates a parameter sequence for speech synthesis on the basis of a character sequence input by a character sequence input unit, and stores the generated parameter sequence in a parameter storage unit. A waveform generation unit generates pitch waveforms each for one pitch period on the basis of synthesis parameters and pitch scales included in the parameter sequence, and generates a speech waveform by connecting the generated pitch waveforms in accordance with frame lengths set by a frame length setting unit.
Abstract:
A composite pitch pattern of an artificial waveform of a composite sound indicating characters is produced according to a general pitch pattern producing model, and a pitch pattern of a VCV phoneme-chain waveform of each of VCV phoneme-chains corresponding to the characters is produced from an actual voice sample. Each VCV phoneme-chain composed of a preceding vowel, a consonant and a succeeding vowel has a pitch fine structure and a pitch fluctuation. Thereafter, an overall inclination of the pitch pattern of each VCV phoneme-chain waveform is adjusted to that of a portion of the composite pitch pattern corresponding to the same VCV phoneme-chain to overlap transitional portions of preceding and succeeding vowels in a changed pitch pattern of each VCV phoneme-chain waveform with those in the corresponding portion of the composite pitch pattern. Therefore, when changed pitch patterns of the VCV phoneme-chain waveforms are connected with each other, a synthesized sound of the characters can be obtained while the synthesized sound maintains a pitch fine structure and a pitch fluctuation.
Abstract:
In a method and a device for outputting digitally coded traffic messages by means of synthetically generated speech, where the traffic messages contain location codes and event codes, a carrier sentence is selected from several stored carrier sentences depending on the traffic message received, and location-related and event-related words derived from the location codes and event codes are inserted into the open positions of the carrier sentence.
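The carrier-sentence approach can be sketched as template selection followed by slot filling. Everything concrete here is invented for illustration: the code tables, the two carrier sentences, the message layout, and the rule that selects a carrier by the number of location codes.

```python
# Hypothetical lookup tables from codes to spoken words.
LOCATION_WORDS = {101: "Cologne", 102: "Bonn"}
EVENT_WORDS = {1: "traffic jam", 2: "roadworks"}

# Stored carrier sentences with open positions.
CARRIERS = {
    "segment": "Between {start} and {end}, {event}.",
    "point": "Near {start}, {event}.",
}


def render_message(message):
    """Select a carrier sentence and fill its open positions."""
    event = EVENT_WORDS[message["event_code"]]
    locations = [LOCATION_WORDS[code] for code in message["location_codes"]]
    # Assumed selection rule: two locations describe a road segment.
    if len(locations) > 1:
        return CARRIERS["segment"].format(
            start=locations[0], end=locations[1], event=event
        )
    return CARRIERS["point"].format(start=locations[0], event=event)


text = render_message({"event_code": 1, "location_codes": [101, 102]})
# text == "Between Cologne and Bonn, traffic jam."
```

The resulting string would then be passed to the speech synthesizer; that step is outside this sketch.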