Abstract:
Learning data whose fundamental frequency F0 lies in the same range as that of the speech segments are used to prepare a reference codebook CB_M for the spectrum envelope. The same learning data for a range higher than that F0 range and for a lower range are each subjected to linear stretch matching against the learning data for the reference range. For each code vector in the reference codebook CB_M, the corresponding spectrum envelopes are clustered to prepare a high-range codebook CB_H and a low-range codebook CB_L. The spectrum envelope of each input speech segment is fuzzy vector quantized (S402) with the reference codebook, and one of the high, middle, and low codebooks is selected depending on the F0 of the speech to be synthesized. The selected codebook is used to decode the fuzzy vector quantized code, and the decoded output is subjected to an inverse FFT. Alternatively, codebooks CB_MH and CB_ML are prepared, each comprising differential vectors between corresponding code vectors of CB_M and CB_H and of CB_M and CB_L, respectively. The quantized code is decoded using either CB_MH or CB_ML, and the decoded differential vector is stretched in accordance with the difference in fundamental frequency between the synthesized speech and the original speech for CB_M. The stretched differential vector is added to the code vector that was used for the fuzzy vector quantization.
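The encode, select, and decode steps of the first variant can be illustrated with a minimal Python sketch. It is not the patented implementation: the membership formula is the common fuzzy c-means style weighting, and the function names, the parameters `k` and `m`, and the two F0 thresholds are assumptions introduced only for illustration. The key idea it shows is that the code is computed against the reference codebook but decoded against whichever codebook matches the target F0, relying on the code vectors sharing indices across codebooks.

```python
import math

def fuzzy_vq_encode(x, codebook, k=2, m=2.0):
    """Encode x as [(index, membership)] over its k nearest code vectors.

    Memberships use the fuzzy c-means style weighting (an assumption;
    the abstract does not state the formula). m > 1 is the fuzziness.
    """
    dists = sorted((math.dist(x, c), i) for i, c in enumerate(codebook))
    nearest = dists[:k]
    if nearest[0][0] == 0.0:
        # x coincides with a code vector: give it full membership.
        return [(nearest[0][1], 1.0)]
    weights = [(i, d ** (-2.0 / (m - 1.0))) for d, i in nearest]
    total = sum(w for _, w in weights)
    return [(i, w / total) for i, w in weights]

def select_codebook(f0_synth, low_max, high_min, cb_low, cb_mid, cb_high):
    """Pick a codebook from the target F0 (thresholds are hypothetical)."""
    if f0_synth < low_max:
        return cb_low
    if f0_synth >= high_min:
        return cb_high
    return cb_mid

def fuzzy_vq_decode(code, codebook, m=2.0):
    """Decode as the membership-weighted mean of the indexed code vectors."""
    den = sum(u ** m for _, u in code)
    dim = len(codebook[0])
    return [sum((u ** m) * codebook[i][j] for i, u in code) / den
            for j in range(dim)]

# Toy two-dimensional "envelopes"; real code vectors would be spectra.
cb_mid = [[0.0, 0.0], [2.0, 0.0]]
cb_high = [[0.0, 1.0], [2.0, 1.0]]
cb_low = [[0.0, -1.0], [2.0, -1.0]]

code = fuzzy_vq_encode([1.0, 0.0], cb_mid)
chosen = select_codebook(220.0, 110.0, 180.0, cb_low, cb_mid, cb_high)
decoded = fuzzy_vq_decode(code, chosen)
```

The decoded envelope would then go to the inverse FFT stage, which this sketch leaves out.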
Abstract:
In a method and apparatus that synthesize speech by rule while using actual speech as auxiliary information, an input text is analyzed to obtain a word sequence. Prosodic information for the phoneme sequence of each word is set by referring to a word dictionary, and a speech waveform sequence is obtained from the phoneme sequence of each word by referring to a speech waveform dictionary. Separately, prosodic information is extracted from input actual speech. Either the set prosodic information or the extracted prosodic information is selected, and the selected prosodic information is used to control the speech waveform sequence to create synthesized speech.
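The selection step can be pictured with a short Python sketch. The `Prosody` fields (pitch, duration, power), the `prefer_extracted` flag, and the fallback to the rule-based values when no actual speech is available are all assumptions made for illustration; the abstract only states that one of the two sets of prosodic information is selected.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Prosody:
    # Hypothetical per-phoneme prosodic parameters.
    f0_hz: List[float]
    duration_ms: List[float]
    power_db: List[float]

def select_prosody(rule_based: Prosody,
                   extracted: Optional[Prosody],
                   prefer_extracted: bool) -> Prosody:
    """Return the prosody that will control the waveform sequence.

    Falls back to the rule-based prosody when no actual speech was
    supplied (an assumed behaviour, not stated in the abstract).
    """
    if prefer_extracted and extracted is not None:
        return extracted
    return rule_based

# Prosody set from the word dictionary versus prosody measured on speech.
rule = Prosody(f0_hz=[120.0], duration_ms=[80.0], power_db=[60.0])
measured = Prosody(f0_hz=[150.0], duration_ms=[95.0], power_db=[62.0])

selected = select_prosody(rule, measured, prefer_extracted=True)
```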