摘要:
A second phoneme is generated in consideration of a phonemic context with respect to a first phoneme as a search target. Phonemic piece data corresponding to the second phoneme is searched out from a database. A third phoneme is generated by changing the phonemic context on the basis of the search result, and phonemic piece data corresponding to the third phoneme is re-searched out from the database. The search or re-search result is registered in a table in correspondence with the second or third phoneme.
摘要:
In the concatenative text-to-speech system, high compression rate of duration data in the prosodic template is achieved by extracting statistical parameters describing behavior of actual duration values of instances of each given syllable, phoneme, half-phoneme, diphone, triphone or any other basic speech unit employed, and storing only the extracted statistical parameters, instead of the original duration values. Entries of each given basic unit in the prosodic template is sorted and indexed in the order of increasing duration value. Consequently, the amount of duration data can be significantly reduced, while keeping the error statistically under acceptable range.
摘要:
In one embodiment, the present invention comprises a vocoder having at least one input and at least one output, an encoder comprising a filter having at least one input operably connected to the input of the vocoder and at least one output, a decoder comprising a synthesizer having at least one input operably connected to the at least one output of the encoder, and at least one output operably connected to the at least one output of the vocoder, wherein the encoder comprises a memory and the encoder is adapted to execute instructions stored in the memory comprising classifying speech segments and encoding speech segments, and the decoder comprises a memory and the decoder is adapted to execute instructions stored in the memory comprising time-warping a residual speech signal to an expanded or compressed version of the residual speech signal.
摘要:
The method and preprocessor enhances the intelligibility of narrowband speech without essentially lengthening the overall time duration of the signal. Both spectral enhancements and variable-rate time-scaling procedures are implemented to improve the salience of initial consonants, particularly the perceptually important formant transitions. Emphasis is transferred from the dominating vowel to the preceding consonant through adaptation of the phoneme timing structure. In a further embodiment, the technique is applied as a preprocessor to a speech coder.
摘要:
A plurality of voice segments, each including one or more phonemes are acquired in a time-serial manner, in correspondence with desired singing or speaking words. As necessary, a boundary is designated between start and end points of a vowel phoneme included in any one of the acquired voice segments. Voice is synthesized for a region of the vowel phoneme that precedes the designated boundary vowel phoneme, or a region of the vowel phoneme that succeeds the designated boundary in the vowel phoneme. By synthesizing a voice for the region preceding the designated boundary, it is possible to synthesize a voice imitative of a vowel sound that is uttered by a person and then stopped to sound with his or her mouth kept opened. Further, by synthesizing a voice for the region succeeding the designated boundary, it is possible to synthesize a voice imitative of a vowel sound that is started to sound with the mouth opened.
摘要:
Input text data undergoes language analysis to generate prosody, and a speech database is searched for a synthesis unit on the basis of the prosody. A modification distortion of the found synthesis unit, and concatenation distortions upon connecting that synthesis unit to those in the preceding phoneme are computed, and a distortion determination unit weights the modification and concatenation distortions to determine the total distortion. An Nbest determination unit obtains N best paths that can minimize the distortion using the A* search algorithm, and a registration unit determination unit selects a synthesis unit to be registered in a synthesis unit inventory on the basis of the N best paths in the order of frequencies of occurrence, and registers it in the synthesis unit inventory.
摘要:
A speech segment search unit searches a speech database for speech segments that satisfy a phonetic environment, and a HMM learning unit computes the HMMs of phonemes on the basis of the search result. A segment recognition unit performs segment recognition of speech segments on the basis of the computed HMMs of the phonemes, and when the phoneme of the segment recognition result is equal to a phoneme of the source speech segment, that speech segment is registered in a segment dictionary.
摘要:
The present invention relates to a method for analyzing speech, the method comprising the steps of: a) inputting a speech signal, b) obtaining the first harmonic of the speech signal, c) determining the phase-difference Df between the speech signal and the first harmonic.
摘要:
A rule based speech synthesis apparatus by which concatenation distortion may be less than a preset value without dependency on utterance, wherein a parameter correction unit reads out a target parameter for a vowel from a target parameter storage, responsive to the phoneme at the a leading end and at a trailing end of a speech element and acoustic feature parameters output from a speech element selector, and accordingly corrects the acoustic feature parameters of the speech element. The parameter correction unit corrects the parameters, so that the parameters ahead and behind the speech element are equal to the target parameter for the vowel of the corresponding phoneme, and outputs the so corrected parameters.
摘要:
Portions of spoon waveform are joined by forming extrapolations at the end of one and the beginning of the next portion to create an overlap region with synchronous pitchmarks, and then forming a weighted sum across the overlap to provide a smooth transition.