摘要:
According to one embodiment, a first storage unit stores n band noise signals obtained by applying n band-pass filters to a noise signal. A second storage unit stores n band pulse signals. A parameter input unit inputs a fundamental frequency, n band noise intensities, and a spectrum parameter. A extraction unit extracts for each pitch mark the n band noise signals while shifting. An amplitude control unit changes amplitudes of the extracted band noise signals and band pulse signals in accordance with the band noise intensities. A generation unit generates a mixed sound source signal by adding the n band noise signals and the n band pulse signals. A generation unit generates the mixed sound source signal generated based on the pitch mark. A vocal tract filter unit generates a speech waveform by applying a vocal tract filter using the spectrum parameter to the generated mixed sound source signal.
摘要:
An apparatus for synthesizing a speech including a waveform memory that stores a plurality of speech unit waveforms, an information memory that correspondingly stores speech unit information and an address of each of the speech unit waveforms, a selector that selects a speech unit sequence corresponding to the input phoneme sequence by referring to the speech unit information, a speech unit waveform acquisition unit that acquires a speech unit waveform corresponding to each speech unit of the speech unit sequence from the waveform memory by referring to the address, a speech unit concatenation unit that generates the speech by concatenating the speech unit waveform acquired.
摘要:
A voice conversion apparatus stores, in a parameter memory, target speech spectral parameters of target speech, stores, in a voice conversion rule memory, a voice conversion rule for converting voice quality of source speech into voice quality of the target speech, extracts, from an input source speech, a source speech spectral parameter of the input source speech, converts extracted source speech spectral parameter into a first conversion spectral parameter by using the voice conversion rule, selects target speech spectral parameter similar to the first conversion spectral parameter from the parameter memory, generates an aperiodic component spectral parameter representing from selected target speech spectral parameter, mixes a periodic component spectral parameter included in the first conversion spectral parameter with the aperiodic component spectral parameter, to obtain a second conversion spectral parameter, and generates a speech waveform from the second conversion spectral parameter.
摘要:
A word dictionary including sets of a character string which constitutes a word, a phoneme sequence which constitutes pronunciation of the word and a part of speech of the word is referenced, an entered text is analyzed, the entered text is divided into one or more subtexts, a phoneme sequence and a part of speech sequence are generated for each subtext, the part of speech sequence of the subtext and a list of part of speech sequence are collated to determine whether the phonetic sound of the subtext is to be converted or not, and the phonetic sounds of the phoneme sequence in the subtext whose phonetic sounds are determined to be converted are converted.
摘要:
According to one embodiment, a method for editing speech is disclosed. The method can generate speech information from a text. The speech information includes phonologic information and prosody information. The method can divide the speech information into a plurality of speech units, based on at least one of the phonologic information and the prosody information. The method can search at least two speech units from the plurality of speech units. At least one of the phonologic information and the prosody information in the at least two speech units are identical or similar. In addition, the method can store a speech unit waveform corresponding to one of the at least two speech units as a representative speech unit into a memory.
摘要:
Ratios of powers at the peaks of respective formants of the spectrum of a pitch-cycle waveform and powers at boundaries between the formants are obtained and, when the ratios are large, bandwidth of window functions are widened and the formant waveforms are generated by multiplying generated sinusoidal waveforms from the formant parameter sets on the basis of pitch-cycle waveform generating data by the window functions of the widened bandwidth, whereby a pitch-cycle waveform is generated by the sum of these formant waveforms.
摘要:
A speech synthesizer includes a periodic component fusing unit and an aperiodic component fusing unit, and fuses periodic components and aperiodic components of a plurality of speech units for each segment, which are selected by a unit selector, by a periodic component fusing unit and an aperiodic component fusing unit, respectively. The speech synthesizer is further provided with an adder, so that the adder adds, edits, and concatenates the periodic components and the aperiodic components of the fused speech units to generate a speech waveform.
摘要:
A word dictionary including sets of a character string which constitutes a word, a phoneme sequence which constitutes pronunciation of the word and a part of speech of the word is referenced, an entered text is analyzed, the entered text is divided into one or more subtexts, a phoneme sequence and a part of speech sequence are generated for each subtext, the part of speech sequence of the subtext and a list of part of speech sequence are collated to determine whether the phonetic sound of the subtext is to be converted or not, and the phonetic sounds of the phoneme sequence in the subtext whose phonetic sounds are determined to be converted are converted.
摘要:
The prosody control unit pattern generation module generates pitch patterns in respective prosody control units based on language attribute information, the phoneme duration and emphasis degree information, the modification method decision module decides a modification method by smoothing processing with respect to the pitch pattern in a connection portion between the prosody control unit and at least one of previous and next prosody control units based on at least emphasis degree information to generate modification method information, and the pattern connection module modifies pitch patterns generated in respective prosody control units by smoothing processing according to the modification method information and connects them to generate a sentence pitch pattern corresponding to a text to be a target for speech synthesis.
摘要:
A speech translation apparatus includes a speech recognition unit configured to recognize input speech of a first language to generate a first text of the first language, an extraction unit configured to compare original prosody information of the input speech with first synthesized prosody information based on the first text to extract paralinguistic information about each of first words of the first text, a machine translation unit configured to translate the first text to a second text of a second language, a mapping unit configured to allocate the paralinguistic information about each of the first words to each of second words of the second text in accordance with synonymity, a generating unit configured to generate second synthesized prosody information based on the paralinguistic information allocated to each of the second words, and a speech synthesis unit configured to synthesize output speech based on the second synthesized prosody information.