Abstract:
The present invention comprises: a voice input section (102) that receives a remark (a question) as a voice signal; a reply creation section (110) that creates a voice sequence of a reply (response) to the remark; a pitch analysis section (106) that analyzes the pitch of a first segment (e.g., word ending) of the remark; and a voice generation section (a voice synthesis section (112), etc.) that generates the reply, in the form of voice, represented by the voice sequence. The voice generation section controls the pitch of the entire reply so that the pitch of a second segment (e.g., word ending) of the reply assumes a predetermined pitch (e.g., five degrees down) relative to the pitch of the first segment of the remark. This arrangement makes it possible to synthesize a spoken reply that sounds natural to the user.
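The pitch control described above can be illustrated with a minimal sketch. It assumes that "five degrees down" means a perfect fifth (7 semitones in equal temperament), that the reply is given as a list of pitch values in Hz, and that the last value stands for the second segment (the word ending). The function names and these representations are hypothetical and are not taken from the abstract.

```python
import math

def hz_to_semitones(f_hz, ref_hz=440.0):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f_hz / ref_hz)

def semitones_to_hz(st, ref_hz=440.0):
    """Convert semitones relative to ref_hz back to a frequency in Hz."""
    return ref_hz * 2.0 ** (st / 12.0)

def shift_reply(reply_pitches_hz, remark_end_hz, interval_semitones=-7):
    """Shift the whole reply so its last pitch lands at the given interval
    (assumed: -7 semitones, a fifth down) from the remark's ending pitch."""
    target_end = hz_to_semitones(remark_end_hz) + interval_semitones
    offset = target_end - hz_to_semitones(reply_pitches_hz[-1])
    # The same offset is applied to every value, so the reply's own
    # intonation contour is preserved while the ending hits the target.
    return [semitones_to_hz(hz_to_semitones(f) + offset)
            for f in reply_pitches_hz]

# Hypothetical example: the question ends at 220 Hz.
shifted = shift_reply([200.0, 180.0, 150.0], remark_end_hz=220.0)
```

Because the shift is uniform on a logarithmic scale, the frequency ratios inside the reply are unchanged; only its overall register moves.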
Abstract:
A voice synthesis method for generating a voice signal through connection of a phonetic piece extracted from a reference voice, includes selecting, by a piece selection unit, the phonetic piece sequentially; setting, by a pitch setting unit, a pitch transition in which a fluctuation of an observed pitch of the phonetic piece is reflected based on a degree corresponding to a difference value between a reference pitch being a reference of sound generation of the reference voice and the observed pitch of the phonetic piece selected by the piece selection unit; and generating, by a voice synthesis unit, the voice signal by adjusting a pitch of the phonetic piece selected by the piece selection unit based on the pitch transition generated by the pitch setting unit.
Abstract:
A voice analysis method comprises generating a time series of a relative pitch (R), which is a difference between a pitch (PB) generated from music track data (XB) designating respective notes of a music track in time series, and a pitch (PA) of a reference voice. The music track is divided into unit sections (UA) of a predetermined duration, and singing characteristics data (Z) is generated, which includes, for each of a plurality of statuses (St) of a model (M), classification information for classifying the unit sections (UA) into a plurality of sets and variable information defining a probability distribution of the time series of the relative pitch (R) within each of the classified unit sections (UA). The classification information is generated based on a condition relating to an attribute of the note and a condition relating to an attribute of each of the unit sections (UA).
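The first two steps above (computing the relative pitch and dividing it into unit sections) might be sketched as follows. The sign convention R = PA - PB, the use of cents as the unit, and the frame-based section length are assumptions made for illustration; the abstract only says that R is a difference between the two pitches.

```python
def relative_pitch(pa_cents, pb_cents):
    """Frame-wise relative pitch R, assumed here to be PA - PB.

    pa_cents: pitch of the reference voice per frame (cents)
    pb_cents: pitch generated from the music track data per frame (cents)
    """
    return [pa - pb for pa, pb in zip(pa_cents, pb_cents)]

def split_unit_sections(series, frames_per_section):
    """Divide a time series into unit sections of a fixed number of frames.
    The last section may be shorter if the length is not a multiple."""
    return [series[i:i + frames_per_section]
            for i in range(0, len(series), frames_per_section)]

# Hypothetical data: the singer drifts slightly around the notated pitch.
pa = [6000.0, 6010.0, 5990.0, 6205.0, 6195.0]
pb = [6000.0, 6000.0, 6000.0, 6200.0, 6200.0]
r = relative_pitch(pa, pb)
sections = split_unit_sections(r, frames_per_section=2)
```

The classification of sections and the fitting of probability distributions per model status are not shown; they depend on details the abstract does not give.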
Abstract:
The invention relates to a method for speech signal analysis, modification and synthesis comprising: a phase for locating analysis windows by means of an iterative process that determines the phase of the first sinusoidal component and compares the phase value of said component with a predetermined value; a phase for selecting analysis frames corresponding to an allophone and readjusting the duration and the fundamental frequency according to certain thresholds; and a phase for generating synthetic speech from synthesis frames, taking the information of the closest analysis frame as the spectral information of each synthesis frame and taking as many synthesis frames as the synthetic signal has periods. The method allows a coherent location of the analysis windows within the periods of the signal and the exact generation of the synthesis instants in a manner synchronous with the fundamental period.
Abstract:
In voice processing, a first distribution generation unit approximates a distribution of feature information representative of the voice of a first speaker, for each unit interval of that voice, as a mixed probability distribution which is a mixture of a plurality of first probability distributions corresponding to a plurality of different phones. A second distribution generation unit likewise approximates a distribution of feature information representative of the voice of a second speaker as a mixed probability distribution which is a mixture of a plurality of second probability distributions. A function generation unit generates, for each phone, a conversion function for converting the feature information of the voice of the first speaker to that of the second speaker, based on respective statistics of the first and second probability distributions that correspond to the phone.
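One simple way to build a per-phone conversion function from the statistics of two distributions is mean and variance matching. The sketch below assumes scalar features and assumes that frames have already been assigned to a single phone; the abstract does not state which statistics or functional form are used, so this is an illustrative choice rather than the claimed method.

```python
from statistics import mean, pstdev

def make_conversion(src_frames, tgt_frames):
    """Return a function mapping a source-speaker feature value to the
    target speaker's scale for one phone.

    Assumed form: y = mu2 + (sigma2 / sigma1) * (x - mu1), where
    (mu1, sigma1) and (mu2, sigma2) are the mean and standard deviation
    of the first and second speaker's features for this phone.
    """
    mu1, mu2 = mean(src_frames), mean(tgt_frames)
    sigma1, sigma2 = pstdev(src_frames), pstdev(tgt_frames)
    scale = sigma2 / sigma1
    return lambda x: mu2 + scale * (x - mu1)

# Hypothetical feature values for one phone from each speaker.
convert = make_conversion([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
converted = convert(2.0)  # the source mean maps onto the target mean
```

Note that the two sets of frames need not be time-aligned or even contain the same utterances, since only their summary statistics are used.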
Abstract:
A relay device (20) duplicates speech data received from a communication terminal that is engaged in voice communication with another communication terminal. The duplicated speech data is transmitted to and stored at a media processing device (40). The media processing device (40) builds a database for speech synthesis based on the stored speech data.
Abstract:
A speech synthesis system receives symbolic input describing an utterance to be synthesized. In one embodiment, different portions of the utterance are constructed from different sources, one of which is a speech corpus recorded from a human speaker whose voice is to be modeled. The other sources may include other human speech corpora or speech produced using Rule-Based Speech Synthesis (RBSS). At least some portions of the utterance may be constructed by modifying prototype speech units to produce adapted speech units that are contextually appropriate for the utterance. The system concatenates the adapted speech units with the other speech units to produce a speech waveform. In another embodiment, a speech unit of a speech corpus recorded from a human speaker lacks transitions at one or both of its edges. A transition is synthesized using RBSS and concatenated with the speech unit in producing a speech waveform for the utterance.