Abstract:
A voice conversion apparatus stores, in a parameter memory, target speech spectral parameters of target speech; stores, in a voice conversion rule memory, a voice conversion rule for converting the voice quality of source speech into the voice quality of the target speech; extracts, from an input source speech, a source speech spectral parameter of the input source speech; converts the extracted source speech spectral parameter into a first conversion spectral parameter by using the voice conversion rule; selects a target speech spectral parameter similar to the first conversion spectral parameter from the parameter memory; generates an aperiodic component spectral parameter from the selected target speech spectral parameter; mixes a periodic component spectral parameter included in the first conversion spectral parameter with the aperiodic component spectral parameter to obtain a second conversion spectral parameter; and generates a speech waveform from the second conversion spectral parameter.
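The pipeline above can be sketched minimally in Python. Everything here is an assumption for illustration: `convert_rule` stands in for the trained voice conversion rule, parameters are short lists, and selection uses plain Euclidean distance.

```python
import math

def convert_rule(src, weight=0.8):
    # hypothetical stand-in for the stored voice conversion rule
    return [weight * x + 0.1 for x in src]

def nearest_target(param, target_memory):
    # select the stored target parameter closest to the converted one
    return min(target_memory, key=lambda t: math.dist(t, param))

def mix(periodic, aperiodic, alpha=0.5):
    # blend the periodic component of the converted parameter with the
    # aperiodic component derived from the selected target parameter
    return [alpha * p + (1 - alpha) * a for p, a in zip(periodic, aperiodic)]

target_memory = [[0.2, 0.4], [0.9, 0.1]]
first = convert_rule([0.1, 0.3])       # first conversion spectral parameter
selected = nearest_target(first, target_memory)
second = mix(first, selected)          # second conversion spectral parameter
```

A real apparatus would mix in the spectral domain per frame and drive a synthesis filter with `second`; the sketch only shows the select-then-mix control flow.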
Abstract:
The voice conversion method of a display apparatus includes: in response to the receipt of a first video frame, detecting one or more entities from the first video frame; in response to the selection of one of the detected entities, storing the selected entity; in response to the selection of one of a plurality of previously-stored voice samples, storing the selected voice sample in connection with the selected entity; and in response to the receipt of a second video frame including the selected entity, changing a voice of the selected entity based on the selected voice sample and outputting the changed voice.
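The entity-to-voice association described above amounts to a small stateful mapping. The following sketch is hypothetical throughout: frames are modeled as lists of entity labels, and `process` returns which entities in a later frame would have their voice changed.

```python
def detect_entities(frame):
    # hypothetical detector: a frame is modeled as a list of entity labels
    return set(frame)

class VoiceMapper:
    def __init__(self):
        self.mapping = {}  # selected entity -> selected voice sample

    def select(self, entity, voice_sample):
        # store the selected voice sample in connection with the entity
        self.mapping[entity] = voice_sample

    def process(self, frame):
        # on a later frame, report entities whose voice should be changed
        return {e: self.mapping[e]
                for e in detect_entities(frame) if e in self.mapping}

mapper = VoiceMapper()
mapper.select("actor_a", "voice_sample_3")
changed = mapper.process(["actor_a", "actor_b"])  # second video frame
```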
Abstract:
A hybrid approach is described for combining frequency warping and Gaussian Mixture Modeling (GMM) to achieve better speaker identity and speech quality. To train the voice conversion GMM model, line spectral frequencies and other features are extracted from a set of source sounds to generate a source feature vector and from a set of target sounds to generate a target feature vector. The GMM model is estimated based on the aligned source feature vector and the target feature vector. A mixture-specific warping function is generated for each set of mixture mean pairs of the GMM model, and an overall warping function is generated based on a weighting of each of the mixture-specific warping functions. The warping function can be used to convert sounds received from a source speaker to approximate speech of a target speaker.
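A minimal one-dimensional sketch of the weighting idea, under simplifying assumptions: each mixture is reduced to a (source mean, target mean, variance) triple, each mixture-specific warping function is a pure shift between the mean pair, and the weights are unnormalized Gaussian responsibilities.

```python
import math

def gaussian_weight(x, mean, var):
    # unnormalized responsibility of one mixture component for input x
    return math.exp(-(x - mean) ** 2 / (2 * var))

def warp(x, mixtures):
    # mixtures: list of (source_mean, target_mean, variance) triples;
    # each mean pair defines a mixture-specific shift, combined here
    # by responsibility weighting
    weights = [gaussian_weight(x, sm, v) for sm, _, v in mixtures]
    shifts = [tm - sm for sm, tm, _ in mixtures]
    return x + sum(w * s for w, s in zip(weights, shifts)) / sum(weights)

mixtures = [(1.0, 1.5, 0.1), (3.0, 2.5, 0.1)]
y = warp(1.0, mixtures)   # dominated by the first mixture, so y is near 1.5
```

Actual frequency warping operates on spectral axes rather than scalars, but the per-mixture weighting structure is the same.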
Abstract:
A device includes: an input speech separation unit which separates an input speech into vocal tract information and voicing source information; a mouth opening degree calculation unit which calculates a mouth opening degree from the vocal tract information; a target vowel database storage unit which stores pieces of vowel information on a target speaker; an agreement degree calculation unit which calculates a degree of agreement between the calculated mouth opening degree and a mouth opening degree included in the vowel information; a target vowel selection unit which selects the vowel information from among the pieces of vowel information, based on the calculated agreement degree; a vowel transformation unit which transforms the vocal tract information on the input speech, using vocal tract information included in the selected vowel information; and a synthesis unit which generates a synthetic speech using the transformed vocal tract information and the voicing source information.
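The agreement-degree selection step can be sketched as a nearest-match lookup. All names here are assumptions: the vowel database is a list of dicts, the mouth opening degree is a scalar, and agreement is simply negative absolute difference.

```python
def agreement(d_input, d_candidate):
    # degree of agreement: higher when the mouth opening degrees are closer
    return -abs(d_input - d_candidate)

def select_vowel(input_degree, vowel_db):
    # pick the target-speaker vowel whose stored mouth opening degree
    # best agrees with the one calculated from the input speech
    return max(vowel_db, key=lambda v: agreement(input_degree, v["degree"]))

vowel_db = [
    {"vowel": "a", "degree": 0.9, "vocal_tract": "tract_a"},
    {"vowel": "i", "degree": 0.3, "vocal_tract": "tract_i"},
]
chosen = select_vowel(0.8, vowel_db)   # the open vowel "a" agrees best
```

The selected entry's `vocal_tract` field would then drive the vowel transformation before resynthesis.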
Abstract:
An apparatus for providing improved voice conversion includes a sub-feature generator and a transformation element. The sub-feature generator may be configured to define sub-feature units with respect to a feature of source speech. The transformation element may be configured to perform voice conversion of the source speech to target speech based on the conversion of the sub-feature units to corresponding target speech sub-feature units using a conversion model trained with respect to converting training source speech sub-feature units to training target speech sub-feature units.
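A toy sketch of the sub-feature idea, with invented representations: a feature vector is split into contiguous sub-feature units, and the trained conversion model is reduced to a lookup table from source units to target units, with unseen units passed through.

```python
def split_subfeatures(feature, n):
    # divide one feature vector into n contiguous sub-feature units
    k = len(feature) // n
    return [feature[i * k:(i + 1) * k] for i in range(n)]

def convert(feature, model, n=2):
    # model: trained mapping from source sub-feature units (as tuples)
    # to target sub-feature units; unseen units pass through unchanged
    out = []
    for unit in split_subfeatures(feature, n):
        out.extend(model.get(tuple(unit), unit))
    return out

model = {(1, 2): [9, 9], (3, 4): [7, 7]}
converted = convert([1, 2, 3, 4], model)
```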
Abstract:
A personality-based theme may be provided to a device. An application program may query a personality resource file for a prompt corresponding to a personality. Then the prompt may be received at a text-to-speech synthesis engine. Next, the speech synthesis engine may query a personality voice font and recorded phrases database for a voice font corresponding to the personality and may alter the prompt text to conform with the grammatical style of the personality. Then the speech synthesis engine may apply the voice font to the prompt, which is then produced at an output device.
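The query flow can be sketched with two hypothetical in-memory stores standing in for the personality resource file and the voice font database; the names and contents are invented for illustration.

```python
personality_resources = {
    # hypothetical personality resource file: prompt text per personality
    "pirate": {"greeting": "Ahoy! What be yer command?"},
}
voice_fonts = {
    # hypothetical voice font and recorded phrases database
    "pirate": "font_pirate_v1",
}

def render_prompt(personality, prompt_id):
    # the application queries the resource file for the prompt, then the
    # synthesis engine looks up the matching voice font to apply to it
    text = personality_resources[personality][prompt_id]
    font = voice_fonts[personality]
    return text, font

text, font = render_prompt("pirate", "greeting")
```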
Abstract:
Herein disclosed is a gaming machine executing a game and paying out a predetermined amount of credits according to a game result; generating voice data based on a player's voice; identifying a voice pattern corresponding to the voice data by searching a dialogue voice database and identifying a type of voice corresponding to the voice data, so as to store the voice data along with the voice pattern into the memory; calculating a value indicative of a game result, and updating the play history data stored in the memory using the result of the calculation; comparing the play history data thus updated with predetermined threshold value data; generating voice data according to the voice pattern based on the play history data if the play history data thus updated exceeds the predetermined threshold value data; and outputting voices from the speaker.
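The threshold-triggered voice output reduces to a small state machine. This sketch makes up concrete types: play history is an accumulated score, the stored voice pattern is a string, and "speaking" is a returned token rather than audio.

```python
class GamingVoice:
    def __init__(self, threshold):
        self.threshold = threshold       # predetermined threshold value data
        self.play_history = 0
        self.stored_pattern = None

    def store_voice(self, pattern):
        # store the identified voice pattern with the player's voice data
        self.stored_pattern = pattern

    def update(self, game_result):
        # update play history, then emit a voice only past the threshold
        self.play_history += game_result
        if self.play_history > self.threshold and self.stored_pattern:
            return f"speak:{self.stored_pattern}"
        return None

machine = GamingVoice(threshold=100)
machine.store_voice("cheerful")
out1 = machine.update(60)   # history 60, below threshold: silent
out2 = machine.update(60)   # history 120, exceeds threshold: voice output
```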
Abstract:
The invention relates to a cellular phone terminal system and, in particular, to a method for changing the caller's voice in a speech signal during a conversation. The cellular phone terminal system has a filter for filtering the signal. The method comprises the steps of: waiting for a caller voice selector key input for a desired caller voice when a caller voice converter key is pressed during a conversation; and setting even or odd harmonic deletion bins in the frequency domain of the uncompressed speech signal, according to the caller voice selector key input, to change the caller's voice.
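The harmonic deletion step can be sketched directly on a list of frequency bins. The representation is an assumption: bins are complex values indexed by harmonic number, the DC bin is kept, and deleting a harmonic means zeroing its bin before the signal is transformed back to the time domain.

```python
def delete_harmonics(spectrum, parity):
    # spectrum: list of complex frequency bins, index = harmonic number
    # parity: "even" or "odd" -- which harmonic bins to zero out
    keep_remainder = 1 if parity == "even" else 0
    return [c if k == 0 or k % 2 == keep_remainder else 0j
            for k, c in enumerate(spectrum)]

spectrum = [1+0j, 2+0j, 3+0j, 4+0j, 5+0j]
altered = delete_harmonics(spectrum, "even")   # bins 2 and 4 are zeroed
```

Removing every other harmonic shifts the perceived pitch structure, which is what makes the caller's voice sound different to the listener.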
Abstract:
An envelope identification section generates input envelope data (DEVin) indicative of a spectral envelope (EVin) of an input voice. A template acquisition section reads out, from a storage section, converting spectrum data (DSPt) indicative of a frequency spectrum (SPt) of a converting voice. On the basis of the input envelope data (DEVin) and the converting spectrum data (DSPt), a data generation section specifies a frequency spectrum (SPnew) corresponding in shape to the frequency spectrum (SPt) of the converting voice and having substantially the same spectral envelope as the spectral envelope (EVin) of the input voice, and the data generation section generates new spectrum data (DSPnew) indicative of the frequency spectrum (SPnew). A reverse FFT section and an output processing section generate an output voice signal (Snew) on the basis of the new spectrum data (DSPnew).
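The core of SPnew is an envelope transfer: keep the fine structure of the converting spectrum SPt but impose the input voice's envelope EVin. A minimal per-bin magnitude sketch, assuming the converting voice's own envelope EVt is also available so the fine structure can be isolated by division:

```python
def transfer_envelope(sp_t, ev_t, ev_in):
    # SPnew: fine shape of SPt (sp_t / ev_t) rescaled by the input
    # voice's envelope EVin, bin by bin
    return [s / e_t * e_in for s, e_t, e_in in zip(sp_t, ev_t, ev_in)]

sp_t  = [2.0, 4.0, 1.0]   # converting-voice spectrum (SPt), magnitudes
ev_t  = [2.0, 2.0, 1.0]   # its spectral envelope
ev_in = [3.0, 1.0, 2.0]   # input-voice spectral envelope (EVin)
sp_new = transfer_envelope(sp_t, ev_t, ev_in)   # spectrum for DSPnew
```

A reverse FFT of the resulting spectrum (with suitable phases) would then yield the output voice signal Snew.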
Abstract:
A telecommunication terminal receives an analog speech signal from a user of the terminal and converts the analog speech signal into a digital signal. A vocoder applies source coding to the speech signal and extracts reconstruction parameters from the speech signal. The reconstruction parameters are modified so that the transmitted voice associated with the signal is modified.
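The parameter-modification step can be sketched abstractly. The parameter names here are invented examples of vocoder reconstruction parameters; scaling the pitch parameter before transmission is one plausible modification that alters the transmitted voice.

```python
def modify_parameters(params, pitch_factor=1.3):
    # params: dict of reconstruction parameters extracted by the vocoder;
    # scaling the pitch before transmission modifies the perceived voice
    out = dict(params)
    out["pitch_hz"] = params["pitch_hz"] * pitch_factor
    return out

params = {"pitch_hz": 120.0, "gain": 0.8}      # hypothetical parameter set
modified = modify_parameters(params)           # pitch raised, gain untouched
```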