摘要:
In a voice synthesis apparatus, a phoneme piece interpolator part acquires first phoneme piece data of a phoneme piece corresponding to a first value of sound characteristic, and acquires second phoneme piece data of the phoneme piece corresponding to a second value of the sound characteristic. The first phoneme piece data and the second phoneme piece data indicate a spectrum of each frame of the phoneme piece. The phoneme piece interpolator interpolates between each frame of the first phoneme piece data and each frame of the second phoneme piece data corresponding to each frame of the first phoneme piece data so as to create phoneme piece data of the phoneme piece corresponding to a target value of the sound characteristic which is different from the first value and the second value of the sound characteristic. A voice synthesizer generates a voice signal having the target value of the sound characteristic based on the created phoneme piece data.
摘要:
Signal processing section (100) of a terminal converts acquired audio signals of a plurality of channels into frequency spectra set, calculates sound image positions corresponding to individual frequency components, and displays, on a display screen, the calculated sound image positions results by use of a coordinate system having coordinate axes of the frequency components and sound image positions. User-designated partial region of the coordinate system is set as a designated region and an amplitude-level adjusting amount is set for the designated region, so that the signal processing section adjusts amplitude levels of frequency components included in the frequency spectra and in the designated region, converts the adjusted frequency components into audio signals and outputs the converted audio signals.
摘要:
An apparatus is constructed for converting an input voice signal into an output voice signal according to a target voice signal. In the apparatus, an input device provides the input voice signal composed of original sinusoidal components and original residual components other than the original sinusoidal components. An extracting device extracts original attribute data from at least the sinusoidal components of the input voice signal. The original attribute data is characteristic of the input voice signal. A synthesizing device synthesizes new attribute data based on both of the original attribute data derived from the input voice signal and target attribute data being characteristic of the target voice signal composed of target sinusoidal components and target residual components other than the sinusoidal components.; The target attribute data is derived from at least the target sinusoidal components. An output device operates based on the new attribute data and either of the original residual component and the target residual component for producing the output voice signal.
摘要:
In a voice synthesizer, an envelope acquisition portion obtains a spectral envelope of a reference frequency spectrum of a given voice. A spectrum acquisition portion obtains a collective frequency spectrum of a plurality of voices which are generated in parallel to one another. An envelope adjustment portion adjusts a spectral envelope of the collective frequency spectrum obtained by the spectrum acquisition portion so as to approximately match with the spectral envelope of the reference frequency spectrum obtained by the envelope acquisition portion. A voice generation portion generates an output voice signal from the collective frequency spectrum having the spectral envelope adjusted by the envelope adjustment portion.
摘要:
A sound signal processing apparatus which is capable of correctly detecting expression modes and expression transitions of a song or performance from an input sound signal. A sound signal produced by performance or singing of musical tones is input and divided into frames of predetermined time periods. Characteristic parameters of the input sound signal are detected on a frame-by-frame basis. An expression determining process is carried out in which a plurality of expression modes of a performance or song are modeled as respective states, the probability that a section including a frame or a plurality of continuous frames lies in a specific state is calculated with respect to a predetermined observed section based on the characteristic parameters, and the optimum route of state transition in the predetermined observed section is determined based on the calculated probabilities so as to determine expression modes of the sound signal and lengths thereof.