Abstract:
A voice converting apparatus is constructed for converting an input voice into an output voice according to a target voice. The apparatus includes a storage section, an analyzing section including a characteristic analyzer, a producing section, a synthesizing section, a memory, an alignment processor, and a target decoder.
Abstract:
Systems and methods for determining similarity between two or more audio pieces are disclosed. An illustrative method for determining musical similarities includes extracting one or more descriptors from each audio piece, generating a vector for each of the audio pieces, extracting one or more audio features from each of the audio pieces, calculating values for each audio feature, calculating a distance between a vector containing the normalized values and the vectors containing the audio pieces, and outputting a response to a user or process indicating the similarity between the audio pieces. The descriptors can be used in performing content-based audio classification and for determining similarities between music. The descriptors that can be extracted from each audio piece can include tonal descriptors, dissonance descriptors, rhythm descriptors, and spatial descriptors.
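The distance step in the abstract above can be illustrated with a minimal sketch. It assumes each audio piece has already been reduced to a numeric feature vector, uses min-max normalization, and measures Euclidean distance; the abstract does not specify either choice, and the function names `normalize_features`, `euclidean_distance`, and `rank_by_similarity` are hypothetical.

```python
import math


def normalize_features(vectors):
    """Min-max normalize each feature column to the range [0, 1]."""
    columns = list(zip(*vectors))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    normalized = []
    for vector in vectors:
        row = []
        for value, low, high in zip(vector, lows, highs):
            span = high - low
            # A constant feature carries no information, so map it to 0.0.
            row.append((value - low) / span if span else 0.0)
        normalized.append(row)
    return normalized


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query_index, vectors):
    """Return (index, distance) pairs for all other pieces, closest first."""
    normalized = normalize_features(vectors)
    query = normalized[query_index]
    distances = [
        (index, euclidean_distance(query, vector))
        for index, vector in enumerate(normalized)
        if index != query_index
    ]
    return sorted(distances, key=lambda pair: pair[1])


# Illustrative feature vectors: (tempo in BPM, dissonance score).
pieces = [[120.0, 0.20], [122.0, 0.25], [60.0, 0.90]]
ranking = rank_by_similarity(0, pieces)
```

Normalizing before measuring distance keeps a feature with a large numeric range, such as tempo, from dominating features with a small range.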
Abstract:
For each of a plurality of music pieces, a storage device stores respective tone data of a plurality of fragments of the music piece and respective musical character values of the fragments. A similarity determination section calculates a similarity index value indicative of a degree of similarity between the character values of each of the fragments of a main music piece and the character values of each individual fragment of a plurality of sub music pieces. Each of the similarity index values calculated for the fragments of each of the sub music pieces can be adjusted in accordance with a user's control. A processing section processes the tone data of each of the fragments of the main music piece on the basis of the tone data of any one of the fragments of the sub music pieces for which the similarity index value indicates sufficient similarity.
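The fragment-matching flow in the abstract above might be sketched as follows. This is only an illustration under stated assumptions: the similarity index is taken to be `1 / (1 + Euclidean distance)`, the user's control is modeled as a per-piece multiplicative weight, and "sufficient similarity" as a fixed threshold. None of these specifics come from the abstract, and `similarity_index` and `best_matches` are hypothetical names.

```python
import math


def similarity_index(a, b):
    """Assumed index: 1.0 for identical character values, approaching 0 as they diverge."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def best_matches(main_fragments, sub_pieces, weights, threshold):
    """For each main fragment, pick the best sub fragment whose weighted
    similarity index reaches the threshold, or None if there is none."""
    matches = []
    for main_values in main_fragments:
        best = None
        for piece_name, fragments in sub_pieces.items():
            # The user-controlled adjustment is modeled as a weight per sub piece.
            weight = weights.get(piece_name, 1.0)
            for fragment_index, sub_values in enumerate(fragments):
                score = weight * similarity_index(main_values, sub_values)
                if score >= threshold and (best is None or score > best[2]):
                    best = (piece_name, fragment_index, score)
        matches.append(best)
    return matches


# Illustrative character values for two main fragments and two sub pieces.
main = [[0.0, 0.0], [1.0, 1.0]]
subs = {"a": [[0.0, 0.1], [5.0, 5.0]], "b": [[1.0, 1.0]]}
matched = best_matches(main, subs, {"a": 1.0, "b": 1.0}, threshold=0.5)
muted_b = best_matches(main, subs, {"a": 1.0, "b": 0.0}, threshold=0.5)
```

Setting the weight of sub piece `b` to zero removes it from consideration, which leaves the second main fragment without a sufficiently similar match.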
Abstract:
A singing voice synthesizing apparatus is provided, which enables achievement of a natural sounding synthesized singing voice with a good level of comprehensibility. A phoneme database stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component. A readout device reads out from the phoneme database the voice fragment data corresponding to inputted lyrics. A duration time adjusting device adjusts time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing. An adjusting device adjusts the deterministic component and the stochastic component of the read-out voice fragment data so as to match a desired pitch. A synthesizing device synthesizes a singing sound by sequentially concatenating the voice fragment data that have been adjusted by the duration time adjusting device and the adjusting device.
Abstract:
A frequency spectrum is detected by analyzing a frequency of a voice waveform corresponding to a voice synthesis unit formed of a phoneme or a phonemic chain. Local peaks are detected on the frequency spectrum, and spectrum distribution regions including the local peaks are designated. For each spectrum distribution region, amplitude spectrum data representing an amplitude spectrum distribution along a frequency axis and phase spectrum data representing a phase spectrum distribution along the frequency axis are generated. The amplitude spectrum data is adjusted to move the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis based on an input note pitch, and the phase spectrum data is adjusted corresponding to the adjustment. Spectrum intensities are adjusted to follow a spectrum envelope corresponding to a desired tone color. The adjusted amplitude and phase spectrum data are converted into a synthesized voice signal.
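The peak detection and region designation steps in the abstract above can be sketched on a list of magnitude-spectrum bins. The abstract does not say how region boundaries are chosen; this sketch assumes each boundary falls at the lowest bin between two neighboring peaks, and the names `find_local_peaks` and `designate_regions` are hypothetical.

```python
def find_local_peaks(magnitudes):
    """Return indices of bins that exceed the left neighbor and are not below the right one."""
    peaks = []
    for k in range(1, len(magnitudes) - 1):
        if magnitudes[k] > magnitudes[k - 1] and magnitudes[k] >= magnitudes[k + 1]:
            peaks.append(k)
    return peaks


def designate_regions(magnitudes, peaks):
    """Split the spectrum into one half-open (start, end) bin range per peak.

    Assumption: neighboring regions meet at the lowest bin between their peaks.
    """
    if not peaks:
        return []
    boundaries = [0]
    for left, right in zip(peaks, peaks[1:]):
        valley = min(range(left, right + 1), key=lambda k: magnitudes[k])
        boundaries.append(valley)
    boundaries.append(len(magnitudes))
    return [(boundaries[i], boundaries[i + 1]) for i in range(len(peaks))]


# Illustrative magnitude spectrum with two local peaks.
spectrum = [0.1, 0.5, 1.0, 0.4, 0.2, 0.6, 0.9, 0.3]
peaks = find_local_peaks(spectrum)
regions = designate_regions(spectrum, peaks)
```

Each resulting region contains exactly one peak, so its amplitude and phase data could then be shifted along the frequency axis as a unit.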
Abstract:
A voice converting apparatus is constructed for converting an input voice into an output voice according to a target voice. In the apparatus, a storage section provisionally stores source data, which is associated with and extracted from the target voice. An analyzing section analyzes the input voice to extract therefrom a series of input data frames representing the input voice. A producing section produces a series of target data frames representing the target voice based on the source data, while aligning the target data frames with the input data frames to secure synchronization between the target data frames and the input data frames. A synthesizing section synthesizes the output voice according to the target data frames and the input data frames.