摘要:
In a voice synthesizer, an envelope acquisition portion obtains a spectral envelope of a reference frequency spectrum of a given voice. A spectrum acquisition portion obtains a collective frequency spectrum of a plurality of voices which are generated in parallel to one another. An envelope adjustment portion adjusts a spectral envelope of the collective frequency spectrum obtained by the spectrum acquisition portion so as to approximately match with the spectral envelope of the reference frequency spectrum obtained by the envelope acquisition portion. A voice generation portion generates an output voice signal from the collective frequency spectrum having the spectral envelope adjusted by the envelope adjustment portion.
摘要:
Spectrum envelope of an input sound is detected. In the meantime, a converting spectrum is acquired which is a frequency spectrum of a converting sound comprising a plurality of sounds, such as unison sounds. Output spectrum is generated by imparting the detected spectrum envelope of the input sound to the acquired converting spectrum. Sound signal is synthesized on the basis of the generated output spectrum. Further, a pitch of the input sound may be detected, and frequencies of peaks in the acquired converting spectrum may be varied in accordance with the detected pitch of the input sound. In this manner, the output spectrum can have the pitch and spectrum envelope of the input sound and spectrum frequency components of the converting sound comprising a plurality of sounds, and thus, unison sounds can be readily generated with simple arrangements.
摘要:
In a voice synthesizer, an envelope acquisition portion obtains a spectral envelope of a reference frequency spectrum of a given voice. A spectrum acquisition portion obtains a collective frequency spectrum of a plurality of voices which are generated in parallel to one another. An envelope adjustment portion adjusts a spectral envelope of the collective frequency spectrum obtained by the spectrum acquisition portion so as to approximately match with the spectral envelope of the reference frequency spectrum obtained by the envelope acquisition portion. A voice generation portion generates an output voice signal from the collective frequency spectrum having the spectral envelope adjusted by the envelope adjustment portion.
摘要:
In an audio signal processing apparatus, a generation section generates an audio signal representing a voice. A distribution section distributes the audio signal generated by the generation section to a first channel and a second channel, respectively. A delay section delays the audio signal of the first channel relative to the audio signal of the second channel for creating a phase difference between the audio signal of the first channel and the audio signal of the second channel such that the created phase difference has a duration corresponding to either an added value of a first duration which is approximately one half of a period of the audio signal generated by the generation section and a second duration which is set shorter than the first duration, or a difference value of the first duration and the second duration. An addition section adds the audio signal of the first channel and the audio signal of the second channel with one another, between which the phase difference is created by the delay section, and outputs the added audio signal which represents natural voice with various characteristics.
摘要:
In an audio signal processing apparatus, a generation section generates an audio signal representing a voice. A distribution section distributes the audio signal generated by the generation section to a first channel and a second channel, respectively. A delay section delays the audio signal of the first channel relative to the audio signal of the second channel for creating a phase difference between the audio signal of the first channel and the audio signal of the second channel such that the created phase difference has a duration corresponding to either an added value of a first duration which is approximately one half of a period of the audio signal generated by the generation section and a second duration which is set shorter than the first duration, or a difference value of the first duration and the second duration. An addition section adds the audio signal of the first channel and the audio signal of the second channel with one another, between which the phase difference is created by the delay section, and outputs the added audio signal which represents natural voice with various characteristics.
摘要:
A method for synthesizing a natural-sounding singing voice divides performance data into a transition part and a long sound part. The transition part is represented by articulation (phonemic chain) data that is read from an articulation template database and is outputted without modification. For the long sound part, a new characteristic parameter is generated by linearly interpolating characteristic parameters of the transition parts positioned before and after the long sound part and adding thereto a changing component of stationary data that is read from a constant part (stationary) template database. An associated apparatus for carrying out the singing voice synthesizing method includes a phoneme database for storing articulation data for the transition part and stationary data for the long sound part, a first device for outputting the articulation data, and a second device for outputting the newly-generated characteristic parameter of the long sound part.
摘要:
Spectrum envelope of an input sound is detected. In the meantime, a converting spectrum is acquired which is a frequency spectrum of a converting sound comprising a plurality of sounds, such as unison sounds. Output spectrum is generated by imparting the detected spectrum envelope of the input sound to the acquired converting spectrum. Sound signal is synthesized on the basis of the generated output spectrum. Further, a pitch of the input sound may be detected, and frequencies of peaks in the acquired converting spectrum may be varied in accordance with the detected pitch of the input sound. In this manner, the output spectrum can have the pitch and spectrum envelope of the input sound and spectrum frequency components of the converting sound comprising a plurality of sounds, and thus, unison sounds can be readily generated with simple arrangements.
摘要:
For at least one music piece, a storage section stores tone data of each of a plurality of fragments segmented from the music piece and stores a first descriptor indicative of a musical character of each of the fragments in association with the fragment. Descriptor generation section receives input data based on operation by a user and generates a second descriptor, indicative of a musical character, on the basis of the received input data. Determination section determines similarity between the second descriptor and the first descriptor of each of the fragments. Selection section selects the tone data of at least one fragment on the basis of a result of the similarity determination by the determination section. On the basis of the tone data of the selected at least one fragment, a data generation section generates tone data to be outputted.
摘要:
Waveform data representative of singing voices of a singing music piece are analyzed to generate melody component data representative of variation over time in fundamental frequency component presumed to represent a melody in the singing voices. Then, through machine learning that uses score data representative of a musical score of the singing music piece and the melody component data, a melody component model, representative of a variation component presumed to represent the melody among the variation over time in fundamental frequency component, is generated for each combination of notes. Parameters defining the melody component models and note identifiers indicative of the combinations of notes whose variation over time in fundamental frequency component are represented by the melody component models are stored into a pitch curve generating database in association with each other.
摘要:
Analysis section divides waveform data of a given music piece into waveform data of a plurality of fragments and divides the waveform data of each of the fragments into one or more events of sound, and obtains a character value indicative of a character of the waveform data pertaining to each of the divided events. Storage section stores respective music piece data and music piece composing data of one or more music pieces. The music piece composing data include a character value indicative of a character of the waveform data pertaining to each of the events of each of the fragments. Search section searches (or retrieves) for, from among the stored music piece composing data, one event or a plurality of successive events having a character value of a high degree of similarity to one or more events included in a designated fragment.