摘要:
Spectrum envelope of an input sound is detected. In the meantime, a converting spectrum is acquired which is a frequency spectrum of a converting sound comprising a plurality of sounds, such as unison sounds. Output spectrum is generated by imparting the detected spectrum envelope of the input sound to the acquired converting spectrum. Sound signal is synthesized on the basis of the generated output spectrum. Further, a pitch of the input sound may be detected, and frequencies of peaks in the acquired converting spectrum may be varied in accordance with the detected pitch of the input sound. In this manner, the output spectrum can have the pitch and spectrum envelope of the input sound and spectrum frequency components of the converting sound comprising a plurality of sounds, and thus, unison sounds can be readily generated with simple arrangements.
摘要:
In a voice synthesizer, an envelope acquisition portion obtains a spectral envelope of a reference frequency spectrum of a given voice. A spectrum acquisition portion obtains a collective frequency spectrum of a plurality of voices which are generated in parallel to one another. An envelope adjustment portion adjusts a spectral envelope of the collective frequency spectrum obtained by the spectrum acquisition portion so as to approximately match with the spectral envelope of the reference frequency spectrum obtained by the envelope acquisition portion. A voice generation portion generates an output voice signal from the collective frequency spectrum having the spectral envelope adjusted by the envelope adjustment portion.
摘要:
In an audio processing device, a target sound emphasizer generates a target sound emphasized component by emphasizing a target sound component contained in a plurality of audio signals generated by a plurality of sound receiving devices. A stereo processor generates a stereo component of a plurality of channels from the plurality of audio signals. A first adjuster adjusts a sound pressure level of the target sound emphasized component according to a first adjustment value, and a second adjuster adjusts a sound pressure level of the stereo component according to a second adjustment value. A variable setter variably sets a zoom value which is changeable between a wide angle side and a telephoto side relative to a target. An adjustment controller controls the first adjustment value according to the zoom value such that the sound pressure level of the target sound emphasized component exponentially decreases as the zoom value changes toward the wide-angle side and controls the second adjustment value according to the zoom value such that the sound pressure level of the stereo component increases as the zoom value changes toward the wide-angle side.
摘要:
In an audio signal processing apparatus, a generation section generates an audio signal representing a voice. A distribution section distributes the audio signal generated by the generation section to a first channel and a second channel, respectively. A delay section delays the audio signal of the first channel relative to the audio signal of the second channel for creating a phase difference between the audio signal of the first channel and the audio signal of the second channel such that the created phase difference has a duration corresponding to either an added value of a first duration which is approximately one half of a period of the audio signal generated by the generation section and a second duration which is set shorter than the first duration, or a difference value of the first duration and the second duration. An addition section adds the audio signal of the first channel and the audio signal of the second channel with one another, between which the phase difference is created by the delay section, and outputs the added audio signal which represents natural voice with various characteristics.
摘要:
In a voice synthesizer, an envelope acquisition portion obtains a spectral envelope of a reference frequency spectrum of a given voice. A spectrum acquisition portion obtains a collective frequency spectrum of a plurality of voices which are generated in parallel to one another. An envelope adjustment portion adjusts a spectral envelope of the collective frequency spectrum obtained by the spectrum acquisition portion so as to approximately match with the spectral envelope of the reference frequency spectrum obtained by the envelope acquisition portion. A voice generation portion generates an output voice signal from the collective frequency spectrum having the spectral envelope adjusted by the envelope adjustment portion.
摘要:
A plurality of voice segments, each including one or more phonemes are acquired in a time-serial manner, in correspondence with desired singing or speaking words. As necessary, a boundary is designated between start and end points of a vowel phoneme included in any one of the acquired voice segments. Voice is synthesized for a region of the vowel phoneme that precedes the designated boundary vowel phoneme, or a region of the vowel phoneme that succeeds the designated boundary in the vowel phoneme. By synthesizing a voice for the region preceding the designated boundary, it is possible to synthesize a voice imitative of a vowel sound that is uttered by a person and then stopped to sound with his or her mouth kept opened. Further, by synthesizing a voice for the region succeeding the designated boundary, it is possible to synthesize a voice imitative of a vowel sound that is started to sound with the mouth opened.
摘要:
In an audio signal processing apparatus, a generation section generates an audio signal representing a voice. A distribution section distributes the audio signal generated by the generation section to a first channel and a second channel, respectively. A delay section delays the audio signal of the first channel relative to the audio signal of the second channel for creating a phase difference between the audio signal of the first channel and the audio signal of the second channel such that the created phase difference has a duration corresponding to either an added value of a first duration which is approximately one half of a period of the audio signal generated by the generation section and a second duration which is set shorter than the first duration, or a difference value of the first duration and the second duration. An addition section adds the audio signal of the first channel and the audio signal of the second channel with one another, between which the phase difference is created by the delay section, and outputs the added audio signal which represents natural voice with various characteristics.
摘要:
A method for synthesizing a natural-sounding singing voice divides performance data into a transition part and a long sound part. The transition part is represented by articulation (phonemic chain) data that is read from an articulation template database and is outputted without modification. For the long sound part, a new characteristic parameter is generated by linearly interpolating characteristic parameters of the transition parts positioned before and after the long sound part and adding thereto a changing component of stationary data that is read from a constant part (stationary) template database. An associated apparatus for carrying out the singing voice synthesizing method includes a phoneme database for storing articulation data for the transition part and stationary data for the long sound part, a first device for outputting the articulation data, and a second device for outputting the newly-generated characteristic parameter of the long sound part.
摘要:
A plurality of voice segments, each including one or more phonemes are acquired in a time-serial manner, in correspondence with desired singing or speaking words. As necessary, a boundary is designated between start and end points of a vowel phoneme included in any one of the acquired voice segments. Voice is synthesized for a region of the vowel phoneme that precedes the designated boundary vowel phoneme, or a region of the vowel phoneme that succeeds the designated boundary in the vowel phoneme. By synthesizing a voice for the region preceding the designated boundary, it is possible to synthesize a voice imitative of a vowel sound that is uttered by a person and then stopped to sound with his or her mouth kept opened. Further, by synthesizing a voice for the region succeeding the designated boundary, it is possible to synthesize a voice imitative of a vowel sound that is started to sound with the mouth opened.
摘要:
In an apparatus for synthesizing a singing voice of a song, a storage section stores template data in correspondence to various expressions applicable to music notes. The template data includes first and second template data differently defining a temporal variation of a characteristic parameter for applying the corresponding expression to an attack note and a non-attack note, respectively. An input section inputs voice information representing a sequence of vocal elements and specifying expressions in correspondence to the respective vocal elements. A synthesizing section synthesizes the singing voice from the sequence of the vocal elements based on the inputted voice information. When the vocal element is of an attack note, the first template data is applied to the vocal element. Otherwise, when the vocal element is of a non-attack note, the second template data is applied to the vocal element.