Abstract:
For high-accuracy analysis and high-quality synthesis of voice sound (singing and speech), provided herein are a system and a method for estimating, from an audio signal, spectral envelopes and group delays for sound analysis and synthesis with high accuracy and high temporal resolution. The estimation system for spectral envelopes and group delays includes a fundamental frequency estimation section, an amplitude spectrum acquisition section, a group delay extraction section, a spectral envelope integration section, and a group delay integration section. The spectral envelope integration section sequentially obtains a spectral envelope for sound synthesis by averaging overlapped spectra. The group delay integration section selects, from a plurality of group delays, the group delay corresponding to the maximum envelope of each frequency component of the spectral envelope, and integrates the group delays thus selected to sequentially obtain a group delay for sound synthesis.
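The two integration steps described above can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the overlapped frames are already available as per-bin amplitude and group-delay lists, uses a plain arithmetic mean for the "averaging", and approximates "the maximum envelope" by the frame with the largest amplitude at each frequency bin. The function name `integrate_frames` and its arguments are hypothetical.

```python
def integrate_frames(amplitudes, group_delays):
    """Integrate overlapped frames into one envelope and one group delay.

    amplitudes, group_delays: lists of frames; each frame is a list with one
    value per frequency bin. Both inputs are assumed (hypothetically) to be
    precomputed and to have identical shapes.
    """
    if not amplitudes or len(amplitudes) != len(group_delays):
        raise ValueError("need the same, non-zero number of frames in both inputs")

    n_bins = len(amplitudes[0])
    envelope = []
    delay = []
    for k in range(n_bins):
        # Amplitudes of all overlapped frames at frequency bin k.
        column = [frame[k] for frame in amplitudes]
        # Assumption: "averaging overlapped spectra" is a simple mean.
        envelope.append(sum(column) / len(column))
        # Assumption: pick the group delay of the frame whose amplitude is
        # largest at this bin (ties resolve to the earliest frame).
        best = max(range(len(column)), key=column.__getitem__)
        delay.append(group_delays[best][k])
    return envelope, delay
```

For example, calling `integrate_frames([[1.0, 4.0], [3.0, 2.0]], [[0.1, 0.2], [0.3, 0.4]])` averages each bin across the two frames and takes the group delay from the second frame at bin 0 and from the first frame at bin 1.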
Abstract:
A singing synthesis section generates a single singing voice by integrating a plurality of vocals sung by a singer a plurality of times, or vocals in which the parts that the singer does not like have been sung again. A music audio signal playback section plays back the music audio signal from the signal portion corresponding to a character in the lyrics, or from the portion immediately preceding it, when that character, displayed on the display screen, is selected by a character selecting section. An estimation and analysis data storing section automatically aligns the lyrics with the vocal, decomposes the vocal into three elements (pitch, power, and timbre), and stores them. A data selecting section allows the user to select each of the three elements for the respective time periods of phonemes. A data editing section modifies the time periods of the three elements in alignment with the modified time periods of the phonemes.
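The per-phoneme selection of the three elements can be sketched as follows. This is only an illustration under simplifying assumptions: every take is assumed to be already aligned to the same phoneme sequence, and the time-period editing step is omitted. The names `PhonemeSegment` and `assemble` are hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class PhonemeSegment:
    """One phoneme of one take, decomposed into the three elements."""
    phoneme: str
    pitch: list
    power: list
    timbre: list


def assemble(takes, selection):
    """Build one vocal by picking each element per phoneme from a chosen take.

    takes: dict mapping a take id to its list of PhonemeSegment objects
           (assumed aligned to the same phoneme sequence).
    selection: one dict per phoneme, mapping "pitch", "power", and "timbre"
               to the id of the take that element should come from.
    """
    result = []
    for i, choice in enumerate(selection):
        result.append(
            PhonemeSegment(
                phoneme=takes[choice["pitch"]][i].phoneme,
                pitch=takes[choice["pitch"]][i].pitch,
                power=takes[choice["power"]][i].power,
                timbre=takes[choice["timbre"]][i].timbre,
            )
        )
    return result
```

A selection such as `{"pitch": "a", "power": "b", "timbre": "a"}` for a phoneme takes its pitch and timbre from take `a` and its power from take `b`.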
Abstract:
Provided is a system for multifaceted singing analysis for retrieval of songs or music that include singing voices having some latent semantic relationship with a singing voice included in one particular song or piece of music. A topic analyzing processor uses a topic model to analyze a plurality of vocal symbolic time series obtained for a plurality of musical audio signals. The topic analyzing processor generates a vocal topic distribution for each of the musical audio signals, the vocal topic distribution being composed of a plurality of vocal topics, each indicating a relationship of one of the musical audio signals with the other musical audio signals. The topic analyzing processor also generates a vocal symbol distribution for each of the vocal topics, the vocal symbol distribution indicating occurrence probabilities for the vocal symbols. A multifaceted singing analyzing processor analyzes, from multiple viewpoints, the singing voices included in the musical audio signals.