Abstract:
An automatic system for temporally aligning a music audio signal with its lyrics is provided. The system prevents non-vocal sections from lowering the accuracy of the temporal alignment. The alignment means of the system includes a phone model for singing voice that estimates the phonemes corresponding to temporal-alignment features, that is, features usable for temporal alignment. The alignment means receives three inputs: the temporal-alignment features output by the temporal-alignment feature extraction means, information on the vocal and non-vocal sections output by the vocal section estimation means, and a phoneme network. It performs the alignment under the condition that no phoneme exists in at least the non-vocal sections.
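A minimal sketch of how the non-vocal constraint can enter a Viterbi alignment follows. It illustrates the idea under stated assumptions and is not the system's actual alignment means. The phoneme network is reduced to a single linear phoneme sequence, the phone model's output is given as per-frame log-likelihoods, and the vocal section estimate is a per-frame boolean. The function `align` and its gap (no-phoneme) states are hypothetical. Phoneme states are disallowed on non-vocal frames, and gap states are disallowed on vocal frames.

```python
NEG_INF = float("-inf")


def align(phonemes, frame_loglik, is_vocal):
    """Viterbi-align a linear phoneme sequence to frames (hypothetical sketch).

    phonemes:     phoneme labels along one path of the phoneme network.
    frame_loglik: frame_loglik[t][ph] = log-likelihood of frame t under phoneme ph.
    is_vocal:     per-frame output of the vocal section estimator.
    Returns one label per frame; frames in gap (no-phoneme) states get None.
    """
    if not phonemes:
        raise ValueError("need at least one phoneme")
    # Expanded state list: gap, ph_0, gap, ph_1, ..., ph_{n-1}, gap
    states = []
    for ph in phonemes:
        states += [None, ph]
    states.append(None)
    n_states, n_frames = len(states), len(is_vocal)

    def emit(t, s):
        if states[s] is None:
            # Gap states may cover only non-vocal frames.
            return 0.0 if not is_vocal[t] else NEG_INF
        # Constraint: no phoneme may be placed on a non-vocal frame.
        return frame_loglik[t][states[s]] if is_vocal[t] else NEG_INF

    score = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[-1] * n_states for _ in range(n_frames)]
    for s in (0, 1):  # start in the leading gap or in the first phoneme
        score[0][s] = emit(0, s)
    for t in range(1, n_frames):
        for s in range(n_states):
            e = emit(t, s)
            if e == NEG_INF:
                continue
            # Predecessors: stay, advance one state, or (phoneme states only)
            # jump from the previous phoneme over an empty gap.
            preds = [s, s - 1] + ([s - 2] if states[s] is not None and s >= 2 else [])
            best, arg = NEG_INF, -1
            for p in preds:
                if p >= 0 and score[t - 1][p] > best:
                    best, arg = score[t - 1][p], p
            if arg >= 0:
                score[t][s], back[t][s] = best + e, arg
    # Finish in the last phoneme or in the trailing gap.
    end = max((n_states - 2, n_states - 1), key=lambda s: score[-1][s])
    if score[-1][end] == NEG_INF:
        raise ValueError("no alignment satisfies the vocal/non-vocal constraint")
    path = [end]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [states[s] for s in path]


frames = [{"a": -1.0, "i": -2.0}] * 5
print(align(["a", "i"], frames, [False, True, True, False, True]))
# → [None, 'a', 'a', None, 'i']
```

Because phonemes are forbidden on non-vocal frames, the search cannot stretch a phoneme across an instrumental break. With a full phoneme network, the same masking would be applied to every phoneme state.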
Abstract:
An apparatus and method extract a predetermined non-harmonic structured spectral component contained in an audio signal and then increase or decrease the extracted component. The spectrum of the audio signal is calculated by frequency analysis, and the spectral component corresponding to the predetermined non-harmonic structured spectral component is extracted from it and then increased or decreased. The extraction is performed with reference to the spectral component of a template stored in advance, and the template's spectral component is adapted so that the difference between the extracted spectral component and the template's spectral component falls to or below a predetermined value. This allows the predetermined non-harmonic structured spectral component contained in the audio signal to be increased or decreased independently, without affecting other spectral components.
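The extract, adapt, and scale loop described above can be sketched on magnitude spectra that are assumed to have been computed already by frequency analysis (for example, an STFT). Every rule in it is a hypothetical choice made for illustration and is not taken from the abstract: the extraction rule (bin-wise minimum of the observed spectrum and the template), the adaptation rule (replace the template by the average extracted component), the mean-absolute-difference stopping test, and all function names.

```python
def extract(frames, template):
    """Assumed extraction rule: per bin, the component can exceed neither the
    observed magnitude nor the template magnitude."""
    return [[min(x, t) for x, t in zip(frame, template)] for frame in frames]


def adapt_template(frames, template, tol=1e-3, max_iter=50):
    """Replace the template by the average extracted component until the mean
    absolute difference between the two is at or below `tol`."""
    for _ in range(max_iter):
        extracted = extract(frames, template)
        avg = [sum(col) / len(extracted) for col in zip(*extracted)]
        diff = sum(abs(a - t) for a, t in zip(avg, template)) / len(template)
        if diff <= tol:
            break
        template = avg
    return template, extract(frames, template)


def scale_component(frames, extracted, gain):
    """gain > 1 boosts and 0 <= gain < 1 attenuates the extracted component;
    the residual (frame - extracted) is passed through unchanged."""
    return [[x + (gain - 1.0) * e for x, e in zip(frame, comp)]
            for frame, comp in zip(frames, extracted)]


frames = [[4.0, 2.0, 0.0]] * 2          # two frames, three frequency bins each
template, comp = adapt_template(frames, [5.0, 5.0, 5.0])
print(scale_component(frames, comp, 2.0))
# → [[8.0, 4.0, 0.0], [8.0, 4.0, 0.0]]
```

Since only the extracted part is scaled, energy in bins the template does not cover (for instance, a harmonic sound at other frequencies) passes through largely unchanged.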
Abstract:
A musical piece recommendation system is provided that allows a new user and a new musical piece to be registered instantaneously, without retraining in a basic training section. A first incremental training section 21 monitors a rating history storage section 3. Each time a rating history is changed or a new user is added, it updates or adds the topic selection probability for that user (the user whose rating history changed, or the new user) so that the likelihood determined by the basic training section 17 is kept maximized. A second incremental training section 21 monitors an acoustic feature storage section 5. Each time a new musical piece is added and its acoustic features are added to that section, it adds the musical piece selection probability for the added piece so that the likelihood determined by the basic training section 17 is kept maximized.
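A common way to update one user's topic selection probability while keeping the rest of a trained topic model fixed is pLSA-style folding-in. EM is run over that user's ratings only, with the topic-to-piece distributions frozen, so no iteration decreases the likelihood of that user's ratings. The sketch below assumes this model form. The names `fold_in_user` and `p_piece_given_topic`, and the use of rating values as EM weights, are illustrative assumptions and are not confirmed by the abstract.

```python
def fold_in_user(ratings, p_piece_given_topic, n_iter=100):
    """Estimate P(topic | user) for one new or changed user by EM.

    ratings:             {piece_id: nonnegative weight} for this user only.
    p_piece_given_topic: one {piece_id: probability} dict per topic; these come
                         from the basic training and are held fixed here.
    """
    n_topics = len(p_piece_given_topic)
    theta = [1.0 / n_topics] * n_topics  # uniform starting point
    for _ in range(n_iter):
        expected = [0.0] * n_topics
        for piece, weight in ratings.items():
            joint = [theta[z] * p_piece_given_topic[z].get(piece, 0.0)
                     for z in range(n_topics)]
            total = sum(joint)
            if total == 0.0:
                continue  # piece unknown to every topic: it carries no evidence
            for z in range(n_topics):
                expected[z] += weight * joint[z] / total  # E-step
        norm = sum(expected)
        if norm == 0.0:
            break
        theta = [e / norm for e in expected]  # M-step
    return theta


topics = [{"p1": 0.9, "p2": 0.1}, {"p1": 0.1, "p2": 0.9}]
print(fold_in_user({"p1": 3.0}, topics))  # mass moves almost entirely to topic 0
```

Only the new user's parameters are touched, which is what lets registration happen without rerunning the basic training.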
Abstract:
An audio signal produced by playing a plurality of musical instruments is separated into sound sources corresponding to the respective instrument sounds. Each time a separation process is performed, the updated model parameter estimation/storage section 114 estimates the parameters contained in the updated model parameters so that the updated power spectrograms change gradually from a state close to the initial power spectrograms to a state close to the plurality of power spectrograms most recently stored in the power spectrogram separation/storage section 112. The sections, including the power spectrogram separation/storage section 112 and an updated distribution function computation/storage section 118, repeat their processing until the updated power spectrograms have moved from the state close to the initial power spectrograms to the state close to the power spectrograms most recently stored in the power spectrogram separation/storage section 112. The final updated power spectrograms, formed to contain harmonic and inharmonic models, are each close to the power spectrogram of single tones of one musical instrument contained in the input audio signal.
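The gradual transition from the initial power spectrograms toward the most recently separated ones can be pictured as an annealing schedule. The sketch below simplifies this under explicit assumptions. It blends the spectrograms directly with a linearly increasing weight `alpha`, whereas the described system estimates harmonic and inharmonic model parameters to achieve the transition. The functions `anneal_models` and `blend` and the `separate` callback are hypothetical.

```python
def blend(initial, latest, alpha):
    """Convex combination of two spectrograms (lists of frames of bin powers):
    alpha = 0 returns `initial`, alpha = 1 returns `latest`."""
    return [[(1.0 - alpha) * a + alpha * b for a, b in zip(fa, fb)]
            for fa, fb in zip(initial, latest)]


def anneal_models(initial_models, separate, n_steps):
    """initial_models: one initial power spectrogram per instrument.
    separate: hypothetical callback that runs one separation pass with the
    current models and returns one separated power spectrogram per instrument."""
    models = initial_models
    for step in range(1, n_steps + 1):
        alpha = step / n_steps  # weight shifts toward the latest separation
        latest = separate(models)
        models = [blend(init, sep, alpha)
                  for init, sep in zip(initial_models, latest)]
    return models


# Two instruments, one frame of two bins each; the toy separator is constant.
init = [[[0.0, 0.0]], [[2.0, 2.0]]]
print(anneal_models(init, lambda m: [[[1.0, 1.0]], [[3.0, 3.0]]], 4))
# → [[[1.0, 1.0]], [[3.0, 3.0]]]
```

Early iterations stay close to the initial spectrograms, which keeps the first separation passes from drifting. By the last step, `alpha` is 1 and the models follow the latest separation.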
Abstract:
A musical piece recommendation system allows a new user and a new musical piece to be registered instantaneously, without retraining in a basic training section. A first incremental training section monitors a rating history storage section. Each time a rating history is changed or a new user is added, it updates or adds the topic selection probability for that user (the user whose rating history changed, or the new user) so that the likelihood determined by the basic training section is kept maximized. A second incremental training section monitors an acoustic feature storage section. Each time a new musical piece is added and its acoustic features are added to that section, it adds the musical piece selection probability for the added piece so that the likelihood determined by the basic training section is kept maximized.
Abstract:
A language understanding device includes: a language understanding model storing unit configured to store word transition data including pre-transition states, input words, predefined outputs corresponding to the input words, word weight information, and post-transition states, and concept weighting data including concepts obtained from language understanding results for at least one word, and concept weight information corresponding to the concepts; a finite state transducer processing unit configured to output understanding result candidates including the predefined outputs, to accumulate word weights so as to obtain a cumulative word weight, and to sequentially perform state transition operations; a concept weighting processing unit configured to accumulate concept weights so as to obtain a cumulative concept weight; and an understanding result determination unit configured to determine an understanding result from the understanding result candidates by referring to the cumulative word weight and the cumulative concept weight.
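The interplay of cumulative word weights and cumulative concept weights can be sketched as an exhaustive search over a small finite-state transducer. Several assumptions are made for illustration. Larger weights are treated as better, and the two cumulative weights are combined as a weighted sum through a hypothetical `concept_scale` parameter. The transition table layout, the concept labels, and the function `understand` are also invented for the example.

```python
def understand(words, transitions, concept_weights, start, finals, concept_scale=1.0):
    """Return (score, outputs, word_weight_sum, concept_weight_sum) for the best
    candidate, or None if no path reaches a final state.

    transitions:     {(state, word): [(output, word_weight, next_state), ...]}
                     where an empty output string emits nothing.
    concept_weights: {concept: weight}; concepts missing from it weigh 0.
    """
    hyps = [(start, (), 0.0)]  # (state, outputs so far, cumulative word weight)
    for word in words:
        # Follow every transition for this word and accumulate word weights.
        hyps = [(nxt, outs + ((out,) if out else ()), wsum + w)
                for state, outs, wsum in hyps
                for out, w, nxt in transitions.get((state, word), [])]
    best = None
    for state, outs, wsum in hyps:
        if state not in finals:
            continue
        csum = sum(concept_weights.get(c, 0.0) for c in outs)
        score = wsum + concept_scale * csum  # assumed combination rule
        if best is None or score > best[0]:
            best = (score, outs, wsum, csum)
    return best


transitions = {
    (0, "to"): [("", 0.1, 1)],
    (1, "tokyo"): [("dest=tokyo", 0.5, 2), ("orig=tokyo", 0.6, 2)],
}
concepts = {"dest=tokyo": 1.0}
print(understand(["to", "tokyo"], transitions, concepts, 0, {2})[1])
# → ('dest=tokyo',)
print(understand(["to", "tokyo"], transitions, concepts, 0, {2}, concept_scale=0.0)[1])
# → ('orig=tokyo',)
```

On word weights alone, the transducer slightly prefers the `orig=tokyo` reading. The concept weight tips the decision to `dest=tokyo`, and setting `concept_scale` to zero removes that effect.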
Abstract:
An automatic speech recognition system includes: a sound source localization module for localizing the sound direction of a speaker based on acoustic signals detected by a plurality of microphones; a sound source separation module for separating the speaker's speech signal from the acoustic signals according to the sound direction; an acoustic model memory that stores direction-dependent acoustic models adjusted to a plurality of directions at intervals; an acoustic model composition module that composes, based on the direction-dependent acoustic models, an acoustic model adjusted to the sound direction localized by the sound source localization module and stores the composed model in the acoustic model memory; and a speech recognition module that recognizes the features extracted by a feature extractor as character information, using the acoustic model composed by the acoustic model composition module.
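Composing a model for an arbitrary direction from models stored at intervals can be sketched as interpolation between the two nearest stored directions. The abstract does not specify the composition rule, so the following choices are assumptions for illustration: the parameter representation (a flat list standing in for, e.g., Gaussian mean vectors), linear interpolation, clamping outside the stored range, and the name `compose_model`.

```python
import bisect


def compose_model(direction, stored):
    """Compose model parameters for `direction` (degrees) from models stored at
    fixed angular intervals, then store the result in the same memory.

    stored: {direction_deg: [parameter, ...]}; the parameters stand in for,
    e.g., the Gaussian mean vector of a direction-dependent acoustic model.
    """
    dirs = sorted(stored)
    if direction <= dirs[0]:
        model = list(stored[dirs[0]])      # clamp below the covered range
    elif direction >= dirs[-1]:
        model = list(stored[dirs[-1]])     # clamp above the covered range
    else:
        i = bisect.bisect_right(dirs, direction)
        lo, hi = dirs[i - 1], dirs[i]
        w = (direction - lo) / (hi - lo)   # linear interpolation weight
        model = [(1.0 - w) * a + w * b for a, b in zip(stored[lo], stored[hi])]
    stored[direction] = model  # keep the composed model for later reuse
    return model


memory = {-30: [0.0, 10.0], 0: [2.0, 20.0], 30: [4.0, 40.0]}
print(compose_model(15, memory))
# → [3.0, 30.0]
```

Storing the composed model mirrors the abstract's step of writing it back to the acoustic model memory, so a speaker who stays at the same direction does not trigger recomposition.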