Abstract:
An automatic system for temporal alignment between a music audio signal and lyrics is provided. The system prevents non-vocal sections from degrading the accuracy of temporal alignment. The alignment means of the system includes a phone model for singing voice that estimates the phonemes corresponding to temporal-alignment features, that is, features available for temporal alignment. The alignment means receives the temporal-alignment features output from temporal-alignment feature extraction means, information on the vocal and non-vocal sections output from vocal section estimation means, and a phoneme network, and performs an alignment operation on the condition that no phoneme exists at least in the non-vocal sections.
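The alignment condition described above can be illustrated with a minimal sketch. It assumes a left-to-right Viterbi search over a linear phoneme sequence, per-frame log-likelihoods already produced by some phone model, and a short-pause label `sp` that remains permitted in non-vocal frames. The function name `align`, the label `sp`, and the data layout are hypothetical choices for illustration; the abstract does not specify the search procedure.

```python
import math

NEG_INF = -math.inf


def align(loglik, labels, vocal, pause="sp"):
    """Assign one phoneme label to every frame (illustrative sketch).

    loglik[t][s]: log-likelihood of frame t under state s (assumed given).
    labels[s]:    phoneme label of state s in a linear phoneme sequence.
    vocal[t]:     True if frame t was estimated to be a vocal frame.
    """
    n_frames, n_states = len(loglik), len(labels)

    def emit(t, s):
        # Constraint: a non-vocal frame may only be explained by the pause.
        if not vocal[t] and labels[s] != pause:
            return NEG_INF
        return loglik[t][s]

    score = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = emit(0, 0)  # the path must start in the first state

    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else NEG_INF
            if move > stay:
                best, prev = move, s - 1
            else:
                best, prev = stay, s
            score[t][s] = best + emit(t, s)
            back[t][s] = prev

    if score[-1][-1] == NEG_INF:
        raise ValueError("no alignment satisfies the constraints")

    # Trace back from the last state at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [labels[s] for s in path]
```

In this sketch, a phone model that scores the vowel highly on every frame, including accompaniment-only frames, still cannot place the vowel outside the vocal section, because those frames are masked before the search.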
Abstract:
A system provided herein may perform automatic temporal alignment between a music audio signal and lyrics with higher accuracy than before. A non-fricative section extracting portion 4 extracts non-fricative sound sections, where no fricative sounds exist, from the music audio signal. An alignment portion 17 includes a phone model 15 for singing voice capable of estimating phonemes corresponding to temporal-alignment features. The alignment portion 17 performs an alignment operation using as inputs the temporal-alignment features obtained from a temporal-alignment feature extracting portion 11, information on vocal and non-vocal sections obtained from a vocal section estimating portion 9, and a phoneme network SN, on the conditions that no phonemes exist at least in the non-vocal sections and that no fricative phonemes exist in the non-fricative sound sections.
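The two alignment conditions can be sketched as a masking step applied to per-frame phoneme scores before any alignment search. This is only an illustration under stated assumptions: the fricative set `FRICATIVES`, the pause label `sp`, and the function name `apply_constraints` are hypothetical, and the abstract does not say which phonemes are treated as fricatives or how the conditions are enforced internally.

```python
import math

FRICATIVES = {"sh"}  # assumption: illustrative fricative set only
PAUSE = "sp"         # assumption: label allowed in non-vocal frames


def apply_constraints(loglik, labels, vocal, non_fricative):
    """Return a copy of loglik with forbidden (frame, phoneme) pairs masked.

    loglik[t][s]:     log-likelihood of frame t under state s.
    labels[s]:        phoneme label of state s.
    vocal[t]:         True if frame t lies in a vocal section.
    non_fricative[t]: True if frame t lies in a non-fricative sound section.
    """
    masked = []
    for t, row in enumerate(loglik):
        new_row = []
        for s, value in enumerate(row):
            label = labels[s]
            # Condition 1: no phoneme in a non-vocal section.
            no_voice = not vocal[t] and label != PAUSE
            # Condition 2: no fricative phoneme in a non-fricative section.
            no_fricative = non_fricative[t] and label in FRICATIVES
            new_row.append(-math.inf if (no_voice or no_fricative) else value)
        masked.append(new_row)
    return masked
```

Masking with negative infinity means a subsequent search can never select a forbidden pair, while all permitted scores pass through unchanged.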
Abstract:
A music information retrieval system of the present invention can retrieve unknown songs whose singing voices have similar voice timbres. Voice timbre features of the songs and identifiers for the respective songs are stored in a voice timbre feature storage section 2. When one of the songs is selected, a similarity calculation section 3 calculates voice timbre similarities between the selected song and each of the remaining songs, based on the voice timbre features of the selected song and of the other songs. A similar song retrieval and display section 5 displays on a display 10 a plurality of identifiers for songs that are similar to the selected song in voice timbre. A song data reproduction section 6 reproduces the song data corresponding to one or more identifiers selected from among the plurality of identifiers displayed on the display 10.
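The retrieval flow described above (stored features, similarity calculation against the selected song, ranked identifiers) can be sketched as follows. The abstract does not state which similarity measure is used, so this sketch substitutes cosine similarity over fixed-length feature vectors purely as an assumption. The names `cosine` and `similar_songs` and the dictionary layout are likewise hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_songs(features, selected, top_n=3):
    """Rank the remaining songs by similarity to the selected song.

    features: dict mapping song identifier -> voice timbre feature vector.
    selected: identifier of the song chosen by the user.
    Returns up to top_n identifiers, most similar first.
    """
    query = features[selected]
    scored = [
        (cosine(query, vector), song_id)
        for song_id, vector in features.items()
        if song_id != selected  # compare against the remaining songs only
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [song_id for _, song_id in scored[:top_n]]
```

The returned identifiers correspond to what would be listed on the display, from which the user then picks songs to reproduce.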