Abstract:
The present invention relates to a method for generating a multilingual speech recognizer comprising a multilingual acoustic model, comprising the steps of providing a first speech recognizer comprising a first codebook consisting of first Gaussians and a first Hidden Markov Model (HMM) comprising first states; providing at least one second speech recognizer comprising a second codebook consisting of second Gaussians and a second HMM comprising second states; replacing each of the second Gaussians of the at least one second speech recognizer with the respective closest one of the first Gaussians and/or each of the second states of the second HMM with the respective closest state of the first HMM of the first speech recognizer to obtain at least one modified second speech recognizer; and combining the first speech recognizer and the at least one modified second speech recognizer to obtain the multilingual speech recognizer.
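The Gaussian replacement step can be illustrated with a minimal sketch. The abstract does not say how "closest" is measured, so the Euclidean distance between Gaussian means used here is an assumption, as are the `(mean, variance)` tuple layout and the function names.

```python
import math

# Each Gaussian is a (mean, diagonal_variance) pair; this layout is hypothetical.
def closest_index(gaussian, codebook):
    """Return the index of the codebook Gaussian whose mean is nearest.

    Assumption: closeness is the Euclidean distance between means.
    The abstract does not name a distance measure.
    """
    mean, _variance = gaussian
    return min(range(len(codebook)),
               key=lambda i: math.dist(mean, codebook[i][0]))

def map_to_first_codebook(second_codebook, first_codebook):
    """Map every second-language Gaussian to an index into the first codebook."""
    return [closest_index(g, first_codebook) for g in second_codebook]

first_codebook = [([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [1.0, 1.0])]
second_codebook = [
    ([0.4, -0.2], [0.8, 1.1]),
    ([4.6, 5.3], [1.2, 0.9]),
    ([2.0, 2.0], [1.0, 1.0]),
]

# The modified second recognizer would then refer to first-codebook indices only.
mapping = map_to_first_codebook(second_codebook, first_codebook)
print(mapping)
```

A measure that also takes the variances into account could be substituted inside `closest_index` without changing the rest of the sketch.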
Abstract:
The present invention relates to a method for generating a multilingual codebook, comprising the steps of providing a main language codebook, providing at least one additional codebook corresponding to a language different from the main language, and generating a multilingual codebook from the main language codebook and the at least one additional codebook by adding a sub-set of the code vectors of the at least one additional codebook to the main codebook based on distances between the code vectors of the at least one additional codebook and the code vectors of the main language codebook.
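A minimal sketch of the distance-based selection follows. The abstract only states that the sub-set is chosen "based on distances", so the specific rule used here (add a code vector when its nearest main-language vector is farther away than a threshold) is an assumption, as are the name `extend_codebook` and the parameter `min_distance`.

```python
import math

def extend_codebook(main, additional, min_distance):
    """Return the main codebook plus the additional vectors it covers poorly.

    Assumption: a vector is added when its Euclidean distance to the nearest
    main-language code vector exceeds min_distance.
    """
    multilingual = list(main)
    for vec in additional:
        nearest = min(math.dist(vec, m) for m in main)
        if nearest > min_distance:
            multilingual.append(vec)
    return multilingual

main = [(0.0, 0.0), (4.0, 0.0)]
additional = [(0.5, 0.0), (2.0, 3.0), (4.0, 0.2)]

result = extend_codebook(main, additional, min_distance=1.0)
print(result)
```

In this sketch, candidates are compared only against the original main-language vectors, not against vectors added earlier in the loop. Comparing against the growing codebook instead would be an equally plausible variant.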
Abstract:
The present invention relates to a method for outputting a synthesized speech signal corresponding to an orthographic string stored in a media file comprising audio data, comprising the steps of analyzing the audio data to determine at least one candidate for a language of the orthographic string, estimating a phonetic representation of the orthographic string based on the determined at least one candidate for a language, and synthesizing a speech signal based on the estimated phonetic representation of the orthographic string. The invention also relates to a media player incorporating such a method for estimating a phonetic representation of song and album titles as well as artists' names for speech recognition. Furthermore, the invention relates to the choice of an appropriate speech recognizer for automatically transcribing the lyrics of songs by using audio-based language estimates.
Abstract:
The present invention relates to a method for detecting a refrain in an audio file, the audio file comprising vocal components, with the following steps:
- generating a phonetic transcription of a major part of the audio file,
- analysing the phonetic transcription and identifying a vocal segment in the generated phonetic transcription which is repeated frequently, the identified frequently repeated vocal segment representing the refrain.
Furthermore, the invention relates to speech-driven selection based on the similarity between the detected refrain and user input.
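The identification of a frequently repeated segment can be sketched as follows. This is a minimal illustration, not the method of the invention: it assumes the transcription is already available as a list of phoneme symbols and that the refrain has a fixed, known length in phonemes. The name `most_repeated_segment` and the toy transcription are hypothetical.

```python
from collections import Counter

def most_repeated_segment(phonemes, length):
    """Return the most frequent run of `length` phonemes and its count.

    Assumption: exact matching of fixed-length windows. A real transcription
    is noisy and would likely need approximate matching.
    """
    if length < 1 or length > len(phonemes):
        raise ValueError("length must be between 1 and len(phonemes)")
    windows = [tuple(phonemes[i:i + length])
               for i in range(len(phonemes) - length + 1)]
    segment, count = Counter(windows).most_common(1)[0]
    return segment, count

# Toy transcription: the run "a b c" recurs three times.
transcription = "a b c x a b c y a b c".split()

segment, count = most_repeated_segment(transcription, length=3)
print(segment, count)
```

The speech-driven selection mentioned above could then compare a transcription of the user's utterance against `segment`. How that similarity is computed is not specified in the abstract.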
Abstract:
The present invention relates to a method for speech recognition of a speech signal, comprising the steps of providing at least one codebook comprising codebook entries, in particular multivariate Gaussians of feature vectors, that are frequency weighted such that higher weights are assigned to entries corresponding to frequencies below a predetermined level than to entries corresponding to frequencies above the predetermined level, and processing the speech signal for speech recognition, comprising extracting at least one feature vector from the speech signal and matching the feature vector with the entries of the codebook.
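One possible reading of the frequency weighting is sketched below. It assumes that each feature dimension corresponds to a frequency band and that matching uses a weighted squared distance to the mean of each codebook entry. The abstract fixes neither detail, and the band centres, cutoff, weight values and function names are all hypothetical.

```python
def band_weights(band_centers_hz, cutoff_hz, low_weight=2.0, high_weight=1.0):
    """Give bands below the cutoff a higher weight than bands above it."""
    return [low_weight if f < cutoff_hz else high_weight
            for f in band_centers_hz]

def weighted_sq_distance(feature, mean, weights):
    """Squared distance in which each dimension is scaled by its band weight."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, feature, mean))

def best_entry(feature, codebook_means, weights):
    """Return the index of the codebook entry closest to the feature vector."""
    return min(range(len(codebook_means)),
               key=lambda i: weighted_sq_distance(feature, codebook_means[i], weights))

band_centers_hz = [250.0, 1000.0, 3000.0, 6000.0]
weights = band_weights(band_centers_hz, cutoff_hz=2000.0)

codebook_means = [
    [1.0, 1.0, 0.0, 0.0],  # energy in the low bands
    [0.0, 0.0, 1.0, 1.0],  # energy in the high bands
]
feature = [0.6, 0.6, 0.6, 0.6]

print(best_entry(feature, codebook_means, weights))
```

With uniform weights the example feature vector is equally far from both entries. The higher low-band weights make mismatches below the cutoff more costly, which settles the match in favour of the entry that agrees better in the low bands.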