摘要:
A speech recognition system includes: a speed level classifier for measuring a moving speed of a moving object by using a noise signal at an initial time of speech recognition to determine a speed level of the moving object; a first speech enhancement unit for enhancing sound quality of an input speech signal of the speech recognition by using a Wiener filter, if the speed level of the moving object is equal to or lower than a specific level; and a second speech enhancement unit enhancing the sound quality of the input speech signal by using a Gaussian mixture model, if the speed level of the moving object is higher than the specific level. The system further includes an end point detection unit for detecting start and end points, an elimination unit for eliminating sudden noise components based on a sudden noise Gaussian mixture model.
摘要:
A message service method using speech recognition includes a message server recognizing a speech transmitted from a transmission terminal, generating and transmitting a recognition result of the speech and N-best results based on a confusion network to the transmission terminal; if a message is selected through the recognition result and the N-best results and an evaluation result according to accuracy of the message are decided, the transmission terminal transmitting the message and the evaluation result to a reception terminal; and the reception terminal displaying the message and the evaluation result.
摘要:
Method of the present invention may include receiving speech feature vector converted from speech signal, performing first search by applying first language model to the received speech feature vector, and outputting word lattice and first acoustic score of the word lattice as continuous speech recognition result, outputting second acoustic score as phoneme recognition result by applying an acoustic model to the speech feature vector, comparing the first acoustic score of the continuous speech recognition result with the second acoustic score of the phoneme recognition result, outputting first language model weight when the first coustic score of the continuous speech recognition result is better than the second acoustic score of the phoneme recognition result and performing a second search by applying a second language model weight, which is the same as the output first language model, to the word lattice.
摘要:
Provided are a method and apparatus for discriminating a speech signal. The apparatus for discriminating a speech signal includes: an input signal quality improver for reducing additional noise from an acoustic signal received from outside; a first start/end-point detector for receiving the acoustic signal from the input signal quality improver and detecting an end-point of a speech signal included in the acoustic signal; a voiced-speech feature extractor for extracting voiced-speech features of the input signal included in the acoustic signal received from the first start/end-point detector; a voiced-speech/unvoiced-speech discrimination model for storing a voiced-speech model parameter corresponding to a discrimination reference of the voiced-speech feature parameter extracted from the voiced-speech feature extractor; and a voiced-speech/unvoiced-speech discriminator for discriminating a voiced-speech portion using the voiced-speech features extracted by the voiced-speech feature extractor and the voiced-speech discrimination model parameter of the voiced/unvoiced-speech discrimination model.
摘要:
An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.
摘要:
A Viterbi decoder includes: an observation vector sequence generator for generating an observation vector sequence by converting an input speech to a sequence of observation vectors; a local optimal state calculator for obtaining a partial state sequence having a maximum similarity up to a current observation vector as an optimal state; an observation probability calculator for obtaining, as a current observation probability, a probability for observing the current observation vector in the optimal state; a buffer for storing therein a specific number of previous observation probabilities; a non-linear filter for calculating a filtered probability by using the previous observation probabilities stored in the buffer and the current observation probability; and a maximum likelihood calculator for calculating a partial maximum likelihood by using the filtered probability. The filtered probability may be a maximum value, a mean value or a median value of the previous observation probabilities and the current observation probability.
摘要:
A speech recognition system includes: a speed level classifier for measuring a moving speed of a moving object by using a noise signal at an initial time of speech recognition to determine a speed level of the moving object; a first speech enhancement unit for enhancing sound quality of an input speech signal of the speech recognition by using a Wiener filter, if the speed level of the moving object is equal to or lower than a specific level; and a second speech enhancement unit enhancing the sound quality of the input speech signal by using a Gaussian mixture model, if the speed level of the moving object is higher than the specific level. The system further includes an end point detection unit for detecting start and end points, an elimination unit for eliminating sudden noise components based on a sudden noise Gaussian mixture model.
摘要:
A noise cancellation apparatus includes a noise estimation module for receiving a noise-containing input speech, and estimating a noise therefrom to output the estimated noise; a first Wiener filter module for receiving the input speech, and applying a first Wiener filter thereto to output a first estimation of clean speech; a database for storing data of a Gaussian mixture model for modeling clean speech; and an MMSE estimation module for receiving the first estimation of clean speech and the data of the Gaussian mixture model to output a second estimation of clean speech. The apparatus further includes a final clean speech estimation module for receiving the second estimation of clean speech from the MMSE estimation module and the estimated noise from the noise estimation module, and obtaining a final Wiener filter gain therefrom to output a final estimation of clean speech by applying the final Wiener filter gain.
摘要:
An apparatus for evaluating the performance of speech recognition includes a speech database for storing N-number of test speech signals for evaluation. A speech recognizer is located in an actual environment and executes the speech recognition of the test speech signals reproduced using a loud speaker from the speech database in the actual environment to produce speech recognition results. A performance evaluation module evaluates the performance of the speech recognition by comparing correct recognition results answers with the speech recognition results.
摘要:
A microphone-array-based speech recognition system using a blind source separation (BBS) and a target speech extraction method in the system are provided. The speech recognition system performs an independent component analysis (ICA) to separate mixed signals input through a plurality of microphone into sound-source signals, extracts one target speech spoken for speech recognition from the separated sound-source signals by using a Gaussian mixture model (GMM) or a hidden Markov Model (HMM), and automatically recognizes a desired speech from the extracted target speech. Accordingly, it is possible to obtain a high speech recognition rate even in a noise environment.