Abstract:
A speech recognition apparatus and method. The speech recognition apparatus includes one or more processors configured to reflect a final recognition result for a previous audio signal in a language model, generate a first recognition result of an audio signal, in a first linguistic recognition unit, by using an acoustic model, generate a second recognition result of the audio signal, in a second linguistic recognition unit, by using the language model reflecting the final recognition result for the previous audio signal, and generate a final recognition result for the audio signal in the second linguistic recognition unit based on the first recognition result and the second recognition result. The first linguistic recognition unit may be the same linguistic unit type as, or a different linguistic unit type from, the second linguistic recognition unit.
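The combination described above can be illustrated with a minimal sketch: per-unit acoustic scores are merged with language-model scores, and the final result is fed back ("reflected") into the language model's context. All names (`ContextLM`, `combine_scores`, the toy scoring rule) are illustrative assumptions, not the patented implementation.

```python
import math

def combine_scores(acoustic_scores, lm_scores, lm_weight=0.5):
    """Combine per-candidate acoustic and language-model log scores
    and return the best-scoring recognition unit."""
    combined = {u: acoustic_scores[u] + lm_weight * lm_scores.get(u, -math.inf)
                for u in acoustic_scores}
    return max(combined, key=combined.get)

class ContextLM:
    """Toy language model whose context is updated with each final result."""
    def __init__(self):
        self.history = []

    def score(self, candidates):
        # Illustrative rule only: penalize repeating the last emitted unit.
        return {u: (-1.0 if self.history and u == self.history[-1] else 0.0)
                for u in candidates}

    def reflect(self, final_result):
        # Reflect the final recognition result in the language model context.
        self.history.append(final_result)
```

A usage pass would score candidates, pick the best combined unit, then call `reflect` so the next audio segment is decoded against the updated context.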
Abstract:
A speech recognition apparatus and method. The speech recognition apparatus includes a first recognizer configured to generate a first recognition result of an audio signal, in a first linguistic recognition unit, by using an acoustic model, a second recognizer configured to generate a second recognition result of the audio signal, in a second linguistic recognition unit, by using a language model, and a combiner configured to combine the first recognition result and the second recognition result to generate a final recognition result in the second linguistic recognition unit and to reflect the final recognition result in the language model. The first linguistic recognition unit may be the same linguistic unit type as the second linguistic recognition unit. The first recognizer and the second recognizer are configured in the same neural network and are simultaneously/collectively trained in the neural network using audio training data provided to the first recognizer.
Abstract:
A speech recognition apparatus includes a processor configured to recognize a user's speech using any one or combination of two or more of an acoustic model, a pronunciation dictionary including primitive words, and a language model including primitive words; and correct word spacing in a result of speech recognition based on a word-spacing model.
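The word-spacing correction step can be sketched as a toy resegmentation: strip the spacing produced by recognition, then reinsert spaces by greedy longest match against a vocabulary. This is a deliberately simple stand-in for the word-spacing model named in the abstract; `correct_spacing` and the greedy rule are assumptions for illustration.

```python
def correct_spacing(text, vocabulary):
    """Toy word-spacing correction: remove existing spaces, then
    resegment by greedy longest match against `vocabulary`.
    Unmatched characters fall through as single-character tokens."""
    stripped = text.replace(" ", "")
    words, i = [], 0
    while i < len(stripped):
        for j in range(len(stripped), i, -1):
            if stripped[i:j] in vocabulary or j == i + 1:
                words.append(stripped[i:j])
                i = j
                break
    return " ".join(words)
```

For example, a mis-spaced hypothesis such as "speechre cognition" would be corrected to "speech recognition" given a vocabulary containing both words. A trained word-spacing model would make this decision statistically rather than by dictionary lookup.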
Abstract:
A method of extracting a static pattern from an output of an event-based sensor. The method may include receiving an event signal from the event-based sensor in response to a dynamic input, and extracting a static pattern associated with the dynamic input based on an identifier and a time included in the event signal. The static pattern may be extracted from a map generated based on the identifier and the time.
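The map-based extraction can be sketched as follows: each event carries an identifier (e.g., a sensor element) and a time; the map stores the latest time per identifier, and the static pattern is the set of identifiers whose latest event falls within a recency window. The function names and the windowing rule are illustrative assumptions, not the claimed method.

```python
def update_map(time_map, event):
    """Record the latest event time for each identifier in the map."""
    identifier, timestamp = event
    time_map[identifier] = timestamp
    return time_map

def extract_static_pattern(time_map, now, window):
    """Return identifiers whose most recent event lies within `window`
    of `now`; this set approximates the pattern behind the dynamic input."""
    return {i for i, t in time_map.items() if now - t <= window}
```

Because an event-based sensor only fires on change, accumulating timestamps into a map is what lets a static pattern be recovered from inherently dynamic output.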
Abstract:
A method and apparatus for speech recognition are disclosed. The speech recognition apparatus includes a processor configured to process a received speech signal, generate a word sequence based on a phoneme sequence generated from the speech signal, generate a syllable sequence corresponding to a word element among words included in the word sequence based on the phoneme sequence, and determine a text corresponding to a recognition result of the speech signal based on the word sequence and the syllable sequence.
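A minimal sketch of the final composition step: where the word-level decoding leaves an unresolved element (marked here with a hypothetical `<unk>` token), the corresponding syllable-level decoding is substituted before the final text is produced. The marker convention and `compose_text` are assumptions for illustration only.

```python
UNK = "<unk>"  # hypothetical marker for a word element the word decoder could not resolve

def compose_text(word_sequence, syllable_sequences):
    """Determine the final text from the word sequence, replacing each
    unresolved word element with its syllable sequence (one per <unk>,
    in order), joined into a surface form."""
    out, k = [], 0
    for w in word_sequence:
        if w == UNK:
            out.append("".join(syllable_sequences[k]))
            k += 1
        else:
            out.append(w)
    return " ".join(out)
```

This mirrors the abstract's idea: the word sequence handles in-vocabulary words, while syllable sequences cover elements (e.g., rare names) the word-level model cannot represent.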