摘要:
Provided is a method and apparatus for transforming a speech feature vector. The method includes extracting a feature vector required for speech recognition from a speech signal and transforming the extracted feature vector using an auto-associative neural network (AANN).
摘要:
Provided are a multi-stage speech recognition apparatus and method. The multi-stage speech recognition apparatus includes a first speech recognition unit performing initial speech recognition on a feature vector, which is extracted from an input speech signal, and generating a plurality of candidate words; and a second speech recognition unit rescoring the candidate words, which are provided by the first speech recognition unit, using a temporal posterior feature vector extracted from the speech signal.
摘要:
Provided is a method and apparatus for transforming a speech feature vector. The method includes extracting a feature vector required for speech recognition from a speech signal and transforming the extracted feature vector using an auto-associative neural network (AANN).
摘要:
Provided are a multi-stage speech recognition apparatus and method. The multi-stage speech recognition apparatus includes a first speech recognition unit performing initial speech recognition on a feature vector, which is extracted from an input speech signal, and generating a plurality of candidate words; and a second speech recognition unit rescoring the candidate words, which are provided by the first speech recognition unit, using a temporal posterior feature vector extracted from the speech signal.
摘要:
An audio apparatus including a decorrelator for generating decorrelated signals by applying a phase shifting value adjusted based on a correlation difference between audio signals included in a multi-channel signal to the audio signals; and a speaker set including at least two speakers for outputting acoustic signals corresponding to the decorrelated signals.
摘要:
An apparatus and method for detecting a voice activity period. The apparatus for detecting a voice activity period includes a domain conversion module that converts an input signal into a frequency domain signal in the unit of a frame obtained by dividing the input signal at predetermined intervals, a subtracted-spectrum-generation module that generates a spectral subtraction signal which is obtained by subtracting a predetermined noise spectrum from the converted frequency domain signal, a modeling module that applies the spectral subtraction signal to a predetermined probability distribution model, and a speech-detection module that determines whether a speech signal is present in a current frame through a probability distribution calculated by the modeling module.
摘要:
An apparatus and method for detecting a voice activity period. The apparatus for detecting a voice activity period includes a domain conversion module that converts an input signal into a frequency domain signal in the unit of a frame obtained by dividing the input signal at predetermined intervals, a subtracted-spectrum-generation module that generates a spectral subtraction signal which is obtained by subtracting a predetermined noise spectrum from the converted frequency domain signal, a modeling module that applies the spectral subtraction signal to a predetermined probability distribution model, and a speech-detection module that determines whether a speech signal is present in a current frame through a probability distribution calculated by the modeling module.
摘要:
An apparatus for positioning a screen sound source, a method of generating loudspeaker set information for screen sound source positioning, and a method of reproducing a positioned screen sound source are provided. The apparatus and methods relate to a screen sound source positioning technique. A plurality of loudspeakers, each configured to have approximately the same gain, are each disposed proximate to the edge of a display, and a loudspeaker set including at least two of the loudspeakers is selected to position a virtual sound source substantially synchronized with a visual object displayed at a position on the screen of the display. Accordingly, a virtual sound source may be positioned at a certain specific position on the screen of a display without sound source distortion.
摘要:
An apparatus for positioning a screen sound source, a method of generating loudspeaker set information for screen sound source positioning, and a method of reproducing a positioned screen sound source are provided. The apparatus and methods relate to a screen sound source positioning technique. A plurality of loudspeakers, each configured to have approximately the same gain, are each disposed proximate to the edge of a display, and a loudspeaker set including at least two of the loudspeakers is selected to position a virtual sound source substantially synchronized with a visual object displayed at a position on the screen of the display. Accordingly, a virtual sound source may be positioned at a certain specific position on the screen of a display without sound source distortion.
摘要:
An apparatus for speech recognition includes: a first confidence score calculator calculating a first confidence score using a ratio between a likelihood of a keyword model for feature vectors per frame of a speech signal and a likelihood of a Filler model for the feature vectors; a second confidence score calculator calculating a second confidence score by comparing a Gaussian distribution trace of the keyword model per frame of the speech signal with a Gaussian distribution trace sample of a stored corresponding keyword of the keyword model; and a determination module determining a confidence of a result using the keyword model in accordance with a position determined by the first and second confidence scores on a confidence coordinate system.