Abstract:
An audio apparatus includes a decorrelator that generates decorrelated signals by applying, to the audio signals of a multi-channel signal, a phase-shifting value adjusted based on a correlation difference between those audio signals; and a speaker set including at least two speakers that output acoustic signals corresponding to the decorrelated signals.
Abstract:
Provided are a method and apparatus for transforming a speech feature vector. The method includes extracting a feature vector required for speech recognition from a speech signal and transforming the extracted feature vector using an auto-associative neural network (AANN).
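As a rough illustration of the transformation step, the sketch below passes a feature vector through a small auto-associative network, that is, a network whose output has the same dimension as its input and which squeezes the data through a narrower hidden layer. The abstract does not give the layer sizes, activation, weights, or whether the transformed feature is taken from the output or from the hidden layer; the names `dense` and `aann_transform`, the tanh activation, and the example weights are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, bias), hypothetical layout


def dense(x: Sequence[float], weights: List[List[float]], bias: List[float],
          activate: bool = True) -> List[float]:
    """One fully connected layer; tanh is an assumed activation."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [math.tanh(o) for o in out] if activate else out


def aann_transform(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Forward pass through the network; the last layer is left linear."""
    h = list(x)
    for i, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias, activate=(i < len(layers) - 1))
    return h


# Hypothetical 3 -> 2 -> 3 network with hand-picked (untrained) weights.
example_layers: List[Layer] = [
    ([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], [0.0, 0.0, 0.0]),
]
transformed = aann_transform([0.1, 0.2, 0.3], example_layers)
```

In practice the weights would be learned by training the network to reproduce its input; here they are fixed only so the forward pass can be followed by hand.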
Abstract:
Provided are a multi-stage speech recognition apparatus and method. The multi-stage speech recognition apparatus includes a first speech recognition unit performing initial speech recognition on a feature vector, which is extracted from an input speech signal, and generating a plurality of candidate words; and a second speech recognition unit rescoring the candidate words, which are provided by the first speech recognition unit, using a temporal posterior feature vector extracted from the speech signal.
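The two-stage flow can be sketched as a generic rescoring step: the first pass supplies candidate words with scores, and a second scorer re-ranks them. The abstract does not say how the two scores are combined or how the temporal posterior feature vector is scored, so the linear interpolation, the `weight` parameter, and the `second_score` callback below are assumptions, not the patented method.

```python
from typing import Callable, List, Tuple


def rescore(candidates: List[Tuple[str, float]],
            second_score: Callable[[str], float],
            weight: float = 0.5) -> List[Tuple[str, float]]:
    """Re-rank first-pass candidates using a second-pass score.

    candidates: (word, first-pass score) pairs, higher is better.
    second_score: stand-in for a scorer based on temporal posterior features.
    weight: assumed interpolation weight given to the second-pass score.
    """
    combined = [
        (word, (1.0 - weight) * first + weight * second_score(word))
        for word, first in candidates
    ]
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores: the first pass prefers "cat", the second pass prefers "cap".
first_pass = [("cat", -1.0), ("cap", -1.2)]
second_pass = {"cat": -3.0, "cap": -0.5}
ranked = rescore(first_pass, lambda w: second_pass[w])
```

With these made-up numbers the second stage overturns the first-pass ranking, which is the situation rescoring is meant to handle.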
Abstract:
An apparatus for positioning a screen sound source, a method of generating loudspeaker set information for screen sound source positioning, and a method of reproducing a positioned screen sound source are provided. The apparatus and methods relate to a screen sound source positioning technique. A plurality of loudspeakers, each configured to have approximately the same gain, are each disposed proximate to the edge of a display, and a loudspeaker set including at least two of the loudspeakers is selected to position a virtual sound source substantially synchronized with a visual object displayed at a position on the screen of the display. Accordingly, a virtual sound source may be positioned at a specific position on the screen of the display without sound source distortion.
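One simple way to picture the selection step: if two loudspeakers play the same signal at equal gain, the perceived source tends to sit near the midpoint between them, so a pair can be chosen whose midpoint is closest to the visual object. This midpoint rule is an assumption made for illustration; the abstract does not state the selection criterion, and `select_speaker_pair` and the coordinates are hypothetical.

```python
from itertools import combinations
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, float]


def select_speaker_pair(speakers: Sequence[Point], target: Point) -> Tuple[int, int]:
    """Return indices of the two loudspeakers whose midpoint is nearest the target.

    Assumes equal-gain playback, so the virtual source is taken to lie at the
    midpoint of the selected pair (a simplification, not the patented rule).
    """
    if len(speakers) < 2:
        raise ValueError("at least two loudspeakers are required")

    def midpoint_error(pair: Tuple[int, int]) -> float:
        i, j = pair
        mid = ((speakers[i][0] + speakers[j][0]) / 2.0,
               (speakers[i][1] + speakers[j][1]) / 2.0)
        return dist(mid, target)

    return min(combinations(range(len(speakers)), 2), key=midpoint_error)


# Hypothetical layout: one loudspeaker at each corner of a unit-square screen.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
bottom_pair = select_speaker_pair(corners, (0.5, 0.0))
left_pair = select_speaker_pair(corners, (0.0, 0.5))
```

A lookup table of such selections, computed once per screen position, would be one plausible form for the "loudspeaker set information" the abstract mentions.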
Abstract:
An apparatus for speech recognition includes: a first confidence score calculator calculating a first confidence score using a ratio between a likelihood of a keyword model for feature vectors per frame of a speech signal and a likelihood of a filler model for the feature vectors; a second confidence score calculator calculating a second confidence score by comparing a Gaussian distribution trace of the keyword model per frame of the speech signal with a Gaussian distribution trace sample of a stored corresponding keyword of the keyword model; and a determination module determining a confidence of a result using the keyword model in accordance with a position determined by the first and second confidence scores on a confidence coordinate system.
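The first confidence score is a likelihood ratio, which in the log domain becomes a difference of per-frame log-likelihoods. The sketch below averages that difference over frames and then decides on a point in the two-score plane using a straight-line boundary. The frame averaging, the linear boundary, and the parameters `w1`, `w2`, and `bias` are assumptions; the abstract specifies neither the normalization nor the shape of the decision region, and the second score is taken as a given number here.

```python
from typing import Sequence


def llr_confidence(keyword_loglik: Sequence[float],
                   filler_loglik: Sequence[float]) -> float:
    """Mean per-frame log-likelihood ratio of keyword model vs. filler model."""
    if len(keyword_loglik) != len(filler_loglik) or not keyword_loglik:
        raise ValueError("need equal-length, non-empty per-frame scores")
    total = sum(k - f for k, f in zip(keyword_loglik, filler_loglik))
    return total / len(keyword_loglik)


def accept(first: float, second: float,
           w1: float = 1.0, w2: float = 1.0, bias: float = 0.0) -> bool:
    """Accept if the point (first, second) lies on the positive side of an
    assumed linear boundary w1*first + w2*second + bias = 0."""
    return w1 * first + w2 * second + bias > 0.0


# Hypothetical per-frame log-likelihoods for a two-frame utterance.
first_score = llr_confidence([-1.0, -2.0], [-2.0, -4.0])
decision = accept(first_score, -0.5)
```

A positive first score means the keyword model explains the frames better than the filler model; the second score then shifts the decision along the other axis.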
Abstract:
A virtual screen sound source is spatially synchronized with a visual object displayed on a display. A plurality of loudspeaker sets, which each include at least three of a plurality of loudspeakers installed at the periphery of a display, are selected, individual sound sources corresponding to the respective selected loudspeaker sets are generated, and a multi-sound source is generated by overlapping the generated individual sound sources and output through loudspeakers included in the loudspeaker sets.