Abstract:
A low-power voice command detection method uses an audio monitoring device to capture sound. The captured sound is analyzed in steps to determine whether it fulfills a number of criteria regarding sound level, voice content and identifiable voice commands. Each successive step is more complex and more power-demanding than the one before it. A threshold between the first and subsequent steps is used to gate further processing. This threshold is dynamically adjusted, based on the outcome of the analysis, to avoid unnecessary processing and increase system performance.
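The staged gating described above can be sketched as follows. This is only an illustration under stated assumptions, not the claimed method: the class name `StagedDetector`, the RMS level measure, the fixed adjustment `step`, and the direction of adjustment (raise the gate after a wasted wake-up, lower it after a confirmed command) are all hypothetical choices. The `is_voice` and `is_command` callables stand in for the more expensive later stages.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples (cheap first stage)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


class StagedDetector:
    """Three-stage detector gated by an adaptive sound-level threshold.

    All parameter names and defaults are illustrative assumptions.
    """

    def __init__(self, level_threshold=0.1, step=0.01,
                 min_threshold=0.01, max_threshold=1.0):
        self.level_threshold = level_threshold
        self.step = step
        self.min_threshold = min_threshold
        self.max_threshold = max_threshold

    def _adjust(self, delta):
        # Move the gate and keep it inside [min_threshold, max_threshold].
        moved = self.level_threshold + delta
        self.level_threshold = min(self.max_threshold,
                                   max(self.min_threshold, moved))

    def process(self, frame, is_voice, is_command):
        # Stage 1: level check. Quiet frames never reach the costly stages.
        if rms(frame) < self.level_threshold:
            return False
        # Stages 2 and 3: `or` short-circuits, so is_command only runs
        # when is_voice has already accepted the frame.
        if not is_voice(frame) or not is_command(frame):
            # Assumed policy: a wasted wake-up raises the gate.
            self._adjust(+self.step)
            return False
        # Assumed policy: a confirmed command lowers the gate again.
        self._adjust(-self.step)
        return True
```

In use, each captured frame is passed to `process` together with the two later-stage classifiers; the return value is `True` only when all three stages accept the frame.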
Abstract:
A method and apparatus for adaptively detecting voice activity in an input audio signal are provided. The method comprises the steps of: determining a noise characteristic (nc) of the input audio signal based at least on a received input frame of the input audio signal; deriving a voice activity detection (VAD) parameter (vp) adapted to the noise characteristic of the input audio signal; and comparing the derived VAD parameter with a threshold to provide a voice activity detection decision.
Abstract:
Embodiments of the present invention relate to a primary voice activity detector and a method thereof. The method makes it possible to determine whether frames of an input signal comprise voice. This is achieved by receiving a frame of the input signal, determining a first SNR of the received frame, comparing the determined first SNR with an adaptive threshold, and detecting whether the received frame comprises voice based on said comparison. The adaptive threshold is based at least on total noise energy of a noise level, an estimate of a second SNR and energy variation between different frames.
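A minimal sketch of the comparison described above, assuming all quantities are already expressed in dB. The abstract names the three inputs to the adaptive threshold but not how they are combined, so the linear form and the constants `base_db`, `k_noise`, `k_snr` and `k_var` below are purely hypothetical.

```python
def adaptive_threshold_db(noise_level_db, second_snr_db, energy_variation_db,
                          base_db=3.0, k_noise=0.05, k_snr=0.1, k_var=0.5):
    """Hypothetical threshold: a linear mix of the three named inputs.

    The weights are illustrative assumptions, not values from the source.
    """
    return (base_db
            + k_noise * noise_level_db
            + k_snr * second_snr_db
            + k_var * energy_variation_db)


def frame_has_voice(frame_energy_db, noise_level_db,
                    second_snr_db, energy_variation_db):
    """Compare the frame's first SNR with the adaptive threshold."""
    # First SNR: frame energy relative to the current noise estimate.
    first_snr_db = frame_energy_db - noise_level_db
    threshold_db = adaptive_threshold_db(
        noise_level_db, second_snr_db, energy_variation_db)
    return first_snr_db > threshold_db
```

With these assumed weights, a larger frame-to-frame energy variation raises the threshold, so a frame needs a higher first SNR to be classified as voice.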
Abstract:
There is provided a voice activity detection method for indicating an active voice mode and an inactive voice mode. The method comprises receiving a first portion of an input signal; determining that the first portion of the input signal includes an active voice signal; indicating the active voice mode in response to the determining that the first portion of the input signal includes the active voice signal; receiving a second portion of the input signal immediately following the first portion of the input signal; determining that the second portion of the input signal includes an inactive voice signal; extending the indicating the active voice mode for a period of time after determining that the second portion of the input signal includes the inactive voice signal, wherein the period of time varies based on one or more conditions; and indicating the inactive voice mode after expiration of the period of time.
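The extension period described above is commonly called a hangover. The sketch below shows one way a variable hangover could work, under the assumption that the only condition is the length of the preceding speech burst; the abstract does not say which conditions are used. The function name and the parameters `short_hang`, `long_hang` and `long_burst` are hypothetical.

```python
def smooth_with_hangover(raw_decisions, short_hang=2, long_hang=5,
                         long_burst=3):
    """Extend the active-voice indication after raw activity stops.

    raw_decisions: per-frame booleans from an underlying detector.
    Assumed condition: bursts of at least `long_burst` active frames
    earn the longer hangover, shorter bursts the shorter one.
    """
    smoothed = []
    burst = 0       # consecutive active frames in the current burst
    remaining = 0   # hangover frames still available
    for active in raw_decisions:
        if active:
            burst += 1
            remaining = long_hang if burst >= long_burst else short_hang
            smoothed.append(True)
        else:
            burst = 0
            if remaining > 0:
                # Still inside the hangover: keep indicating active voice.
                remaining -= 1
                smoothed.append(True)
            else:
                smoothed.append(False)
    return smoothed
```

For example, a single active frame followed by silence stays active for two more frames, while a three-frame burst stays active for five more.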
Abstract:
The present invention relates to a method and apparatus for detecting voice activity in a communication signal, wherein filter means are provided for estimating or suppressing an offset component of the level of the communication signal. A filter parameter is controlled based on the output of the filter means. Furthermore, the estimation or suppression of the offset component is limited in response to the output of the filter means. The filter means may be based on a non-linear adaptive notch level filter or a noise floor tracking filter. Thereby, the tracking behavior of noise floor estimation to sudden rises in noise floor can be improved and the voice activity detection can work efficiently over a wide dynamic range.
Abstract:
The system and method of the invention relate to voice detection technology for determining instants of time at which a snapshot of noise characteristics results in improved adaptation of noise floors used in voice detection. The approach is based on the "lower envelope" of the smoothed input signal power. Incorporation of this approach in a simple time domain VAD (Voice Activity Detector) results in an effective low-complexity system which, on the basis of simulations, gives good performance down to SNR values of about 0 dB. In the invention the lower envelope also provides the updated value of the noise threshold during the presence of speech. The invention can also be embedded in other, more complex (e.g., frequency domain) VADs at low computational cost.
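One plausible reading of a lower-envelope noise floor is sketched below. It is an assumption, not the patented procedure: the envelope drops immediately whenever the smoothed power falls below it and otherwise creeps upward by a small factor per frame, never exceeding the smoothed power. The names `smooth`, `rise` and `vad_factor` and their default values are hypothetical.

```python
def track_lower_envelope(frame_powers, smooth=0.9, rise=1.01,
                         vad_factor=3.0):
    """Return (smoothed_power, envelope, is_voice) for each frame.

    smooth: one-pole smoothing coefficient for the input power.
    rise: per-frame growth factor of the envelope (assumed value).
    vad_factor: voice is declared when the smoothed power exceeds
        vad_factor times the envelope (assumed decision rule).
    """
    results = []
    smoothed = None
    envelope = None
    for power in frame_powers:
        if smoothed is None:
            smoothed = power
        else:
            # One-pole low-pass smoothing of the frame power.
            smoothed += (1.0 - smooth) * (power - smoothed)
        if envelope is None or smoothed < envelope:
            # Follow the signal down at once.
            envelope = smoothed
        else:
            # Rise slowly, but never above the smoothed power.
            envelope = min(envelope * rise, smoothed)
        results.append((smoothed, envelope, smoothed > vad_factor * envelope))
    return results
```

Because the envelope rises only slowly, it keeps supplying a noise estimate while speech is present, which matches the role the abstract gives it.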
Abstract:
A waveform-based technique for generating periodicity information from an input signal includes generating a pre-processed signal by applying low pass and non-linear filtering to the input signal, wherein the pre-processed signal has highlighted speech pitch tracks. An adaptive threshold algorithm is applied to the pre-processed signal to generate a detection signal having waveform segments whose peaks are separated by a pitch period of the input signal. A period between peaks in the detection signal is determined that indicates the periodicity information. Information about the period between the peaks in the detection signal is then used to adapt a scaling value to be used by the adaptive threshold algorithm in a subsequent step. The periodicity information may be utilized in a voice activity detector in a telephonic communications system.
Abstract:
An improved noise suppression system (800) is disclosed which performs speech quality enhancement upon the speech-plus-noise signal available at the input (205) to generate a clean speech signal at the output (265) by spectral gain modification. The improvements of the present invention include the addition of a signal-to-noise ratio (SNR) threshold mechanism (830) to reduce background noise flutter by offsetting the gain rise of the gain tables until a certain SNR threshold is reached, the use of a voice metric calculator (810) to produce more accurate background noise estimates via performing the update decision based on the overall voice-like characteristics in the channels and the time interval since the last update, and the use of a channel SNR modifier (820) to provide immunity to narrowband noise bursts through modification of the SNR estimates based on the voice metric calculation and the channel energies.
Abstract:
An embodiment of a speech endpoint detector apparatus may include a speech detector to detect a presence of speech in an electronic speech signal, a pause duration measurer communicatively coupled to the speech detector to measure a duration of a pause following a period of detected speech, an end of utterance detector communicatively coupled to the pause duration measurer to detect if the pause measured following the period of detected speech is greater than a pause threshold corresponding to an end of an utterance, and a pause threshold adjuster to adaptively adjust the pause threshold corresponding to an end of an utterance based on stored pause information. Other embodiments are disclosed and claimed.
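The pause-based endpointing described above can be sketched as follows. This is a hedged illustration: the abstract says only that the pause threshold is adjusted from stored pause information, so the specific rule here (threshold equals a margin times the mean of recently stored within-utterance pauses, clamped to a range) is an assumption, as are the class name and all parameter names. Durations are counted in frames.

```python
from collections import deque


class EndpointDetector:
    """End-of-utterance detector with an adaptive pause threshold."""

    def __init__(self, pause_threshold=50, margin=2.0,
                 min_threshold=20, max_threshold=150, history=10):
        self.pause_threshold = pause_threshold
        self.margin = margin
        self.min_threshold = min_threshold
        self.max_threshold = max_threshold
        self._pauses = deque(maxlen=history)  # stored pause information
        self._in_utterance = False
        self._pause = 0  # length of the current pause, in frames

    def _adapt(self):
        # Assumed rule: margin times the mean stored pause, clamped.
        mean_pause = sum(self._pauses) / len(self._pauses)
        target = int(round(self.margin * mean_pause))
        self.pause_threshold = max(self.min_threshold,
                                   min(self.max_threshold, target))

    def push(self, is_speech):
        """Feed one frame decision; return True at an end of utterance."""
        if is_speech:
            if self._in_utterance and self._pause > 0:
                # Speech resumed: this was a pause inside the utterance.
                self._pauses.append(self._pause)
                self._adapt()
            self._in_utterance = True
            self._pause = 0
            return False
        if not self._in_utterance:
            return False  # silence before any speech
        self._pause += 1
        if self._pause > self.pause_threshold:
            self._in_utterance = False
            self._pause = 0
            return True
        return False
```

A speaker who pauses for longer between words therefore ends up with a longer end-of-utterance threshold, within the configured bounds.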