Abstract:
There is provided a method for use by a speech encoder to encode an input speech signal. The method comprises receiving the input speech signal; determining whether the input speech signal includes an active speech signal or an inactive speech signal; low-pass filtering the inactive speech signal to generate a narrowband inactive speech signal; high-pass filtering the inactive speech signal to generate a high-band inactive speech signal; encoding the narrowband inactive speech signal using a narrowband inactive speech encoder to generate an encoded narrowband inactive speech; generating a low-to-high auxiliary signal by the narrowband inactive speech encoder based on the narrowband inactive speech signal; encoding the high-band inactive speech signal using a wideband inactive speech encoder to generate an encoded wideband inactive speech based on the low-to-high auxiliary signal from the narrowband inactive speech encoder; and transmitting the encoded narrowband inactive speech and the encoded wideband inactive speech.
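The data flow described above (band split, narrowband encoding, an auxiliary signal passed to the high-band encoder) might be sketched as follows. This is only an illustration under stated assumptions: the two-tap band split, the 1 dB level quantizer, and the choice of the quantized narrowband level as the low-to-high auxiliary signal are all hypothetical and are not taken from the abstract.

```python
import math

def split_bands(x):
    """Hypothetical two-tap complementary split (not the patent's filters)."""
    # Low band: average of each sample with its predecessor (crude low-pass).
    low = [(x[i] + (x[i - 1] if i else 0.0)) / 2.0 for i in range(len(x))]
    # High band: the residual, so that low[i] + high[i] == x[i].
    high = [x[i] - low[i] for i in range(len(x))]
    return low, high

def frame_energy_db(x):
    """Mean energy of a frame in dB, with a small floor to avoid log(0)."""
    e = sum(s * s for s in x) / max(len(x), 1)
    return 10.0 * math.log10(e + 1e-12)

def encode_narrowband_inactive(low):
    """Assumed encoder: quantize the narrowband level to 1 dB steps."""
    encoded = round(frame_energy_db(low))
    aux = encoded  # assumed low-to-high auxiliary signal
    return encoded, aux

def encode_highband_inactive(high, aux):
    """Assumed encoder: code the high-band level relative to the auxiliary level."""
    return round(frame_energy_db(high) - aux)

def encode_inactive_frame(x):
    low, high = split_bands(x)
    nb_code, aux = encode_narrowband_inactive(low)
    wb_code = encode_highband_inactive(high, aux)
    return nb_code, wb_code  # both codes would then be transmitted
```

Coding the high-band level relative to the narrowband level is one plausible reason for passing an auxiliary signal between the two encoders; the abstract does not say what the auxiliary signal actually contains.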
Abstract:
There is provided a method of reducing the effect of noise producing artifacts in a speech signal. The method comprises obtaining (310) a plurality of incoming samples of a speech subframe; summing (310) an energy level for each of the plurality of incoming samples to generate a total input level; comparing (320) the total input level with a predetermined threshold; setting (340) a gain value as a function of the total input level, where the gain value is between zero (0) and one (1), and where the function results in a lower gain value when the total input level is indicative of a silence area than when the total input level is indicative of a non-silence area; and multiplying (350) the plurality of incoming samples of the speech subframe by the gain value.
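The steps above (sum the energy, compare with a threshold, set a gain, multiply) can be sketched as follows. The squared-sample energy measure, the linear gain ramp, and the `threshold` and `min_gain` values are assumptions made for illustration; the abstract only requires a gain between zero and one that is lower in silence areas.

```python
def attenuate_subframe(samples, threshold=1000.0, min_gain=0.25):
    """Scale a subframe by a gain that is lower in silence areas.

    threshold and min_gain are hypothetical illustration values.
    """
    # Total input level: sum of per-sample energies (squared samples assumed).
    total_level = sum(s * s for s in samples)
    if total_level < threshold:
        # Silence area: ramp the gain linearly from min_gain up toward 1.
        gain = min_gain + (1.0 - min_gain) * (total_level / threshold)
    else:
        # Non-silence area: leave the samples unchanged.
        gain = 1.0
    return [s * gain for s in samples], gain
```

A continuous ramp, rather than a hard switch between two gain values, avoids an audible step when the level crosses the threshold.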
Abstract:
There is provided a speech encoder for performing an algorithm that comprises obtaining (205) a plurality of open-loop pitch candidates from a current frame of a speech signal, the plurality of open-loop pitch candidates including a first open-loop pitch candidate and a second open-loop pitch candidate; obtaining (205) voicing information from one or more previous frames; and selecting (280) one of the plurality of open-loop pitch candidates as a final pitch of the current frame using the voicing information from the one or more previous frames. In one aspect, the voicing information from the one or more previous frames includes a previous pitch of the one or more previous frames. In a further aspect, selecting the final pitch of the current frame includes selecting (210) an initial open-loop pitch from the plurality of open-loop pitch candidates that has the maximum long-term correlation value.
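A minimal sketch of this kind of selection is given below. It starts from the candidate with the maximum correlation, as the abstract states, and then lets the previous pitch override that choice. The override rule and the `bias` and `tolerance` parameters are hypothetical; the abstract does not say how the voicing information is applied.

```python
def select_open_loop_pitch(candidates, prev_pitch=None, bias=0.85, tolerance=0.2):
    """Pick a final pitch lag from (lag, correlation) candidate pairs.

    bias and tolerance are hypothetical tuning values.
    """
    # Initial open-loop pitch: the candidate with the maximum correlation.
    best_lag, best_corr = max(candidates, key=lambda c: c[1])
    if prev_pitch is None:
        return best_lag
    for lag, corr in candidates:
        near_prev = abs(lag - prev_pitch) <= tolerance * prev_pitch
        # Assumed rule: prefer a candidate close to the previous pitch when
        # its correlation is almost as high as the maximum.
        if near_prev and corr >= bias * best_corr:
            if abs(lag - prev_pitch) < abs(best_lag - prev_pitch):
                best_lag = lag
    return best_lag
```

With candidates at lags 40 and 80 and a previous pitch of 41, this sketch returns 40 even though lag 80 has the slightly higher correlation, which is the usual way to avoid pitch doubling.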
Abstract:
There is provided a voice activity detection method for indicating an active voice mode and an inactive voice mode. The method comprises receiving a first portion of an input signal; determining that the first portion of the input signal includes an active voice signal; indicating the active voice mode in response to the determining that the first portion of the input signal includes the active voice signal; receiving a second portion of the input signal immediately following the first portion of the input signal; determining that the second portion of the input signal includes an inactive voice signal; extending the indicating the active voice mode for a period of time after determining that the second portion of the input signal includes the inactive voice signal, wherein the period of time varies based on one or more conditions; and indicating the inactive voice mode after expiration of the period of time.
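The extension period described above is commonly called a hangover. A minimal sketch, assuming per-frame raw decisions are already available, is shown below. The condition used to vary the period (the length of the preceding active burst) and the `short_hang`, `long_hang`, and `burst_len` values are assumptions; the abstract does not name the conditions.

```python
def apply_hangover(raw_flags, short_hang=2, long_hang=5, burst_len=3):
    """Extend active-voice decisions by a variable number of frames.

    raw_flags holds per-frame raw decisions (True = active voice).
    All parameter values are hypothetical.
    """
    out = []
    active_run = 0  # consecutive raw-active frames seen so far
    hang_left = 0   # remaining frames of extension
    for active in raw_flags:
        if active:
            active_run += 1
            # Assumed condition: a longer burst earns a longer hangover.
            hang_left = long_hang if active_run >= burst_len else short_hang
            out.append(True)
        else:
            active_run = 0
            if hang_left > 0:
                hang_left -= 1
                out.append(True)   # still indicating the active voice mode
            else:
                out.append(False)  # hangover expired: inactive voice mode
    return out
```

A short hangover after an isolated active frame limits the cost of false alarms, while a longer one after sustained speech protects weak word endings.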
Abstract:
There is provided a method of updating a noise state of a voice activity detection (VAD) for indicating an active voice mode and an inactive voice mode. The method comprises receiving an input signal having a plurality of frames, determining an elapsed time since the last update of the noise state, updating the noise state of the VAD if the elapsed time exceeds a predetermined time, determining an average minimum energy based on two or more of the plurality of frames, determining a current minimum energy based on a current frame of the plurality of frames, updating the noise state of the VAD if the average minimum energy is less than the current minimum energy, and updating the noise state of the VAD if the average minimum energy is greater than the current minimum energy plus a first predetermined value (Figure 7).
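The three update conditions listed above can be written as a single predicate. This is a sketch only: the function name, the frame-count time unit, and the `max_elapsed` and `margin` values are hypothetical stand-ins for the predetermined time and the first predetermined value.

```python
def should_update_noise_state(elapsed_frames, avg_min_energy, cur_min_energy,
                              max_elapsed=100, margin=3.0):
    """Return True if any of the three update conditions holds.

    max_elapsed and margin are hypothetical illustration values.
    """
    # Condition 1: too long since the last update of the noise state.
    if elapsed_frames > max_elapsed:
        return True
    # Condition 2: the average minimum energy is below the current minimum.
    if avg_min_energy < cur_min_energy:
        return True
    # Condition 3: the average minimum exceeds the current minimum by more
    # than the margin.
    if avg_min_energy > cur_min_energy + margin:
        return True
    return False
```

Under these conditions no update occurs only when the average minimum energy lies between the current minimum energy and the current minimum energy plus the margin, and the elapsed time is still within its limit.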
Abstract:
A method for detecting music in a speech signal having a plurality of frames (120). The method comprises obtaining one or more first pitch correlation candidates from a first frame of the plurality of frames (771); obtaining one or more second pitch correlation candidates from a second frame of the plurality of frames (771); selecting a pitch correlation (Rp) from the one or more first pitch correlation candidates and the one or more second pitch correlation candidates (773); and distinguishing music from background noise based on analyzing the pitch correlation (Rp) (775). The method may comprise filtering the speech signal using a one-order low-pass filter prior to the obtaining the one or more first pitch correlation candidates (920), and down sampling the speech signal by four prior to obtaining the one or more first pitch correlation candidates (940).
Abstract:
A method for detecting music in a speech signal having a plurality of frames. The method comprises defining a music threshold value for a first parameter extracted from a frame of the speech signal, defining a background noise threshold value for the first parameter, and defining an unsure threshold value for the first parameter. The unsure threshold value falls between the music threshold value and the background noise threshold value. If the first parameter falls between the music threshold value and the background noise threshold value, the speech signal is classified as music or background noise based on analyzing a plurality of first parameters extracted from the plurality of frames.
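The three-threshold decision might look like the sketch below. It assumes that larger parameter values indicate music and that ambiguous frames are resolved by comparing the mean of recent parameter values with the unsure threshold; both assumptions, and all threshold values, are hypothetical.

```python
def classify_frame(value, history, music_thr=0.8, noise_thr=0.3, unsure_thr=0.55):
    """Classify a frame as "music" or "noise" from one parameter.

    history holds the same parameter from earlier frames.
    All thresholds are hypothetical illustration values.
    """
    if value >= music_thr:
        return "music"  # clearly in the music range
    if value <= noise_thr:
        return "noise"  # clearly in the background noise range
    # Ambiguous region: analyze the parameter over several frames
    # (assumed here to mean comparing their mean with unsure_thr).
    window = list(history) + [value]
    mean = sum(window) / len(window)
    return "music" if mean >= unsure_thr else "noise"
```

For example, a value of 0.5 is classified as music when the two preceding frames both measured 0.7, and as noise when they both measured 0.4.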
Abstract:
A method is provided for detecting music in a speech signal having a plurality of frames. The method comprises obtaining one or more first pitch correlation candidates from a first frame of the plurality of frames; obtaining one or more second pitch correlation candidates from a second frame of the plurality of frames; selecting a pitch correlation (Rp) from the one or more first pitch correlation candidates and the one or more second pitch correlation candidates; and distinguishing music from background noise based on analyzing the pitch correlation (Rp). The method may further comprise filtering the speech signal using a one-order low-pass filter prior to the obtaining the one or more first pitch correlation candidates, and down sampling the speech signal by four prior to the obtaining the one or more first pitch correlation candidates.
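The front end of this method (first-order low-pass filtering, down sampling by four, pitch correlation candidates from two frames) can be sketched as follows. The filter coefficient `alpha`, the lag set, the normalized correlation formula, and the choice of the maximum candidate as Rp are assumptions for illustration only.

```python
import math

def one_pole_lowpass(x, alpha=0.5):
    """First-order low-pass filter; alpha is a hypothetical coefficient."""
    y, prev = [], 0.0
    for s in x:
        prev = alpha * s + (1.0 - alpha) * prev
        y.append(prev)
    return y

def normalized_correlation(x, lag):
    """Normalized correlation between x and a copy delayed by lag samples."""
    a, b = x[lag:], x[:-lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0

def frame_pitch_correlations(frame, lags):
    """Low-pass filter, down sample by four, then correlate at each lag."""
    down = one_pole_lowpass(frame)[::4]
    return [normalized_correlation(down, lag) for lag in lags]

def select_pitch_correlation(first_frame, second_frame, lags=(5, 10, 20)):
    """Assumed selection rule: Rp is the largest candidate from both frames."""
    candidates = (frame_pitch_correlations(first_frame, lags)
                  + frame_pitch_correlations(second_frame, lags))
    return max(candidates)
```

The lags are expressed in down-sampled samples, so a lag of 10 here corresponds to a period of 40 samples in the original signal.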
Abstract:
An approach for efficiently reducing background noise from a speech signal in real-time applications is presented. A noisy input speech signal is processed through an inverse filter (306) when the spectrum tilt (302) of the input signal is not that of a pure background noise model. The noisy input signal is also filtered in order to reduce the spectrum valley areas of the noisy input signal when the background noise is present.