Abstract:
A noise detection method and apparatus are disclosed. The noise detection method includes: obtaining a frequency-domain energy distribution parameter of a current frame of an audio signal and a frequency-domain energy distribution parameter of each of the frames in a preset neighboring domain range of the current frame; obtaining a tone parameter of the current frame and a tone parameter of each of the frames in the preset neighboring domain range; determining, according to the tone parameter of the current frame and the tone parameter of each of the frames in the preset neighboring domain range, whether the current frame is in a speech section or a non-speech section; and determining that the current frame is speech-grade noise if the current frame is in a speech section and the number of frequency-domain energy distribution parameters, among all those obtained, that fall within a preset speech-grade noise frequency-domain energy distribution parameter interval is greater than or equal to a first threshold.
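The final decision rule in this abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the abstract does not say how the tone parameters decide between a speech and a non-speech section, so the counting rule in `is_speech_section` (and its `tone_threshold` and `min_tonal_frames` parameters) is an assumption, as are all function names.

```python
from typing import Sequence, Tuple


def is_speech_section(tone_params: Sequence[float],
                      tone_threshold: float,
                      min_tonal_frames: int) -> bool:
    """ASSUMED rule: the window is a speech section if enough frames
    have a tone parameter at or above tone_threshold."""
    tonal = sum(1 for t in tone_params if t >= tone_threshold)
    return tonal >= min_tonal_frames


def is_speech_grade_noise(energy_params: Sequence[float],
                          tone_params: Sequence[float],
                          noise_interval: Tuple[float, float],
                          first_threshold: int,
                          tone_threshold: float,
                          min_tonal_frames: int) -> bool:
    """Apply the rule from the abstract to the current frame and its neighbors.

    energy_params and tone_params hold one value per frame in the window
    (the current frame plus the frames in its preset neighboring range).
    """
    if not is_speech_section(tone_params, tone_threshold, min_tonal_frames):
        return False
    low, high = noise_interval
    # Count the energy distribution parameters that fall inside the
    # preset speech-grade noise interval.
    in_interval = sum(1 for p in energy_params if low <= p <= high)
    return in_interval >= first_threshold
```

The energy and tone parameters are taken as inputs here because the abstract does not define how they are computed.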
Abstract:
A voice quality monitoring method and apparatus are provided, which solve the problem of how to perform proper voice quality monitoring on a relatively long audio signal at relatively low cost. The method includes capturing one or more voice signal segments from an input signal; performing voice segment segmentation on each voice signal segment to obtain one or more voice segments; and performing a voice quality evaluation on each voice segment to obtain a quality evaluation result. Because each segmented voice segment includes only a voice signal and is shorter than the input signal, proper voice quality monitoring can be performed on a relatively long audio signal at relatively low cost, thereby obtaining a more accurate voice quality evaluation result.
Abstract:
A method and an apparatus for detecting an audio signal according to frequency domain energy are presented. The method may include receiving an audio signal frame; acquiring a frequency domain energy distribution of the audio signal frame; obtaining a maximum value distribution characteristic of a frequency domain energy distribution derivative of the audio signal frame according to the frequency domain energy distribution of the audio signal frame; using the audio signal frame and each frame in a preset neighborhood range of the audio signal frame as a frame set, where the frame set includes a to-be-detected frame; and detecting the to-be-detected frame according to a maximum value distribution characteristic of a frequency domain energy distribution derivative of the frame set. The embodiments thereby enable detection of an audio signal.
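The steps in this abstract can be illustrated with a small stdlib-only sketch. Several definitions here are assumptions, since the abstract does not give formulas: the energy distribution is taken to be the cumulative share of spectral energy up to each frequency bin, its derivative is approximated by a first difference, and the "maximum value distribution characteristic" is reduced to the bin positions of the largest derivative values. The frame-set decision (`peaks_in_band`) is likewise a hypothetical counting rule.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT power spectrum over bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def energy_distribution(frame: Sequence[float]) -> List[float]:
    """ASSUMED definition: share of total energy at or below each bin."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0:
        return [0.0] * len(power)
    dist, running = [], 0.0
    for p in power:
        running += p
        dist.append(running / total)
    return dist


def derivative_peak_bins(frame: Sequence[float], top_n: int = 1) -> List[int]:
    """Bins where the first difference of the distribution is largest."""
    dist = energy_distribution(frame)
    deriv = [dist[0]] + [dist[k] - dist[k - 1] for k in range(1, len(dist))]
    order = sorted(range(len(deriv)), key=lambda k: deriv[k], reverse=True)
    return order[:top_n]


def peaks_in_band(frames: Sequence[Sequence[float]],
                  low_bin: int, high_bin: int) -> int:
    """Hypothetical frame-set statistic: how many frames have their
    strongest derivative peak inside [low_bin, high_bin]."""
    return sum(1 for f in frames
               if low_bin <= derivative_peak_bins(f)[0] <= high_bin)
```

A detector built on this would compare the count returned by `peaks_in_band` for the frame set against a preset threshold; that threshold and the band limits are not specified in the abstract.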
Abstract:
A method and an apparatus for processing a speech signal according to frequency-domain energy are disclosed. The method includes receiving an original speech signal including a first speech frame and a second speech frame that are adjacent to each other, performing a Fourier transform on the first speech frame and the second speech frame, obtaining a frequency-domain energy distribution of the first speech frame and of the second speech frame, obtaining a frequency-domain energy correlation coefficient between the two frames, and segmenting the original speech signal according to the frequency-domain energy correlation coefficient. This may resolve the problem that, when refined speech signal segmentation is performed, the segmentation result has low accuracy because of the phoneme characteristics of the speech signal or the severe impact of noise.
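The segmentation step can be sketched as follows, assuming the per-frame frequency-domain energy distributions have already been computed and are passed in as equal-length lists. The abstract does not define the correlation coefficient or the segmentation rule, so the Pearson correlation and the fixed `threshold` used here are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length energy distributions
    (ASSUMED form of the frequency-domain energy correlation coefficient)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat distribution carries no shape to correlate
    return cov / math.sqrt(var_a * var_b)


def segment_boundaries(distributions: Sequence[Sequence[float]],
                       threshold: float = 0.5) -> List[int]:
    """Return each frame index i at which a new segment starts, i.e. where
    frame i correlates poorly with the adjacent frame i - 1."""
    return [i for i in range(1, len(distributions))
            if correlation(distributions[i - 1], distributions[i]) < threshold]
```

For example, four frames whose energy sits in the low bins for the first two and in the high bins for the last two yield a single boundary at index 2.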
Abstract:
The invention discloses a method including: framing a continuous voice sample in units of a first timeframe length to obtain a plurality of first timeframes, detecting the energy of each of the first timeframes, and determining, by analyzing the relationship between the energies of the first timeframes, a target first timeframe that includes a potential abrupt exception of a voice signal; and framing the continuous voice sample in units of a second timeframe length to obtain a plurality of second timeframes, processing each of the second timeframes to acquire a tone feature, and determining, by analyzing the tone feature of at least one of the second timeframes, including at least one target second timeframe, whether the potential abrupt exception in the target first timeframe contained in the target second timeframe is a real abrupt exception of a voice signal.
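The first stage of this method, and the mapping from a flagged first timeframe to the second timeframe that contains it, can be sketched as below. The abstract does not say which energy relationship marks a potential abrupt exception, so the energy-ratio test between consecutive frames (`ratio`, `floor`) is an assumption, as is the use of non-overlapping frames. The tone-feature check of the second stage is not sketched because the abstract does not define the tone feature.

```python
from typing import List, Sequence


def split_frames(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Cut the sample into non-overlapping frames, dropping a short tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy(frame: Sequence[float]) -> float:
    return sum(x * x for x in frame)


def candidate_first_frames(samples: Sequence[float], frame_len: int,
                           ratio: float = 4.0, floor: float = 1e-12) -> List[int]:
    """ASSUMED test: flag frame i when its energy differs from that of
    frame i - 1 by a factor of at least `ratio`, in either direction."""
    energies = [frame_energy(f) for f in split_frames(samples, frame_len)]
    flagged = []
    for i in range(1, len(energies)):
        hi = max(energies[i - 1], energies[i])
        lo = max(min(energies[i - 1], energies[i]), floor)  # avoid divide by zero
        if hi / lo >= ratio:
            flagged.append(i)
    return flagged


def containing_second_frame(first_index: int, first_len: int,
                            second_len: int) -> int:
    """Index of the second timeframe in which a first timeframe starts."""
    return (first_index * first_len) // second_len
```

With a sample that drops to silence for one short frame and then recovers, both the drop and the recovery are flagged; a second stage would then examine the tone feature of the corresponding longer frames to decide whether the exception is real.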