Automatic speech recognition system addressing perceptual-based adversarial audio attacks
Abstract:
A computer-implemented method for creating a combined audio signal in a speech recognition system. The method includes sampling an audio input signal to generate a time-domain sampled input signal; converting the time-domain sampled input signal to a frequency-domain input signal; generating perceptual weights in response to frequency components of critical bands of the frequency-domain input signal; creating a time-domain adversary signal in response to the perceptual weights; and combining the time-domain adversary signal with the audio input signal to create the combined audio signal, wherein speech processing of the combined audio signal outputs a different result from speech processing of the audio input signal.
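The steps in the abstract can be illustrated with a minimal sketch, under several stated assumptions. The function names (`hz_to_bark`, `perceptual_weights`, `make_adversary`, `combine`), the `scale` parameter, the Bark-scale approximation used to group bins into critical bands, and the weighting rule (louder bands tolerate more perturbation) are all hypothetical choices made for illustration and are not taken from the patent. Random noise stands in for the adversary signal; a real attack would presumably optimize the perturbation against a recognizer, which this sketch does not attempt, so it does not show a change in the recognition result.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT, keeping only the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def hz_to_bark(freq_hz):
    """Assumed critical-band mapping: a common Bark-scale approximation."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def perceptual_weights(spectrum, sample_rate):
    """One weight per frequency bin, from the energy of the bin's critical band.

    Assumption: bands with more energy can hide more perturbation, so each
    bin's weight is the square root of its band energy relative to the peak band.
    """
    n = len(spectrum)
    # Mirrored (negative-frequency) bins share the band of their positive twin.
    band_of = [int(hz_to_bark(min(k, n - k) * sample_rate / n)) for k in range(n)]
    energy = {}
    for k, band in enumerate(band_of):
        energy[band] = energy.get(band, 0.0) + abs(spectrum[k]) ** 2
    peak = max(energy.values()) or 1.0
    return [math.sqrt(energy[band] / peak) for band in band_of]


def make_adversary(audio, sample_rate, scale=0.01, seed=0):
    """Shape random noise by the perceptual weights and return it in the time domain."""
    rng = random.Random(seed)
    weights = perceptual_weights(dft(audio), sample_rate)
    noise = [rng.uniform(-1.0, 1.0) for _ in audio]
    shaped = [w * c for w, c in zip(weights, dft(noise))]
    return [scale * v for v in idft_real(shaped)]


def combine(audio, adversary):
    """Add the adversary signal to the input, sample by sample."""
    return [a + d for a, d in zip(audio, adversary)]


# Usage: a 440 Hz tone sampled at 8 kHz stands in for the audio input signal.
sample_rate = 8000
audio = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(128)]
adversary = make_adversary(audio, sample_rate)
combined = combine(audio, adversary)
```

Because the weights are symmetric across positive and negative frequency bins, the shaped noise transforms back to a real-valued signal, which is why `idft_real` can discard the imaginary part. The small `scale` keeps the perturbation well below the amplitude of the input.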