-
Publication No.: US20230386453A1
Publication date: 2023-11-30
Application No.: US18032815
Application date: 2021-09-23
Applicant: Thomson Licensing
CPC classification: G10L15/08, G10L15/22, G10L15/34, G10L2015/223
Abstract: A method and device for detecting an audio adversarial attack on a voice command processed by an automatic speech recognition system are described. The method is implemented by a detection device connected to the automatic speech recognition system and includes: obtaining an audio signal associated with the voice command; performing a phonetic transcription of the audio signal according to a phonetic transcription scheme, delivering a first character string; obtaining a transcript resulting from the processing of the audio signal by the automatic speech recognition system; performing a phonetic transcription of the transcript according to the same phonetic transcription scheme, delivering a second character string; computing a similarity score between the first character string and the second character string; and delivering a piece of data representative of a detection of an audio adversarial attack, as a function of the result of a comparison between the similarity score and a predetermined threshold.
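The comparison step of this abstract can be illustrated with a minimal sketch. It assumes the two phonetic character strings have already been produced upstream, and it uses a normalized Levenshtein distance as the similarity score. The abstract names neither a similarity measure nor a threshold value, so the function names, the 0.7 default, and the rule "similarity below threshold means attack" are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity_score(first: str, second: str) -> float:
    """Similarity in [0, 1]; 1.0 means the phonetic strings are identical.

    Assumption: normalized edit distance stands in for the unspecified score.
    """
    longest = max(len(first), len(second))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(first, second) / longest


def detect_attack_phonetic(first: str, second: str, threshold: float = 0.7) -> bool:
    """Flag an attack when the two phonetic strings disagree too much.

    Assumption: the 0.7 threshold and the 'below threshold' rule are hypothetical.
    """
    return similarity_score(first, second) < threshold
```

Here `first` would be the phonetic transcription of the audio signal and `second` the phonetic transcription of the ASR transcript. If an adversarial perturbation makes the ASR output diverge from what is actually heard, the two strings should differ and the score should drop.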
-
Publication No.: US20230401338A1
Publication date: 2023-12-14
Application No.: US18032819
Application date: 2021-09-23
Applicant: Thomson Licensing
CPC classification: G06F21/629, G10L15/26
Abstract: A method and device are described. The method includes: obtaining an input audio signal associated with a voice input; obtaining a transcript resulting from the processing of the input audio signal; converting the transcript into a synthesized audio signal; extracting an acoustic feature of a same type from the input audio signal and the synthesized audio signal, delivering a first sequence of feature vectors associated with the input audio signal and a second sequence of feature vectors associated with the synthesized audio signal; converting the acoustic features to corresponding acoustic features associated with a target reference voice, delivering a first sequence and a second sequence of converted feature vectors; computing a dynamic time warping distance between the first and second sequences of converted feature vectors; and delivering data representative of a detection of an audio adversarial attack, as a result of a comparison between the dynamic time warping distance and a predetermined threshold.
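The final two steps of this abstract (dynamic time warping distance, then threshold comparison) can be sketched as follows. The sketch assumes the two sequences of converted feature vectors are already available as lists of equal-dimension float vectors. Speech synthesis, feature extraction, and voice conversion are out of scope here. The Euclidean local cost, the function names, and the rule "distance above threshold means attack" are assumptions, since the abstract does not specify them.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Local cost between two feature vectors (assumed: Euclidean distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dtw_distance(first: Sequence[Vector], second: Sequence[Vector]) -> float:
    """Classic dynamic time warping distance between two vector sequences."""
    n, m = len(first), len(second)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = minimal accumulated cost aligning first[:i] with second[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(first[i - 1], second[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in first only
                                     cost[i][j - 1],      # advance in second only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def detect_attack_dtw(first: Sequence[Vector], second: Sequence[Vector],
                      threshold: float) -> bool:
    """Flag an attack when the sequences are too far apart.

    Assumption: 'distance above threshold' is a hypothetical decision rule.
    """
    return dtw_distance(first, second) > threshold
```

Dynamic time warping tolerates differences in speaking rate between the input audio and the synthesized audio: a sequence that repeats a frame, such as `[[0.0], [1.0], [1.0], [2.0]]` against `[[0.0], [1.0], [2.0]]`, still aligns at zero cost.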
-