VOICE MODIFICATION DETECTION USING PHYSICAL MODELS OF SPEECH PRODUCTION
Abstract:
A computer may train a single-class machine learning model using normal speech recordings. The machine learning model, or any other model, may estimate the normal range of parameters of a physical speech production model based on the normal speech recordings. For example, the computer may use a source-filter model of speech production, in which voiced speech is represented by a pulse train and unvoiced speech by random noise, and a combination of the pulse train and the random noise is passed through an auto-regressive filter that emulates the human vocal tract. The computer leverages the fact that intentional modification of a human voice introduces errors into the source-filter model or any other physical model of speech production. The computer may identify anomalies in the physical model to generate a voice modification score for an audio signal. The voice modification score may indicate a degree of abnormality of the human voice in the audio signal.
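The idea summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes a second-order auto-regressive filter, synthetic signals in place of real speech recordings, a per-coefficient mean and standard deviation as a stand-in for the single-class model, and a crude resampling step as a stand-in for intentional voice modification. All function names and constants (synthesize, lpc, fit_normal_range, modification_score, crude_speed_change, TRUE_AR) are hypothetical.

```python
import numpy as np

def synthesize(ar, n, pitch_period=80, noise_gain=0.05, seed=0):
    """Source-filter synthesis: (pulse train + noise) through an all-pole AR filter."""
    rng = np.random.default_rng(seed)
    excitation = noise_gain * rng.standard_normal(n)  # unvoiced component
    excitation[::pitch_period] += 1.0                 # voiced component (pulse train)
    out = np.zeros(n)
    for i in range(n):
        acc = excitation[i]
        for k, a_k in enumerate(ar, start=1):
            if i >= k:
                acc += a_k * out[i - k]               # feedback from past outputs
        out[i] = acc
    return out

def lpc(signal, order):
    """Estimate AR coefficients with the autocorrelation (Yule-Walker) method."""
    n = len(signal)
    r = np.correlate(signal, signal, mode="full")[n - 1 : n + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)               # small ridge for stability
    return np.linalg.solve(R, r[1 : order + 1])

def fit_normal_range(recordings, order):
    """Assumed stand-in for single-class training: mean and spread of AR coefficients."""
    feats = np.array([lpc(x, order) for x in recordings])
    return feats.mean(axis=0), feats.std(axis=0) + 1e-6

def modification_score(signal, order, mean, std):
    """Distance of the estimated coefficients from the normal range, in z-score units."""
    z = (lpc(signal, order) - mean) / std
    return float(np.linalg.norm(z))

def crude_speed_change(signal, factor):
    """Assumed stand-in for voice modification: resample by linear interpolation."""
    positions = np.arange(0, len(signal) - 1, factor)
    return np.interp(positions, np.arange(len(signal)), signal)

# Hypothetical vocal tract filter: one resonance, poles at 0.9 * exp(+/- j * 0.3 * pi).
TRUE_AR = [1.058, -0.81]

normal_recordings = [synthesize(TRUE_AR, 4000, seed=s) for s in range(20)]
mean, std = fit_normal_range(normal_recordings, order=2)

held_out = synthesize(TRUE_AR, 4000, seed=99)
normal_score = modification_score(held_out, 2, mean, std)
modified_score = modification_score(crude_speed_change(held_out, 1.3), 2, mean, std)
```

In this sketch the resampled signal shifts the resonance of the filter, so its estimated coefficients fall outside the range learned from the unmodified signals and modified_score comes out larger than normal_score. A practical system would presumably use real recordings, frame-wise analysis, a higher filter order, and a proper single-class model in place of the z-score.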