CENTRALIZED SYNTHETIC SPEECH DETECTION SYSTEM USING WATERMARKING

    Publication No.: US20250029614A1

    Publication date: 2025-01-23

    Application No.: US18777278

    Filing date: 2024-07-18

    Abstract: Disclosed are systems and methods including software processes executed by a server for obtaining, by a computer, an audio signal including synthetic speech, extracting, by the computer, metadata from a watermark of the audio signal by applying a set of keys associated with a plurality of text-to-speech (TTS) services to the audio signal, the metadata indicating an origin of the synthetic speech in the audio signal, and generating, by the computer, based on the extracted metadata, a notification indicating that the audio signal includes the synthetic speech.
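    The abstract does not say how the watermark is embedded or what the per-service keys look like. The sketch below assumes a simple spread-spectrum scheme, which is not taken from the patent: each TTS service holds a hypothetical integer key that seeds pseudorandom +/-1 chip sequences, the service adds those sequences (signed by a sync pattern followed by metadata bits) to its output, and the central detector tries every registered key, accepting the first one whose decoded sync pattern matches. `CHIPS_PER_BIT`, `SYNC`, `embed`, `extract`, `identify_origin`, and `notification` are all illustrative names.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical parameters; the patent abstract does not specify the watermark scheme.
CHIPS_PER_BIT = 512  # spreading factor: number of samples carrying one bit
SYNC = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]  # marks a valid watermark


def chip_sequences(key: int, n_bits: int) -> List[List[int]]:
    """Regenerate the +/-1 spreading sequences for one TTS service from its secret key."""
    rng = random.Random(key)
    return [[rng.choice((-1, 1)) for _ in range(CHIPS_PER_BIT)] for _ in range(n_bits)]


def embed(audio: List[float], key: int, payload: List[int], alpha: float = 0.05) -> List[float]:
    """TTS side: add a low-amplitude spread-spectrum watermark carrying SYNC + payload bits."""
    bits = SYNC + payload
    if len(audio) < len(bits) * CHIPS_PER_BIT:
        raise ValueError("audio too short for this payload")
    out = list(audio)
    for i, (bit, seq) in enumerate(zip(bits, chip_sequences(key, len(bits)))):
        sign = 1.0 if bit else -1.0
        for j, chip in enumerate(seq):
            out[i * CHIPS_PER_BIT + j] += alpha * sign * chip
    return out


def extract(audio: List[float], key: int, n_payload: int) -> Optional[List[int]]:
    """Detector side: correlate with the key's sequences; return payload bits if SYNC matches."""
    n_bits = len(SYNC) + n_payload
    if len(audio) < n_bits * CHIPS_PER_BIT:
        return None
    bits = []
    for i, seq in enumerate(chip_sequences(key, n_bits)):
        start = i * CHIPS_PER_BIT
        corr = sum(audio[start + j] * chip for j, chip in enumerate(seq))
        bits.append(1 if corr > 0 else 0)
    return bits[len(SYNC):] if bits[: len(SYNC)] == SYNC else None


def identify_origin(
    audio: List[float], service_keys: Dict[str, int], n_payload: int
) -> Optional[Tuple[str, List[int]]]:
    """Try every registered TTS service key; the first key that decodes SYNC names the origin."""
    for service, key in service_keys.items():
        payload = extract(audio, key, n_payload)
        if payload is not None:
            return service, payload
    return None


def notification(origin: Optional[Tuple[str, List[int]]]) -> str:
    """Build the notification text from the extracted metadata."""
    if origin is None:
        return "No known TTS watermark found."
    service, payload = origin
    return f"Synthetic speech detected: origin={service}, metadata bits={payload}"
```

    A real scheme would also have to survive compression, resampling, and cropping, which this sketch ignores; trying keys one at a time also costs time linear in the number of registered TTS services.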

    DETECTION OF CALLS FROM VOICE ASSISTANTS
    Patent application

    Publication No.: US20200312313A1

    Publication date: 2020-10-01

    Application No.: US16829705

    Filing date: 2020-03-25

    Abstract: Embodiments described herein provide for automatically classifying the types of devices that place calls to a call center. A call center system can detect whether an incoming call originated from a voice assistant device using trained classification models received from a call analysis service. Embodiments described herein provide for methods and systems in which a computer executes machine learning algorithms that programmatically train (or otherwise generate) global or tailored classification models based on the various types of features of an audio signal and call data. A classification model is deployed to one or more call centers, where the model is used by call center computers executing classification processes for determining whether incoming telephone calls originated from a voice assistant device, such as Amazon Alexa® and Google Home®, or another type of device (e.g., cellular/mobile phone, landline phone, VoIP).
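    As a minimal sketch of the kind of classification model a call analysis service might train and deploy to call centers, the following uses plain logistic regression fit by gradient descent. This is a swapped-in stand-in: the abstract names no specific algorithm, the numeric feature vectors are hypothetical placeholders for the audio-signal and call-data features it mentions, and `DeviceTypeClassifier` is an illustrative name.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class DeviceTypeClassifier:
    """Binary logistic regression: label 1 = voice assistant device, 0 = other device type."""

    def __init__(self, n_features: int, lr: float = 0.5, epochs: int = 1000) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, x: Sequence[float]) -> float:
        """Probability that the call came from a voice assistant device."""
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "DeviceTypeClassifier":
        """Full-batch gradient descent on the mean log-loss."""
        n = len(X)
        for _ in range(self.epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, target in zip(X, y):
                err = self.predict_proba(x) - target  # derivative of log-loss w.r.t. the logit
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def is_voice_assistant(self, x: Sequence[float], threshold: float = 0.5) -> bool:
        """Call-time decision used by the call center after the model is deployed."""
        return self.predict_proba(x) >= threshold
```

    In this framing, a global model would be fit on calls pooled from many call centers and a tailored model on one center's own calls; either way, the deployed object only needs `is_voice_assistant` at call time.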

    VOICE MODIFICATION DETECTION USING PHYSICAL MODELS OF SPEECH PRODUCTION

    Publication No.: US20190311730A1

    Publication date: 2019-10-10

    Application No.: US16375785

    Filing date: 2019-04-04

    Abstract: A computer may train a single-class machine learning model using normal speech recordings. The machine learning model or any other model may estimate the normal range of parameters of a physical speech production model based on the normal speech recordings. For example, the computer may use a source-filter model of speech production, where voiced speech is represented by a pulse train and unvoiced speech by random noise, and a combination of the pulse train and the random noise is passed through an auto-regressive filter that emulates the human vocal tract. The computer leverages the fact that intentional modification of the human voice introduces errors into the source-filter model or any other physical model of speech production. The computer may identify anomalies in the physical model to generate a voice modification score for an audio signal. The voice modification score may indicate the degree of abnormality of the human voice in the audio signal.
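    The source-filter description can be made concrete with a small sketch under stated assumptions, none of which come from the patent: the excitation mixes a pulse train and Gaussian noise with a hypothetical `voicing` weight, a second-order all-pole (AR) filter stands in for the vocal tract, the filter coefficients are re-estimated by linear prediction (Levinson-Durbin recursion), and a per-parameter mean/standard-deviation model fit only on normal recordings stands in for the single-class machine learning model. Its RMS z-score plays the role of the voice modification score; `NormalRangeModel` and `modification_score` are illustrative names.

```python
import math
import random
from typing import List, Sequence


def excitation(n: int, pitch_period: int, voicing: float, rng: random.Random) -> List[float]:
    """Source: weighted mix of a pulse train (voiced) and white noise (unvoiced)."""
    return [
        voicing * (1.0 if i % pitch_period == 0 else 0.0) + (1.0 - voicing) * rng.gauss(0.0, 1.0)
        for i in range(n)
    ]


def ar_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Filter: all-pole vocal-tract model, y[n] = x[n] + sum_k a[k] * y[n-1-k]."""
    y: List[float] = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc += ak * y[n - 1 - k]
        y.append(acc)
    return y


def lpc(signal: Sequence[float], order: int) -> List[float]:
    """Estimate AR coefficients with the autocorrelation method (Levinson-Durbin recursion)."""
    n = len(signal)
    r = [sum(signal[i] * signal[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # degenerate input, e.g. silence
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        # Update the order-(i+1) predictor from the order-i predictor.
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k] + a[i + 1:]
        err *= 1.0 - k * k
    return a


class NormalRangeModel:
    """One-class stand-in: learns the normal range of filter parameters from normal speech only."""

    def fit(self, params: List[List[float]]) -> "NormalRangeModel":
        n, dims = len(params), len(params[0])
        self.mean = [sum(p[d] for p in params) / n for d in range(dims)]
        self.std = [
            max(1e-6, math.sqrt(sum((p[d] - self.mean[d]) ** 2 for p in params) / n))
            for d in range(dims)
        ]
        return self

    def modification_score(self, p: Sequence[float]) -> float:
        """RMS z-score of the parameters; larger means further outside the normal range."""
        z2 = [((v - m) / s) ** 2 for v, m, s in zip(p, self.mean, self.std)]
        return math.sqrt(sum(z2) / len(z2))
```

    Recordings whose estimated filter coefficients fall far outside the range seen in normal speech receive a high score. A practical system would estimate parameters frame by frame and would likely score the excitation parameters (pitch, voicing) as well as the filter.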
