-
Publication No.: US11842748B2
Publication date: 2023-12-12
Application No.: US17121291
Application date: 2020-12-14
Applicant: PINDROP SECURITY, INC.
Inventor: Elie Khoury, Matthew Garland
Abstract: Methods, systems, and apparatuses for audio event detection, where the determination of a type of sound data is made at the cluster level rather than at the frame level. The techniques provided are thus more robust to the local behavior of features of an audio signal or audio recording. The audio event detection is performed by using Gaussian mixture models (GMMs) to classify each cluster or by extracting an i-vector from each cluster. Each cluster may be classified based on an i-vector classification using a support vector machine or probabilistic linear discriminant analysis. The audio event detection significantly reduces potential smoothing error and avoids any dependency on accurate window-size tuning. Segmentation may be performed using a generalized likelihood ratio and a Bayesian information criterion, and the segments may be clustered using hierarchical agglomerative clustering. Audio frames may be clustered using K-means and GMMs.
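The cluster-level decision described in this abstract can be illustrated with a small sketch. It is not the patented implementation: the per-frame log-likelihoods, the class names "speech" and "music", and the cluster assignments below are hypothetical values chosen for illustration, and summing frame log-likelihoods within a cluster is only one plausible way to score a cluster against per-class GMMs.

```python
from collections import defaultdict


def classify_clusters(frame_loglik, cluster_ids):
    """Pick one label per cluster by summing per-frame log-likelihoods.

    frame_loglik: list of {label: log-likelihood} dicts, one per frame
                  (hypothetical outputs of per-class models).
    cluster_ids:  cluster index for each frame, same length as frame_loglik.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for scores, cid in zip(frame_loglik, cluster_ids):
        for label, ll in scores.items():
            totals[cid][label] += ll
    # The label with the highest accumulated log-likelihood wins the cluster.
    return {cid: max(by_label, key=by_label.get)
            for cid, by_label in totals.items()}


# Hypothetical per-frame scores under a "speech" model and a "music" model.
frame_loglik = [
    {"speech": -1.0, "music": -3.0},
    {"speech": -2.5, "music": -2.0},  # this frame alone looks like music
    {"speech": -1.2, "music": -3.1},
    {"speech": -4.0, "music": -1.0},
    {"speech": -3.5, "music": -1.5},
]
cluster_ids = [0, 0, 0, 1, 1]

# Frame-level decision: each frame is labeled on its own.
frame_labels = [max(s, key=s.get) for s in frame_loglik]
# Cluster-level decision: one label per cluster.
cluster_labels = classify_clusters(frame_loglik, cluster_ids)
```

In this toy input the second frame would be labeled "music" on its own, while its cluster as a whole is labeled "speech". No smoothing window is needed to correct the outlier, which is the robustness to local behavior that the abstract refers to.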
-
Publication No.: US11756564B2
Publication date: 2023-09-12
Application No.: US16442279
Application date: 2019-06-14
Applicant: PINDROP SECURITY, INC.
Inventor: Ganesh Sivaraman, Elie Khoury
IPC: G10L21/0232, G10L25/30, G06N3/048
CPC classification number: G10L21/0232, G06N3/048, G10L25/30
Abstract: A computer may segment a noisy audio signal into audio frames and execute a deep neural network (DNN) to estimate an instantaneous function of clean speech spectrum and noisy audio spectrum in the audio frame. This instantaneous function may correspond to a ratio of an a-priori signal-to-noise ratio (SNR) and an a-posteriori SNR of the audio frame. The computer may add the estimated instantaneous function to the original noisy audio frame to output an enhanced speech audio frame.
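A short numeric sketch may help show why adding the SNR ratio to the noisy frame can recover clean speech. It rests on assumptions not stated in the abstract: the textbook definitions a-priori SNR = clean power / noise power and a-posteriori SNR = noisy power / noise power, speech and noise that are uncorrelated (so their powers add), and addition carried out in the log-power domain. The function name `log_gain` and the power values are hypothetical, and the ratio is computed from known clean and noise powers here, where the patent describes a DNN estimating it.

```python
import math


def log_gain(clean_power, noise_power):
    """Log of (a-priori SNR / a-posteriori SNR) for one frequency bin."""
    noisy_power = clean_power + noise_power       # assumes uncorrelated noise
    a_priori_snr = clean_power / noise_power      # clean speech vs. noise
    a_posteriori_snr = noisy_power / noise_power  # noisy observation vs. noise
    return math.log(a_priori_snr / a_posteriori_snr)


# Hypothetical per-bin powers for a single audio frame.
clean = [4.0, 1.0, 0.25]
noise = [1.0, 1.0, 1.0]

# Log-power spectrum of the noisy frame (what the system actually observes).
noisy_log = [math.log(s + n) for s, n in zip(clean, noise)]

# Adding the log ratio to the noisy log-power yields the clean log-power,
# because (S/N) / (Y/N) = S/Y and log Y + log(S/Y) = log S.
enhanced_log = [y + log_gain(s, n) for y, s, n in zip(noisy_log, clean, noise)]
```

Under these assumptions the noise power cancels out of the ratio, so the quantity being estimated depends only on the clean and noisy spectra, which matches the abstract's description of a function of those two spectra.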
-
Publication No.: US10672403B2
Publication date: 2020-06-02
Application No.: US15890967
Application date: 2018-02-07
Applicant: PINDROP SECURITY, INC.
Inventor: Elie Khoury, Matthew Garland
IPC: G10L17/26, G10L15/26, H04L29/06, G06K9/00, G06F21/32, G06K9/62, G10L25/30, G10L17/18, G10L17/04
Abstract: A score indicating a likelihood that a first subject is the same as a second subject may be calibrated to compensate for aging of the first subject between samples of age-sensitive biometric characteristics. Age of the first subject obtained at a first sample time and age of the second subject obtained at a second sample time may be averaged, and an age approximation may be generated based on at least the age average and an interval between the first and second samples. The age approximation, the interval between the first and second sample times, and an obtained gender of the subject are used to calibrate the likelihood score.
-
Publication No.: US10141009B2
Publication date: 2018-11-27
Application No.: US15610378
Application date: 2017-05-31
Applicant: PINDROP SECURITY, INC.
Inventor: Elie Khoury, Matthew Garland
Abstract: Methods, systems, and apparatuses for audio event detection, where the determination of a type of sound data is made at the cluster level rather than at the frame level. The techniques provided are thus more robust to the local behavior of features of an audio signal or audio recording. The audio event detection is performed by using Gaussian mixture models (GMMs) to classify each cluster or by extracting an i-vector from each cluster. Each cluster may be classified based on an i-vector classification using a support vector machine or probabilistic linear discriminant analysis. The audio event detection significantly reduces potential smoothing error and avoids any dependency on accurate window-size tuning. Segmentation may be performed using a generalized likelihood ratio and a Bayesian information criterion, and the segments may be clustered using hierarchical agglomerative clustering. Audio frames may be clustered using K-means and GMMs.
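The segmentation step named in this abstract, a generalized likelihood ratio combined with a Bayesian information criterion, is commonly written as a delta-BIC test for a candidate change point. The sketch below is an assumption-laden illustration and not the patent's procedure: it uses a single one-dimensional feature modeled by one Gaussian per segment, and the function name `delta_bic`, the `penalty_weight` and `var_floor` parameters, and the sample values are hypothetical.

```python
import math
from statistics import pvariance


def delta_bic(x, t, penalty_weight=1.0, var_floor=1e-6):
    """Delta-BIC for splitting the 1-D sequence x at index t.

    A positive value favors two Gaussians (a change point at t);
    a negative value favors a single Gaussian (no change).
    """
    n = len(x)
    left, right = x[:t], x[t:]

    def half_n_log_var(seg):
        # Variance is floored so that a constant segment does not hit log(0).
        return 0.5 * len(seg) * math.log(max(pvariance(seg), var_floor))

    # Generalized likelihood ratio term: one model vs. two models.
    glr = half_n_log_var(x) - half_n_log_var(left) - half_n_log_var(right)
    # BIC penalty: the split adds one mean and one variance (2 parameters).
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return glr - penalty


# Hypothetical feature values: one sequence with a jump, one without.
changed = [0.0, 0.1, -0.1, 0.05, -0.05, 5.0, 5.1, 4.9, 5.05, 4.95]
flat = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05]

score_changed = delta_bic(changed, 5)  # positive: split is justified
score_flat = delta_bic(flat, 5)        # negative: penalty outweighs the gain
```

In practice the features are multi-dimensional and the test is scanned over many candidate indices; the segments found this way are what a later hierarchical agglomerative clustering step would merge.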
-
Publication No.: US20240363103A1
Publication date: 2024-10-31
Application No.: US18388412
Application date: 2023-11-09
Applicant: Pindrop Security, Inc.
Inventor: Umair Altaf, Sai Pradeep Peri, Lakshay Phatela, Payas Gupta, Yitao Sun, Svetlana Afanaseva, Kailash Patil, Elie Khoury, Bradley Magnetta, Vijay Balasubramaniyan, Tianxiang Chen
IPC: G10L15/08
CPC classification number: G10L15/08
Abstract: Disclosed are systems and methods including software processes executed by a server that detect audio-based synthetic speech (“deepfakes”) in a call conversation. The server applies an NLP engine to transcribe call audio and analyze the text for anomalous patterns to detect synthetic speech. Additionally or alternatively, the server executes a voice “liveness” detection system for detecting machine speech, such as synthetic speech or replayed speech. The system performs phrase repetition detection, background change detection, and passive voice liveness detection in call audio signals to detect liveness of a speech utterance. An automated model update module allows the liveness detection model to adapt to new types of presentation attacks, based on human-provided feedback.
-
Publication No.: US20240355322A1
Publication date: 2024-10-24
Application No.: US18388428
Application date: 2023-11-09
Applicant: Pindrop Security, Inc.
Inventor: Umair Altaf, Sai Pradeep Peri, Lakshay Phatela, Payas Gupta, Yitao Sun, Svetlana Afanaseva, Kailash Patil, Elie Khoury, Bradley Magnetta, Vijay Balasubramaniyan, Tianxiang Chen
IPC: G10L15/08
Abstract: Disclosed are systems and methods including software processes executed by a server that detect audio-based synthetic speech (“deepfakes”) in a call conversation. The server applies an NLP engine to transcribe call audio and analyze the text for anomalous patterns to detect synthetic speech. Additionally or alternatively, the server executes a voice “liveness” detection system for detecting machine speech, such as synthetic speech or replayed speech. The system performs phrase repetition detection, background change detection, and passive voice liveness detection in call audio signals to detect liveness of a speech utterance. An automated model update module allows the liveness detection model to adapt to new types of presentation attacks, based on human-provided feedback.
-
Publication No.: US11715460B2
Publication date: 2023-08-01
Application No.: US17066210
Application date: 2020-10-08
Applicant: PINDROP SECURITY, INC.
Inventor: Elie Khoury, Ganesh Sivaraman, Tianxiang Chen, Amruta Vidwans
CPC classification number: G10L15/16, G10L15/063, G10L17/04, G10L25/51
Abstract: Described herein are systems and methods for improved audio analysis using a computer-executed neural network having one or more in-network data augmentation layers. The systems described herein help ease or avoid unwanted strain on computing resources by employing the data augmentation techniques within the layers of the neural network. The in-network data augmentation layers will produce various types of simulated audio data when the computer applies the neural network on an inputted audio signal during a training phase, enrollment phase, and/or testing phase. Subsequent layers of the neural network (e.g., convolutional layer, pooling layer, data augmentation layer) ingest the simulated audio data and the inputted audio signal and perform various operations.
-
Publication No.: US11670304B2
Publication date: 2023-06-06
Application No.: US16895750
Application date: 2020-06-08
Applicant: PINDROP SECURITY, INC.
Inventor: Elie Khoury, Matthew Garland
IPC: G10L17/00, H04M1/27, G10L17/24, G10L15/19, G10L17/08, G06N7/01, G10L15/07, G10L15/26, G10L17/04
CPC classification number: G10L17/00, G06N7/01, G10L15/07, G10L15/19, G10L15/26, G10L17/04, G10L17/08, G10L17/24, H04M1/271, H04M2203/40
Abstract: Utterances of at least two speakers in a speech signal may be distinguished and the associated speaker identified by use of diarization together with automatic speech recognition of identifying words and phrases commonly found in the speech signal. The diarization process clusters turns of the conversation while recognized special form phrases and entity names identify the speakers. A trained probabilistic model deduces which entity name(s) correspond to the clusters.
-
Publication No.: US20230005486A1
Publication date: 2023-01-05
Application No.: US17855149
Application date: 2022-06-30
Applicant: Pindrop Security, Inc.
Inventor: Tianxiang Chen, Elie Khoury
Abstract: Embodiments include a computer executing voice biometric machine-learning for speaker recognition. The machine-learning architecture includes embedding extractors that extract embeddings for enrollment or for verifying inbound speakers, and embedding convertors that convert enrollment voiceprints from a first type of embedding to a second type of embedding. The embedding convertor maps the feature vector space of the first type of embedding to the feature vector space of the second type of embedding. The embedding convertor takes as input enrollment embeddings of the first type of embedding and generates as output converted enrolled embeddings that are aggregated into a converted enrolled voiceprint of the second type of embedding. To verify an inbound speaker, a second embedding extractor generates an inbound voiceprint of the second type of embedding, and scoring layers determine a similarity between the inbound voiceprint and the converted enrolled voiceprint, both of which are the second type of embedding.
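The enrollment-conversion flow in this abstract (convert each enrolled embedding, aggregate into a voiceprint, score against an inbound voiceprint of the second type) can be sketched as follows. Everything concrete here is an assumption for illustration: the converter is reduced to a fixed linear map named `CONVERTER`, whereas the patent describes a learned component; aggregation is taken to be a simple mean; scoring is taken to be cosine similarity; and all vectors are made-up values.

```python
import math


def convert(embedding, matrix):
    """Map a type-1 embedding into the type-2 space (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in matrix]


def mean_vector(vectors):
    """Aggregate several embeddings into one voiceprint by averaging."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical converter weights: 3-D type-1 space to 2-D type-2 space.
CONVERTER = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, -0.5],
]

# Hypothetical enrollment embeddings produced by the first extractor.
enrollment_type1 = [
    [0.9, 0.1, 0.2],
    [1.1, -0.1, 0.0],
]

# Convert each enrollment embedding, then aggregate into one voiceprint.
converted = [convert(e, CONVERTER) for e in enrollment_type1]
enrolled_voiceprint = mean_vector(converted)

# Hypothetical inbound voiceprint produced directly by the second extractor.
inbound_type2 = [1.0, 0.05]
score = cosine(inbound_type2, enrolled_voiceprint)
```

The point of the arrangement is that both vectors being compared live in the second embedding space, so existing enrollments remain usable after the extractor changes without collecting new enrollment audio.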
-
Publication No.: US12142083B2
Publication date: 2024-11-12
Application No.: US17503152
Application date: 2021-10-15
Applicant: Pindrop Security, Inc.
Inventor: Tianxiang Chen, Elie Khoury
IPC: G06K9/00, G06F18/21, G06F18/22, G06K9/62, G06V20/40, G06V40/16, G06V40/40, G06V40/70, G10L17/22
Abstract: The embodiments execute machine-learning architectures for biometric-based identity recognition (e.g., speaker recognition, facial recognition) and deepfake detection (e.g., speaker deepfake detection, facial deepfake detection). The machine-learning architecture includes layers defining multiple scoring components, including sub-architectures for speaker deepfake detection, speaker recognition, facial deepfake detection, facial recognition, and a lip-sync estimation engine. The machine-learning architecture extracts and analyzes various types of low-level features from both audio data and visual data, combines the various scores, and uses the scores to determine the likelihood that the audiovisual data contains deepfake content and the likelihood that a claimed identity of a person in the video matches the identity of an expected or enrolled person. This enables the machine-learning architecture to perform identity recognition and verification, and deepfake detection, in an integrated fashion, for both audio data and visual data.