-
Publication No.: US20240355351A1
Publication date: 2024-10-24
Application No.: US18302079
Filing date: 2023-04-18
Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
Inventor: Moshe Tzur, Elior Hadad
CPC classification number: G10L25/84, G10L15/02, G10L15/04, G10L21/0232, G10L25/18, G10L25/90, G10L25/93
Abstract: The single-channel, Speech Features-Based Voice Activity Detection (SFVAD) system is a robust, low-latency system that generates per-frame speech and noise indications, along with calculating a pair of speech and noise time-frequency masks. The SFVAD system controls an adaptation mechanism for a Beam-Forming system control module and improves the speech quality and noise reduction capabilities of Automatic Speech Recognition applications, such as Virtual Assistance (VA) and Hands-Free (HF) calls, by robustly handling transient noises. The system extracts speech-like patterns from an input audio signal and it is invariant to the power-level of the input audio signal. Noise calculation is controlled by a pair of speech features-based detectors (voiced and unvoiced). A Cepstral-based pitch detector and a Centrum calculation method are used to prevent contamination of the calculated noise by speech content. The SFVAD system robustly handles instant changes of background noise level and has dramatically lower false detection rates.
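The abstract names a Cepstral-based pitch detector but does not describe it. As an illustration only, the following is a minimal sketch of textbook cepstral pitch estimation, not the patented method. The function names, the 60 to 400 Hz search range, and the synthetic pulse-train frame are all assumptions made for this example; a naive DFT is used to keep the sketch dependency-free.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT, kept dependency-free for illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def real_cepstrum(frame):
    """Inverse DFT of the log-magnitude spectrum of one frame."""
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(frame)]
    return [v.real for v in dft(log_mag, inverse=True)]

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return (pitch_hz, peak_height) from the strongest cepstral peak
    inside the quefrency range corresponding to fmin..fmax (assumed range)."""
    ceps = real_cepstrum(frame)
    q_lo = int(fs / fmax)
    q_hi = min(int(fs / fmin), len(frame) // 2)
    q_peak = max(range(q_lo, q_hi + 1), key=lambda q: ceps[q])
    return fs / q_peak, ceps[q_peak]

# Synthetic "voiced" frame (assumption): pulse train with a 40-sample period,
# i.e. a 200 Hz fundamental at an 8 kHz sampling rate.
fs = 8000
frame = [1.0 if t % 40 == 0 else 0.0 for t in range(256)]
pitch_hz, peak = cepstral_pitch(frame, fs)
```

Scaling the input by a constant only adds a constant to the log-magnitude spectrum, which changes the cepstrum at quefrency 0 alone. The detected peak is therefore unaffected by input level, which is consistent with the power-level invariance the abstract claims, although the abstract does not say how the SFVAD system achieves it.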
-
Publication No.: US20250095625A1
Publication date: 2025-03-20
Application No.: US18468160
Filing date: 2023-09-15
Applicant: GM Global Technology Operations LLC
Inventor: Amos Schreibman, Elior Hadad, Eli Tzirkel-Hancock
IPC: G10K11/175, G10K11/178
Abstract: A computer-implemented method executed by data processing hardware causes the data processing hardware to perform operations that include receiving multiple audio signals from a sensor array. The multiple audio signals include a target audio signal and interference audio signals. The data processing hardware then identifies a design constraint based on the multiple audio signals. The desired constraint includes a pass constraint corresponding to the target audio signal and a null constraint corresponding to the interference audio signals. The data processing hardware then compares a design filter weight of the design constraint with a filter weight maximum, designs an audio filter using the desired constraint, and filters the multiple audio signals using the designed audio filter.
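One common way to realize a pass constraint plus a null constraint is a linearly constrained minimum-norm beamformer. The sketch below illustrates that general idea for a single frequency bin, and it is not taken from the patent. The uniform linear array geometry, the helper names, and the `MAX_WEIGHT_NORM` limit are hypothetical, and the abstract does not say what happens when the weight exceeds the maximum, so the sketch only reports the comparison.

```python
import cmath
import math

def steering_vector(num_mics, spacing_m, freq_hz, angle_deg, c=343.0):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    delay = spacing_m * math.sin(math.radians(angle_deg)) / c
    return [cmath.exp(-2j * math.pi * freq_hz * delay * m) for m in range(num_mics)]

def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def design_weights(a_pass, a_null):
    """Minimum-norm w with w^H a_pass = 1 (pass) and w^H a_null = 0 (null)."""
    g_pp = inner(a_pass, a_pass).real
    g_nn = inner(a_null, a_null).real
    g_np = inner(a_null, a_pass)
    det = g_pp * g_nn - abs(g_np) ** 2  # zero if the two directions coincide
    return [(g_nn * p - g_np * n) / det for p, n in zip(a_pass, a_null)]

def weight_norm(w):
    """Euclidean norm of the weight vector."""
    return math.sqrt(sum(abs(v) ** 2 for v in w))

# Hypothetical scene: 4 microphones, 5 cm spacing, 1 kHz bin,
# target at 0 degrees and one interferer at 40 degrees.
a_target = steering_vector(4, 0.05, 1000.0, 0.0)
a_interf = steering_vector(4, 0.05, 1000.0, 40.0)
w = design_weights(a_target, a_interf)

MAX_WEIGHT_NORM = 2.0  # hypothetical filter weight maximum
within_limit = weight_norm(w) <= MAX_WEIGHT_NORM

# Filter one multichannel snapshot: y = w^H x.
snapshot = [t + 3.0 * i for t, i in zip(a_target, a_interf)]
output = inner(w, snapshot)
```

In this sketch the target component passes with unit gain and the interferer is cancelled, so `output` is close to 1 even though the interferer is three times stronger than the target in the snapshot.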
-
Publication No.: US20250118321A1
Publication date: 2025-04-10
Application No.: US18482978
Filing date: 2023-10-09
Applicant: GM Global Technology Operations LLC
Inventor: Moshe Levy Israel, Elior Hadad
IPC: G10L21/0232, G10L21/0208, G10L21/0216, G10L21/034, G10L21/0364, G10L25/18, G10L25/21
Abstract: A computer-implemented method executed by data processing hardware that causes the data processing hardware to perform operations to design an audio filter. The operations include receiving multiple audio signals from a sensor array, the multiple audio signals including a target audio signal and interference audio signals and leveraging the interference audio signals. The multiple audio signals are processed using short-time Fourier transform (STFT) for each of the multiple audio signals. The operations also include designing the audio filter using the determined prior-SNR and enhancing the target audio signal using the leveraged interference audio signals and the designed audio filter and attenuating the interference audio signals.
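The abstract does not state how the prior-SNR is determined or how the filter is derived from it. A standard textbook relationship is the Wiener gain, which maps an a priori SNR to a per-bin attenuation of the STFT coefficients. The sketch below assumes that relationship and assumes the prior SNR is already available as a linear power ratio per bin; the function names and sample values are hypothetical.

```python
def wiener_gain(prior_snr):
    """Wiener gain for one bin; prior_snr is a linear (not dB) power ratio."""
    return prior_snr / (1.0 + prior_snr)

def enhance_frame(stft_frame, prior_snr_per_bin):
    """Scale each complex STFT bin of one frame by its Wiener gain."""
    return [wiener_gain(xi) * x for x, xi in zip(stft_frame, prior_snr_per_bin)]

# Hypothetical frame with three bins: speech-dominated, balanced, noise-only.
noisy_frame = [1.0 + 1.0j, 0.5 - 0.2j, 0.1 + 0.0j]
prior_snr = [99.0, 1.0, 0.0]
enhanced = enhance_frame(noisy_frame, prior_snr)
```

Bins with a high prior SNR pass almost unchanged, a bin with equal speech and noise power is halved, and a noise-only bin is removed entirely.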
-
Publication No.: US20250087217A1
Publication date: 2025-03-13
Application No.: US18466503
Filing date: 2023-09-13
Applicant: GM Global Technology Operations LLC, Bar-Ilan University
Inventor: Boris Rubenchik, Elior Hadad, Eli Tzirkel-Hancock, Sharon Gannot, Ethan Fetaya
IPC: G10L17/06, G10L21/028
Abstract: A system for speech separation includes data processing hardware and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations including (i) generating a two-dimensional representation of a speech mixture, (ii) separating the speech mixture into an initial separation, (iii) supplying the initial separation and speaker representations to a refinement module, (iv) refining the initial separation based on the initial separation and the speaker representations, (v) estimating a mask per speaker, and (vi) applying the masks to the two-dimensional representation to create two-dimensional, per-speaker representations.
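Step (vi) can be illustrated with a minimal sketch, assuming the two-dimensional representation is a time-frequency grid and each per-speaker mask is a real-valued grid of the same shape applied by element-wise multiplication. The abstract does not specify the representation or the mask type, so the function name and the toy values are hypothetical.

```python
def apply_masks(mixture, masks):
    """Multiply a 2-D mixture representation element-wise by each speaker's mask.

    mixture: list of rows (e.g. frequency bins) of values (e.g. time frames).
    masks:   one 2-D mask per speaker, same shape as mixture.
    Returns one 2-D representation per speaker.
    """
    return [
        [[m * v for m, v in zip(mask_row, mix_row)]
         for mask_row, mix_row in zip(mask, mixture)]
        for mask in masks
    ]

# Toy 2x2 mixture and two complementary masks (hypothetical values).
mixture = [[2.0, 4.0],
           [8.0, 1.0]]
mask_a = [[0.75, 0.25],
          [1.0, 0.0]]
mask_b = [[0.25, 0.75],
          [0.0, 1.0]]
speaker_a, speaker_b = apply_masks(mixture, [mask_a, mask_b])
```

Because the two toy masks sum to one in every cell, the per-speaker outputs add back up to the mixture. That is a property of this example, not something the abstract requires.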
-