Speech denoising networks using speech and noise modeling

    Publication Number: US12260874B2

    Publication Date: 2025-03-25

    Application Number: US18058104

    Application Date: 2022-11-22

    Abstract: A method includes obtaining, using at least one processing device, noisy speech signals and extracting, using the at least one processing device, acoustic features from the noisy speech signals. The method also includes receiving, using the at least one processing device, a predicted speech mask from a speech mask prediction model based on a first acoustic feature subset and receiving, using the at least one processing device, a predicted noise mask from a noise mask prediction model based on a second acoustic feature subset. The method further includes providing, using the at least one processing device, predicted speech features determined using the predicted speech mask and predicted noise features determined using the predicted noise mask to a filtering mask prediction model. In addition, the method includes generating, using the at least one processing device, a clean speech signal using a predicted filtering mask output by the filtering mask prediction model.
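
    The pipeline in this abstract lends itself to a short illustrative sketch. The example below is a minimal, hedged rendering that assumes a magnitude-spectrogram acoustic feature and uses placeholder mask models; the functions `acoustic_features`, `speech_mask_model`, `noise_mask_model`, and `filter_mask_model` are illustrative stand-ins, not the patented networks, and both mask models are fed the full feature matrix rather than the separate feature subsets described in the abstract.

```python
# Minimal sketch of the three-model denoising pipeline described in the
# abstract of US12260874B2. All model bodies and feature choices here are
# illustrative placeholders, not the patented implementation.
import numpy as np

def acoustic_features(noisy, frame=512, hop=256):
    """Magnitude spectrogram as a stand-in acoustic feature (T x F)."""
    win = np.hanning(frame)
    frames = [noisy[i:i + frame] * win for i in range(0, len(noisy) - frame, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Placeholder "models": each maps features to a mask in [0, 1] of the same shape.
speech_mask_model = lambda f: sigmoid(f - f.mean())
noise_mask_model = lambda f: sigmoid(f.mean() - f)
filter_mask_model = lambda s, n: sigmoid(s - n)   # consumes both predicted feature sets

def denoise(noisy):
    feats = acoustic_features(noisy)
    # The abstract feeds different acoustic-feature subsets to each model;
    # for brevity this sketch passes the full feature matrix to both.
    speech_mask = speech_mask_model(feats)
    noise_mask = noise_mask_model(feats)
    speech_feats = feats * speech_mask            # predicted speech features
    noise_feats = feats * noise_mask              # predicted noise features
    filtering_mask = filter_mask_model(speech_feats, noise_feats)
    # Masked spectrogram as a clean-speech estimate (resynthesis to a waveform omitted).
    return feats * filtering_mask

clean_estimate = denoise(np.random.randn(16000))  # 1 s of audio at 16 kHz
print(clean_estimate.shape)
```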

    CROSS-MODAL TRANSFER WITH CONTINUOUSLY WEIGHTED CONTRASTIVE LOSS

    Publication Number: US20240394592A1

    Publication Date: 2024-11-28

    Application Number: US18434691

    Application Date: 2024-02-06

    Abstract: A method includes accessing a training dataset having multiple samples, where each sample includes a data point for each of multiple modalities. The method also includes generating, using a first encoder associated with a first modality of the multiple modalities, first modality embeddings for data points of the first modality in the training dataset. The method further includes, for each first modality embedding, determining a similarity metric to other first modality embeddings. The method also includes generating, using a second encoder associated with a second modality of the multiple modalities, second modality embeddings for data points of the second modality in the training dataset. In addition, the method includes training the second encoder based on a contrastive loss function to align the first modality embeddings and the second modality embeddings from different samples of the training dataset, where the contrastive loss function is weighted using the similarity metrics.
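
    One way to read the loss described here is as a soft-target contrastive objective in which cross-sample pairs are weighted by similarity between first-modality embeddings. The sketch below is an assumption-laden illustration (NumPy, random embeddings, an InfoNCE-style form); the actual weighting scheme and loss form in the application may differ.

```python
# Hedged sketch of a "continuously weighted" contrastive objective in the
# spirit of US20240394592A1: different-sample pairs are weighted by the
# similarity between the corresponding first-modality embeddings.
# The exact weighting and loss form here are assumptions.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-9):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def weighted_contrastive_loss(first_emb, second_emb, temperature=0.07):
    """first_emb, second_emb: (N, D) embeddings from the two encoders."""
    a = l2_normalize(first_emb)    # first-modality embeddings (reference encoder)
    b = l2_normalize(second_emb)   # second-modality embeddings (encoder being trained)

    # Similarity of each first-modality embedding to the others (N x N).
    intra_sim = a @ a.T

    # Cross-modal logits between second- and first-modality embeddings.
    logits = (b @ a.T) / temperature
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Instead of a single one-hot positive per row, weight every column by
    # how similar its first-modality embedding is to the row's own sample.
    weights = np.exp(intra_sim / temperature)
    weights /= weights.sum(axis=1, keepdims=True)

    return -(weights * log_prob).sum(axis=1).mean()

# Example with random embeddings (8 paired samples, 32-dim).
rng = np.random.default_rng(0)
loss = weighted_contrastive_loss(rng.normal(size=(8, 32)), rng.normal(size=(8, 32)))
print(float(loss))
```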

    SYSTEM AND METHOD FOR MASK-BASED NEURAL BEAMFORMING FOR MULTI-CHANNEL SPEECH ENHANCEMENT

    Publication Number: US20240331715A1

    Publication Date: 2024-10-03

    Application Number: US18457921

    Application Date: 2023-08-29

    CPC classification number: G10L21/0224 G10L2021/02166

    Abstract: A method includes receiving, during a first time window, a set of noisy audio signals from a plurality of audio input devices. The method also includes generating a noisy time-frequency representation based on the set of noisy audio signals. The method further includes providing the noisy time-frequency representation as an input to a mask estimation model trained to output a mask used to predict a clean time-frequency representation of clean speech audio from the noisy time-frequency representation. The method also includes determining beamforming filter weights based on the mask. The method further includes applying the beamforming filter weights to the noisy time-frequency representation to isolate the clean speech audio from the set of noisy audio signals. In addition, the method includes outputting the clean speech audio.
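
    A common way to turn an estimated time-frequency mask into beamforming filter weights is to build mask-weighted spatial covariance matrices and derive an MVDR filter per frequency bin. The sketch below illustrates that general recipe under stated assumptions; the mask is a random placeholder and the MVDR formulation is one conventional choice, not necessarily the filed method.

```python
# Illustrative sketch of mask-based beamforming in the spirit of
# US20240331715A1: an MVDR-style filter derived from mask-weighted
# spatial covariance matrices. The mask here is a placeholder and the
# MVDR formulation is an assumption, not necessarily the patented one.
import numpy as np

def mask_based_beamform(noisy_tf, speech_mask):
    """
    noisy_tf: (C, T, F) complex STFT of C microphone channels.
    speech_mask: (T, F) mask in [0, 1] from a mask-estimation model.
    Returns a (T, F) beamformed clean-speech estimate.
    """
    C, T, F = noisy_tf.shape
    out = np.zeros((T, F), dtype=complex)
    for f in range(F):
        X = noisy_tf[:, :, f]                    # (C, T) for this frequency bin
        m = speech_mask[:, f]                    # (T,)
        # Mask-weighted spatial covariance matrices for speech and noise.
        phi_s = (X * m) @ X.conj().T / max(m.sum(), 1e-6)
        phi_n = (X * (1 - m)) @ X.conj().T / max((1 - m).sum(), 1e-6)
        phi_n += 1e-6 * np.eye(C)                # diagonal loading for stability
        # Steering vector: principal eigenvector of the speech covariance.
        _, vecs = np.linalg.eigh(phi_s)
        steer = vecs[:, -1]
        # MVDR beamforming weights for this bin.
        num = np.linalg.solve(phi_n, steer)
        w = num / (steer.conj() @ num)
        out[:, f] = w.conj() @ X                 # apply weights to the noisy TF representation
    return out

# Example: 4 channels, 100 frames, 257 bins, with a random placeholder mask.
rng = np.random.default_rng(0)
noisy = rng.normal(size=(4, 100, 257)) + 1j * rng.normal(size=(4, 100, 257))
mask = rng.uniform(size=(100, 257))
print(mask_based_beamform(noisy, mask).shape)
```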
