CROSS-MODAL TRANSFER WITH CONTINUOUSLY WEIGHTED CONTRASTIVE LOSS

    公开(公告)号:US20240394592A1

    公开(公告)日:2024-11-28

    申请号:US18434691

    申请日:2024-02-06

    Abstract: A method includes accessing a training dataset having multiple samples, where each sample includes a data point for each of multiple modalities. The method also includes generating, using a first encoder associated with a first modality of the multiple modalities, first modality embeddings for data points of the first modality in the training dataset. The method further includes, for each first modality embedding, determining a similarity metric to other first modality embeddings. The method also includes generating, using a second encoder associated with a second modality of the multiple modalities, second modality embeddings for data points of the second modality in the training dataset. In addition, the method includes training the second encoder based on a contrastive loss function to align the first modality embeddings and the second modality embeddings from different samples of the training dataset, where the contrastive loss function is weighed using the similarity metrics.

    Speech denoising networks using speech and noise modeling

    公开(公告)号:US12260874B2

    公开(公告)日:2025-03-25

    申请号:US18058104

    申请日:2022-11-22

    Abstract: A method includes obtaining, using at least one processing device, noisy speech signals and extracting, using the at least one processing device, acoustic features from the noisy speech signals. The method also includes receiving, using the at least one processing device, a predicted speech mask from a speech mask prediction model based on a first acoustic feature subset and receiving, using the at least one processing device, a predicted noise mask from a noise mask prediction model based on a second acoustic feature subset. The method further includes providing, using the at least one processing device, predicted speech features determined using the predicted speech mask and predicted noise features determined using the predicted noise mask to a filtering mask prediction model. In addition, the method includes generating, using the at least one processing device, a clean speech signal using a predicted filtering mask output by the filtering mask prediction model.

Patent Agency Ranking