-
Publication No.: US20210327445A1
Publication date: 2021-10-21
Application No.: US17270053
Filing date: 2019-08-29
Inventor: Arijit Biswas, Jia Dai, Aaron Steven Master
IPC: G10L19/24
Abstract: Described herein is a method of low-bitrate coding of audio data and generating enhancement metadata for controlling audio enhancement of the low-bitrate coded audio data at a decoder side, including the steps of: (a) core encoding original audio data at a low bitrate to obtain encoded audio data; (b) generating enhancement metadata to be used for controlling a type and/or amount of audio enhancement at the decoder side after core decoding the encoded audio data; and (c) outputting the encoded audio data and the enhancement metadata. Described is further an encoder configured to perform said method. Described is moreover a method for generating enhanced audio data from low-bitrate coded audio data based on enhancement metadata and a decoder configured to perform said method.
-
Publication No.: US11929085B2
Publication date: 2024-03-12
Application No.: US17270053
Filing date: 2019-08-29
Inventor: Arijit Biswas, Jia Dai, Aaron Steven Master
IPC: G10L19/24
CPC classification number: G10L19/24
Abstract: Described herein is a method of low-bitrate coding of audio data and generating enhancement metadata for controlling audio enhancement of the low-bitrate coded audio data at a decoder side, including the steps of: (a) core encoding original audio data at a low bitrate to obtain encoded audio data; (b) generating enhancement metadata to be used for controlling a type and/or amount of audio enhancement at the decoder side after core decoding the encoded audio data; and (c) outputting the encoded audio data and the enhancement metadata. Described is further an encoder configured to perform said method. Described is moreover a method for generating enhanced audio data from low-bitrate coded audio data based on enhancement metadata and a decoder configured to perform said method.
-
Publication No.: US11996108B2
Publication date: 2024-05-28
Application No.: US17632220
Filing date: 2020-07-30
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jia Dai, Kai Li, Richard J. Cartwright
CPC classification number: G10L19/0208, G06N20/00, G10L19/005, G10L25/18, G10L25/21, H04M3/568
Abstract: The present disclosure relates to the field of audio enhancement, and in particular to methods, devices and software for supervised training of a machine learning model, MLM, the MLM trained to enhance a degraded audio signal by calculating gains to be applied to frequency bands of the degraded audio signal. The present disclosure further relates to methods, devices and software for use of such a trained MLM.
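The idea of applying gains to frequency bands can be illustrated with a minimal sketch. The abstract does not say how bands are defined or how the gains are produced by the MLM, so the function name `apply_band_gains`, the band layout, and the gain values below are all hypothetical and serve only to show one gain scaling every bin in its band.

```python
def apply_band_gains(spectrum, band_edges, gains):
    """Scale each frequency bin by the gain of the band it belongs to.

    spectrum   -- list of bin magnitudes for one frame
    band_edges -- list of (start, stop) bin index pairs, stop exclusive (assumed layout)
    gains      -- one gain per band, e.g. as predicted by a trained model (assumed)
    """
    if len(band_edges) != len(gains):
        raise ValueError("need exactly one gain per band")
    enhanced = list(spectrum)  # bins outside every band are left unchanged
    for (start, stop), gain in zip(band_edges, gains):
        for i in range(start, stop):
            enhanced[i] = spectrum[i] * gain
    return enhanced


# Hypothetical frame with four bins split into two bands.
frame = [1.0, 1.0, 2.0, 2.0]
print(apply_band_gains(frame, [(0, 2), (2, 4)], [0.5, 1.0]))
```

Working on a small number of band gains rather than per-bin values keeps the model output compact; the actual band structure used in the disclosure is not stated in the abstract.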
-
Publication No.: US20220270625A1
Publication date: 2022-08-25
Application No.: US17632220
Filing date: 2020-07-30
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jia Dai, Kai Li, Richard J. Cartwright
IPC: G10L19/02, G10L25/18, G10L25/21, G10L19/005, G06N20/00
Abstract: The present disclosure relates to the field of audio enhancement, and in particular to methods, devices and software for supervised training of a machine learning model, MLM, the MLM trained to enhance a degraded audio signal by calculating gains to be applied to frequency bands of the degraded audio signal. The present disclosure further relates to methods, devices and software for use of such a trained MLM.
-
Publication No.: US20240395267A1
Publication date: 2024-11-28
Application No.: US18674555
Filing date: 2024-05-24
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jia Dai, Kai Li, Richard J. Cartwright
Abstract: The present disclosure relates to the field of audio enhancement, and in particular to methods, devices and software for supervised training of a machine learning model, MLM, the MLM trained to enhance a degraded audio signal by calculating gains to be applied to frequency bands of the degraded audio signal. The present disclosure further relates to methods, devices and software for use of such a trained MLM.
-
Publication No.: US20240290341A1
Publication date: 2024-08-29
Application No.: US18571963
Filing date: 2022-06-28
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Kai Li, Jia Dai, Xiaoyu Liu
IPC: G10L21/0232, G10L21/0208, G10L25/21, G10L25/30
CPC classification number: G10L21/0232, G10L25/21, G10L25/30, G10L2021/02082, G10L2021/02087
Abstract: A system for mitigating over-suppression of speech and other non-noise signals is disclosed. In some embodiments, a system is programmed to train a first machine learning model for speech detection or enhancement using a non-linear, asymmetric loss function that penalizes speech over-suppression more than speech under-suppression. The first machine learning model is configured to receive an audio signal and generate a mask indicating an amount of speech present in the audio signal. The mask can be adjusted to remedy sharp voice decay resulting from speech over-suppression. The system is also programmed to train a second machine learning model for laughter or applause detection. The system is further programmed to improve the quality of a new audio signal by applying an adjusted mask to the new audio signal except for the portions of the audio signal that have been identified as corresponding to laughter or applause.
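A non-linear, asymmetric loss of the kind described can be sketched as follows. The abstract does not give the actual formula, so the squared-error form, the function name `asymmetric_mask_loss`, and the weight `over_weight` are assumptions chosen purely for illustration: a predicted mask value below the target (speech removed that should have been kept) is penalized more heavily than one above it.

```python
def asymmetric_mask_loss(predicted, target, over_weight=4.0, power=2.0):
    """Mean loss over mask bins, weighting over-suppression more than under-suppression.

    predicted, target -- lists of mask values in [0, 1], one per time-frequency bin
    over_weight       -- hypothetical extra penalty when predicted < target
    power             -- exponent that makes the penalty non-linear
    """
    if len(predicted) != len(target):
        raise ValueError("masks must have the same length")
    total = 0.0
    for p, t in zip(predicted, target):
        diff = p - t
        if diff < 0:
            # Over-suppression: the mask keeps less speech than the target.
            total += over_weight * abs(diff) ** power
        else:
            # Under-suppression: the mask keeps more than the target.
            total += diff ** power
    return total / len(predicted)


# The same absolute error costs more when speech is over-suppressed.
print(asymmetric_mask_loss([0.5], [1.0]))  # over-suppression
print(asymmetric_mask_loss([1.0], [0.5]))  # under-suppression
```

With these assumed defaults, an error of a given size in the over-suppression direction costs four times as much as the same error in the other direction, which pushes a model trained with it toward preserving speech.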
-
Publication No.: US12073828B2
Publication date: 2024-08-27
Application No.: US17611121
Filing date: 2020-05-13
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jundai Sun, Zhiwei Shuang, Lie Lu, Shaofan Yang, Jia Dai
Abstract: Described herein is a method for Convolutional Neural Network (CNN) based speech source separation, wherein the method includes the steps of: (a) providing multiple frames of a time-frequency transform of an original noisy speech signal; (b) inputting the time-frequency transform of said multiple frames into an aggregated multi-scale CNN having a plurality of parallel convolution paths; (c) extracting and outputting, by each parallel convolution path, features from the input time-frequency transform of said multiple frames; (d) obtaining an aggregated output of the outputs of the parallel convolution paths; and (e) generating an output mask for extracting speech from the original noisy speech signal based on the aggregated output. Described herein are further an apparatus for CNN based speech source separation as well as a respective computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
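Steps (b) through (e) can be pictured with a toy, one-dimensional sketch. It is not the patented network: the real method operates on multiple frames of a time-frequency transform with learned CNN weights, whereas this sketch assumes a single frame, fixed hand-written kernels of different sizes as the "scales", averaging as the aggregation, and a sigmoid to produce the mask. All names (`conv1d_same`, `multi_scale_mask`) are hypothetical.

```python
import math


def conv1d_same(signal, kernel):
    """Correlate signal with an odd-length kernel, zero-padded to keep the length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        out.append(acc)
    return out


def multi_scale_mask(frame, kernels):
    """Run parallel convolution paths, aggregate them, and squash to a (0, 1) mask."""
    # (b), (c): each path extracts features at its own scale (kernel size).
    paths = [conv1d_same(frame, kernel) for kernel in kernels]
    # (d): aggregate the path outputs; a plain average is assumed here.
    aggregated = [sum(values) / len(paths) for values in zip(*paths)]
    # (e): map the aggregated features to a mask value per bin.
    return [1.0 / (1.0 + math.exp(-v)) for v in aggregated]


# Hypothetical frame and two paths: a 1-tap and a 3-tap averaging kernel.
frame = [0.0, 1.0, 0.0]
mask = multi_scale_mask(frame, [[1.0], [1 / 3, 1 / 3, 1 / 3]])
print(mask)
```

Running several kernel sizes in parallel lets narrow and wide spectral context contribute to the same mask value, which is the intuition behind a multi-scale design.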
-