-
Publication number: US20230162725A1
Publication date: 2023-05-25
Application number: US17534221
Filing date: 2021-11-23
Applicant: Adobe Inc. , The Trustees of Princeton University
Inventor: Zeyu JIN , Jiaqi SU , Adam FINKELSTEIN
CPC classification number: G10L15/16 , G10L15/063 , G06N3/0454
Abstract: Embodiments are disclosed for generating full-band audio from narrowband audio using a GAN-based audio super resolution model. A method of generating full-band audio may include receiving narrow-band input audio data, upsampling the narrow-band input audio data to generate upsampled audio data, providing the upsampled audio data to an audio super resolution model, the audio super resolution model trained to perform bandwidth expansion from narrow-band to wide-band, and returning wide-band output audio data corresponding to the narrow-band input audio data.
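The abstract describes a two-step inference pipeline: upsample the narrow-band input, then pass it through a trained super-resolution model. The sketch below shows only the shape of that pipeline; `SuperResolutionModel` and the linear-interpolation upsampler are placeholder assumptions, not the patented GAN.

```python
import numpy as np

class SuperResolutionModel:
    """Stand-in for the trained GAN generator (identity mapping here)."""
    def predict(self, audio: np.ndarray) -> np.ndarray:
        return audio  # a real model would reconstruct the missing high-frequency content

def expand_bandwidth(narrow: np.ndarray, factor: int = 6) -> np.ndarray:
    # Step 1: upsample the narrow-band input (linear interpolation as a
    # simple stand-in for a proper resampler), e.g. 8 kHz -> 48 kHz.
    n = len(narrow)
    upsampled = np.interp(np.linspace(0, n - 1, n * factor), np.arange(n), narrow)
    # Step 2: the super-resolution model maps upsampled audio to wide-band audio.
    return SuperResolutionModel().predict(upsampled)

wide = expand_bandwidth(np.zeros(8000))  # one second of 8 kHz audio -> 48 kHz
```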
-
Publication number: US20240331720A1
Publication date: 2024-10-03
Application number: US18191763
Filing date: 2023-03-28
Applicant: Adobe Inc. , The Trustees of Princeton University
Inventor: Zeyu JIN , Jiaqi SU , Adam FINKELSTEIN
IPC: G10L21/034 , G06N5/022 , G10L21/0232 , G10L25/18 , G10L25/24 , G10L25/60
CPC classification number: G10L21/034 , G06N5/022 , G10L21/0232 , G10L25/18 , G10L25/24 , G10L25/60 , G10L21/0364 , G10L25/30
Abstract: Embodiments are disclosed for converting audio data to studio quality audio data. The method includes obtaining an audio data having a first quality for conversion to studio quality audio. A first machine learning model predicts a set of acoustic features. A spectral mask is applied to the audio data during the prediction of the set of acoustic features. A second machine learning model generates studio quality audio from the set of acoustic features and the audio data.
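A minimal sketch of the two-stage structure the abstract describes, under stated assumptions: both models are stubs, and the function names, spectral-mask form, and shapes are illustrative, not the patented networks.

```python
import numpy as np

def predict_acoustic_features(audio: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # First model: predict acoustic features, with a spectral mask applied
    # during prediction (here: magnitude spectrum weighted by the mask).
    spectrum = np.abs(np.fft.rfft(audio))
    return spectrum * mask

def generate_studio_audio(features: np.ndarray, audio: np.ndarray) -> np.ndarray:
    # Second model: generate studio-quality audio from the features and the
    # original audio (identity stand-in for a generative model).
    return audio

audio = np.zeros(1024)
mask = np.ones(513)  # one weight per rfft bin of a 1024-sample signal
features = predict_acoustic_features(audio, mask)
studio = generate_studio_audio(features, audio)
```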
-
Publication number: US20240257798A1
Publication date: 2024-08-01
Application number: US18104434
Filing date: 2023-02-01
Applicant: ADOBE INC.
Inventor: Oriol NIETO-CABALLERO , Zeyu JIN , Justin Jonathan SALAMON , Franck DERNONCOURT
CPC classification number: G10L15/005 , G10L25/30
Abstract: Some aspects of the technology described herein employ a neural network with an efficient and lightweight architecture to perform spoken language recognition. Given an audio signal comprising speech, features are generated from the audio signal, for instance, by converting the audio signal to a normalized spectrogram. The features are input to the neural network, which has one or more convolutional layers and an output activation layer. Each neuron of the output activation layer corresponds to a language from a set of languages and generates an activation value. Based on the activation values, an indication of zero or more languages from the set of languages is provided for the audio signal.
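A hedged sketch of the recognition flow in the abstract: a normalized spectrogram is computed, reduced by a stand-in for the convolutional layers, and scored by one output neuron per language, with zero or more languages exceeding a threshold. The label set, frame size, and pooling are illustrative assumptions.

```python
import numpy as np

LANGUAGES = ["en", "es", "fr"]  # illustrative label set, not the patented one

def normalized_spectrogram(signal: np.ndarray, frame: int = 256) -> np.ndarray:
    # Feature generation: convert the signal to a spectrogram and normalize it.
    frames = signal[: len(signal) // frame * frame].reshape(-1, frame)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return (spec - spec.mean()) / (spec.std() + 1e-8)

def detect_languages(signal: np.ndarray, weights: np.ndarray, threshold: float = 0.5):
    feats = normalized_spectrogram(signal)
    pooled = feats.mean(axis=0)  # crude stand-in for the convolutional layers
    # Output activation layer: one sigmoid neuron per language.
    activations = 1.0 / (1.0 + np.exp(-(pooled @ weights)))
    # Zero or more languages can exceed the threshold, matching the
    # "zero or more languages" indication in the abstract.
    return [lang for lang, a in zip(LANGUAGES, activations) if a > threshold]

# With untrained (zero) weights, every activation is 0.5 and no language is flagged.
result = detect_languages(np.zeros(1024), np.zeros((129, 3)))
```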
-
Publication number: US20210256978A1
Publication date: 2021-08-19
Application number: US16790301
Filing date: 2020-02-13
Applicant: ADOBE INC.
Inventor: Zeyu JIN , Oona Shigeno RISSE-ADAMS
Abstract: Embodiments provide systems, methods, and computer storage media for secure audio watermarking and audio authenticity verification. An audio watermark detector may include a neural network trained to detect a particular audio watermark and embedding technique, which may indicate source software used in a workflow that generated an audio file under test. For example, the watermark may indicate an audio file was generated using voice manipulation software, so detecting the watermark can indicate manipulated audio such as deepfake audio and other attacked audio signals. In some embodiments, the audio watermark detector may be trained as part of a generative adversarial network in order to make the underlying audio watermark more robust to neural network-based attacks. Generally, the audio watermark detector may evaluate time domain samples from chunks of an audio clip under test to detect the presence of the audio watermark and generate a classification for the audio clip.
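The clip-level classification step in the abstract (evaluating time-domain chunks and aggregating the results) can be sketched as follows; `detector`, the chunk size, and the averaging rule are assumptions standing in for the trained neural network and its aggregation scheme.

```python
import numpy as np

def classify_clip(samples: np.ndarray, detector, chunk: int = 16000,
                  threshold: float = 0.5) -> bool:
    # Score each non-overlapping time-domain chunk with the detector.
    scores = [
        detector(samples[start : start + chunk])
        for start in range(0, len(samples) - chunk + 1, chunk)
    ]
    # Aggregate chunk scores into one clip-level classification:
    # True means the audio watermark was detected.
    return bool(scores) and float(np.mean(scores)) > threshold

always_watermarked = lambda chunk: 1.0  # toy detector for illustration
flagged = classify_clip(np.zeros(48000), always_watermarked)
```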
-