Patent search: ap:("Dolby Laboratories Licensing Corporation") AND inv:"Vivek Kumar" (results page 1)

1.

Granted invention patent
Adaptive Quantization (status: in force)

Publication (announcement) number: US10395664B2

Publication (announcement) date: 2019-08-27

Application number: US16072168

Application date: 2017-01-26

Applicant: Dolby Laboratories Licensing Corporation

Inventors: Nicolas R. Tsingos, Zachary Gideon Cohen, Vivek Kumar

IPC: G10L19/032, G10L19/20, G10L19/002, G10L19/00, H03M1/00

Abstract: An importance metric, based at least in part on an energy metric, may be determined for each of a plurality of received audio objects. Some methods may involve: determining a global importance metric for all of the audio objects, based, at least in part, on a total energy value calculated by summing the energy metric of each of the audio objects; determining an estimated quantization bit depth and a quantization error for each of the audio objects; calculating a total noise metric for all of the audio objects, the total noise metric being based, at least in part, on a total quantization error corresponding with the estimated quantization bit depth; calculating a total signal-to-noise ratio corresponding with the total noise metric and the total energy value; and determining a final quantization bit depth for each of the audio objects by applying a signal-to-noise ratio threshold to the total signal-to-noise ratio.
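The bit-allocation loop described in this abstract can be illustrated with a small Python sketch. Everything concrete in it is an assumption, not the patented method: the energy metric is taken to be the sum of squared samples, each object's importance is its share of the total energy, the initial bit-depth estimate is a hypothetical linear rule, quantization error uses the standard uniform-quantizer model (step squared over 12 per sample), and the SNR threshold is applied by raising all bit depths together until the total SNR reaches it. Objects are assumed to be non-silent lists of samples in [-1, 1).

```python
import math


def energy(samples):
    """Energy metric (assumed): sum of squared sample values."""
    return sum(x * x for x in samples)


def quant_noise(num_samples, bits):
    """Quantization-error energy of a uniform quantizer on [-1, 1):
    step = 2 / 2**bits, noise power per sample = step**2 / 12."""
    step = 2.0 / (2 ** bits)
    return num_samples * step * step / 12.0


def total_snr_db(objects, bits, total_energy):
    """Total SNR in dB: total energy over the summed quantization error."""
    noise = sum(quant_noise(len(obj), b) for obj, b in zip(objects, bits))
    return 10.0 * math.log10(total_energy / noise)


def allocate_bits(objects, snr_threshold_db, min_bits=4, max_bits=24):
    energies = [energy(obj) for obj in objects]
    total_energy = sum(energies)  # global quantity shared by all objects
    # Hypothetical importance: each object's share of the total energy.
    importance = [e / total_energy for e in energies]
    # Hypothetical estimate: min_bits plus up to 12 extra bits by importance.
    bits = [max(min_bits, min(max_bits, round(min_bits + 12 * w))) for w in importance]
    # Apply the threshold: raise all bit depths together until the total SNR
    # reaches it or every object is already at max_bits.
    while (total_snr_db(objects, bits, total_energy) < snr_threshold_db
           and any(b < max_bits for b in bits)):
        bits = [min(max_bits, b + 1) for b in bits]
    return bits, total_snr_db(objects, bits, total_energy)
```

For example, `allocate_bits([[0.5] * 100, [0.05] * 100], 40.0)` starts from bit depths of 16 and 4 and needs three increments to pass 40 dB, since each extra bit lowers the quiet object's quantization noise by roughly 6 dB.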

2.

Granted invention patent
Perceptually-based loss functions for audio encoding and decoding based on machine learning (status: in force)

Publication (announcement) number: US11817111B2

Publication (announcement) date: 2023-11-14

Application number: US17046284

Application date: 2019-04-10

Applicant: Dolby Laboratories Licensing Corporation

Inventors: Roy M. Fejgin, Grant A. Davidson, Chih-Wei Wu, Vivek Kumar

IPC: G10L19/022, G06F3/16, G06N3/084, G06N3/048

CPC classification number: G10L19/022, G06F3/16, G06N3/048, G06N3/084

Abstract: Computer-implemented methods for training a neural network, as well as for implementing audio encoders and decoders via trained neural networks, are provided. The neural network may receive an input audio signal, generate an encoded audio signal and decode the encoded audio signal. A loss function generating module may receive the decoded audio signal and a ground truth audio signal, and may generate a loss function value corresponding to the decoded audio signal. Generating the loss function value may involve applying a psychoacoustic model. The neural network may be trained based on the loss function value. The training may involve updating at least one weight of the neural network.
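A loss of this kind can be sketched in plain Python. The psychoacoustic model here is deliberately crude and entirely an assumption: the masking threshold in each frequency bin is a fixed fraction of the ground-truth power averaged over neighbouring bins, and the loss is a noise-to-mask ratio averaged over bins. A real training setup would compute such a loss inside an autodiff framework so it can be backpropagated to the network weights; the naive DFT below only shows the arithmetic.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for short example frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def masking_threshold(ref_power, spread=1, offset=0.1, floor=1e-6):
    """Crude stand-in for a psychoacoustic model: a fixed fraction (offset) of
    the ground-truth power averaged over bins k-spread..k+spread."""
    m = len(ref_power)
    out = []
    for k in range(m):
        lo, hi = max(0, k - spread), min(m, k + spread + 1)
        out.append(max(floor, offset * sum(ref_power[lo:hi]) / (hi - lo)))
    return out


def perceptual_loss(decoded, ground_truth):
    """Error power in each bin divided by that bin's masking threshold, averaged."""
    err = power_spectrum([d - g for d, g in zip(decoded, ground_truth)])
    mask = masking_threshold(power_spectrum(ground_truth))
    return sum(e / m for e, m in zip(err, mask)) / len(err)
```

With this weighting, the same error energy costs far less when it lands in a bin the ground truth already fills with energy than when it lands in a near-silent bin, which is the intent behind a perceptually based loss.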

3.

Granted invention patent
Signal decorrelation in an audio processing system (status: in force)

Publication (announcement) number: US09830916B2

Publication (announcement) date: 2017-11-28

Application number: US14766371

Application date: 2014-01-22

Applicant: Dolby Laboratories Licensing Corporation

Inventors: Vinay Melkote, Kuan-Chieh Yen, Grant A. Davidson, Matthew Fellers, Mark S. Vinton, Vivek Kumar

IPC: G10L19/008, G10L19/02, H04S3/00, H04S5/00, H04L19/00, H04L25/06, G10L19/06, G10L25/06

CPC classification number: G10L19/008, G10L19/02, G10L19/06, G10L25/06, H04S3/008, H04S5/00

Abstract: Audio processing methods may involve receiving audio data corresponding to a plurality of audio channels. The audio data may include a frequency domain representation corresponding to filterbank coefficients of an audio encoding or processing system. A decorrelation process may be performed with the same filterbank coefficients used by the audio encoding or processing system. The decorrelation process may be performed without converting coefficients of the frequency domain representation to another frequency domain or time domain representation. The decorrelation process may involve selective or signal-adaptive decorrelation of specific channels and/or specific frequency bands. The decorrelation process may involve applying a decorrelation filter to a portion of the received audio data to produce filtered audio data. The decorrelation process may involve using a non-hierarchal mixer to combine a direct portion of the received audio data with the filtered audio data according to spatial parameters.
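The two building blocks named in this abstract, a decorrelation filter applied to part of the audio data and a mixer that combines the direct and filtered signals under a spatial parameter, can be sketched as follows. The Schroeder all-pass filter, its delay and gain, and the use of a single hypothetical `coherence` value as the spatial parameter are assumptions for illustration. The sketch works on one real-valued coefficient sequence (for example, one channel's coefficients in one band) and does not model the patent's non-hierarchal mixer.

```python
import math


def allpass(x, delay=3, g=0.5):
    """Schroeder all-pass filter, H(z) = (-g + z^-D) / (1 - g z^-D):
    y[n] = -g*x[n] + x[n-D] + g*y[n-D]. Flat magnitude, dispersive phase."""
    y = []
    for n, xn in enumerate(x):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y.append(-g * xn + x_d + g * y_d)
    return y


def mix(direct, filtered, coherence):
    """Combine direct and decorrelated parts with alpha**2 + beta**2 == 1, so
    output power is preserved when the two inputs are uncorrelated and of
    equal power. coherence=1 keeps only the direct signal."""
    alpha = coherence
    beta = math.sqrt(max(0.0, 1.0 - coherence ** 2))
    return [alpha * d + beta * f for d, f in zip(direct, filtered)]
```

An all-pass filter is a common choice for this role because it alters phase without altering the magnitude spectrum, so the filtered signal keeps the energy of the band it was derived from.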

4.

Invention patent application
SYSTEMS AND METHODS FOR ADAPTING HUMAN SPEAKER EMBEDDINGS IN SPEECH SYNTHESIS (status: in force)

Publication (announcement) number: US20220335925A1

Publication (announcement) date: 2022-10-20

Application number: US17636851

Application date: 2020-08-18

Applicant: DOLBY LABORATORIES LICENSING CORPORATION

Inventors: Cong ZHOU, Xiaoyu LIU, Michael Getty HORGAN, Vivek Kumar

IPC: G10L13/033, G10L13/047

Abstract: Novel methods and systems for adapting a voice cloning synthesizer for a new speaker using real speech data are disclosed. Utterances from one or more target speakers are parameterized and are used to initialize an embedding vector for use with a voice synthesizer, by means of clustering the utterance data and determining the centroid of the data, using a speaker identification neural network, and/or by finding the closest stored embedded vector to the utterance data.
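The centroid and nearest-stored-embedding options mentioned in this abstract can be combined in a short Python sketch. It assumes the utterances have already been parameterized into vectors in the same space as the stored speaker embeddings (the speaker-identification network that might produce them is not shown), treats all utterances as a single cluster, uses cosine similarity as the closeness measure, and uses a hypothetical dictionary of stored embeddings.

```python
import math


def centroid(vectors):
    """Component-wise mean of the utterance vectors (single-cluster centroid)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def init_embedding(utterance_vecs, stored):
    """Return the centroid of the target speaker's utterance vectors and the
    name of the stored embedding closest to it; both are candidate initial values."""
    c = centroid(utterance_vecs)
    nearest = max(stored, key=lambda name: cosine(c, stored[name]))
    return c, nearest
```

Either returned value could serve as the starting embedding that is then adapted to the new speaker; the abstract presents these as alternatives ("and/or").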

5.

Granted invention patent
Low bit rate parametric encoding and transport of haptic-tactile signals (status: in force)

Publication (announcement) number: US10140822B2

Publication (announcement) date: 2018-11-27

Application number: US15747096

Application date: 2016-08-03

Applicant: DOLBY LABORATORIES LICENSING CORPORATION, DOLBY INTERNATIONAL AB

Inventors: Sunil Bharitkar, Charles Q. Robinson, Vivek Kumar, Jeffrey Riedmiller, Christof Fersch

IPC: H04B3/36, G08B6/00, G10L19/16

Abstract: Techniques for low bit rate parametric encoding of haptic-tactile signals. The techniques encompass a parametric encoding method. The parametric encoding method includes the steps of: for at least one frame of a plurality of frames of a source haptic-tactile signal, representing the source haptic-tactile signal in the frame as a set of parameters and according to a functional representation; and including the set of parameters in a bit stream that encodes the source haptic-tactile signal. The functional representation is based on one of a set of orthogonal functionals, or polynomial approximation. For example, the functional representation can be based on one of Chebyshev functionals of the first kind through order n, Chebyshev functionals of the second kind through order n, or k-th order polynomial approximation.
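The Chebyshev option in this abstract can be sketched as a per-frame least-squares fit of Chebyshev polynomials of the first kind. The frame length, the linear mapping of sample positions onto [-1, 1], and solving the normal equations directly are assumptions; quantizing the coefficients and packing them into the bit stream are not shown. The bit-rate saving comes from sending order + 1 coefficients per frame instead of every sample.

```python
def cheb_basis(x, order):
    """T_0(x)..T_order(x) via the recurrence T_{k+1} = 2x T_k - T_{k-1}."""
    t = [1.0, x]
    for _ in range(2, order + 1):
        t.append(2 * x * t[-1] - t[-2])
    return t[: order + 1]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def positions(n):
    """Map sample indices 0..n-1 linearly onto [-1, 1]."""
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def encode_frame(frame, order):
    """Least-squares Chebyshev coefficients (the frame's parameter set)."""
    rows = [cheb_basis(x, order) for x in positions(len(frame))]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(order + 1)]
           for i in range(order + 1)]
    aty = [sum(r[i] * y for r, y in zip(rows, frame)) for i in range(order + 1)]
    return solve(ata, aty)


def decode_frame(coeffs, n):
    """Rebuild n samples from the coefficients."""
    order = len(coeffs) - 1
    return [sum(c * t for c, t in zip(coeffs, cheb_basis(x, order)))
            for x in positions(n)]
```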

6.

Granted invention patent
Systems and methods for adapting human speaker embeddings in speech synthesis (status: in force)

Publication (announcement) number: US11929058B2

Publication (announcement) date: 2024-03-12

Application number: US17636851

Application date: 2020-08-18

Applicant: DOLBY LABORATORIES LICENSING CORPORATION

Inventors: Cong Zhou, Xiaoyu Liu, Michael Getty Horgan, Vivek Kumar

IPC: G10L21/00, G10L13/00, G10L13/033, G10L13/047, G10L13/08, G10L17/12

CPC classification number: G10L13/033, G10L13/047

Abstract: Novel methods and systems for adapting a voice cloning synthesizer for a new speaker using real speech data are disclosed. Utterances from one or more target speakers are parameterized and are used to initialize an embedding vector for use with a voice synthesizer, by means of clustering the utterance data and determining the centroid of the data, using a speaker identification neural network, and/or by finding the closest stored embedded vector to the utterance data.

7.

Granted invention patent
Speech style transfer (status: in force)

Publication (announcement) number: US11538455B2

Publication (announcement) date: 2022-12-27

Application number: US16969950

Application date: 2019-02-14

Applicant: DOLBY LABORATORIES LICENSING CORPORATION

Inventors: Cong Zhou, Michael Getty Horgan, Vivek Kumar, Jaime H. Morales, Cristina Michel Vasco

IPC: G10L25/00, G06F15/00, G10L13/02, G06F40/42, G06N3/04, G06N3/08, G10L25/30

Abstract: Computer-implemented methods for speech synthesis are provided. A speech synthesizer may be trained to generate synthesized audio data that corresponds to words uttered by a source speaker according to speech characteristics of a target speaker. The speech synthesizer may be trained by time-stamped phoneme sequences, pitch contour data and speaker identification data. The speech synthesizer may include a voice modeling neural network and a conditioning neural network.

8.

Granted invention patent
Audio capture for aerial devices (status: in force)

Publication (announcement) number: US10979613B2

Publication (announcement) date: 2021-04-13

Application number: US15785977

Application date: 2017-10-17

Applicant: Dolby Laboratories Licensing Corporation

Inventors: Timo Kunkel, Cong Zhou, Vivek Kumar, Remi S. Audfray

IPC: H04N5/232, G06T7/70, H04N5/04, B64C39/02

Abstract: Methods, systems, and computer program products for automatically positioning a content capturing device are disclosed. A vehicle, e.g., an UAV, carries the content capturing device, e.g., a camcorder. The UAV can position the content capturing device at a best location for viewing a subject based on one or more audio or visual cues. The UAV can follow movement of the subject to achieve best audio or visual effect. In some implementations, a controller device carried by the subject can generate one or more signals for the UAV to follow. The controller device may be coupled to a microphone that records audio. The signals can be used to temporally synchronize video captured at the UAV and audio captured by the microphone.
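The synchronization step in this abstract can be illustrated by estimating the offset between two recordings of the same sync signal with cross-correlation. This is an assumed technique, not one the abstract specifies: it supposes that the controller's signal appears both in the track recorded alongside the video at the UAV and in the microphone audio, and that both tracks have already been brought to a common sample rate.

```python
def best_lag(reference, recorded, max_lag):
    """Return the lag (in samples) in [-max_lag, max_lag] that maximises the
    cross-correlation sum_n reference[n] * recorded[n + lag]."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(recorded):  # skip samples that fall outside the track
                score += r * recorded[m]
        if score > best_score:
            best, best_score = lag, score
    return best
```

Dividing the returned lag by the common sample rate gives the time offset to apply when aligning the microphone audio with the video.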
