-
Publication No.: US10395664B2
Publication Date: 2019-08-27
Application No.: US16072168
Filing Date: 2017-01-26
IPC Classification: G10L19/032 , G10L19/20 , G10L19/002 , G10L19/00 , H03M1/00
Abstract: An importance metric, based at least in part on an energy metric, may be determined for each of a plurality of received audio objects. Some methods may involve: determining a global importance metric for all of the audio objects, based, at least in part, on a total energy value calculated by summing the energy metric of each of the audio objects; determining an estimated quantization bit depth and a quantization error for each of the audio objects; calculating a total noise metric for all of the audio objects, the total noise metric being based, at least in part, on a total quantization error corresponding with the estimated quantization bit depth; calculating a total signal-to-noise ratio corresponding with the total noise metric and the total energy value; and determining a final quantization bit depth for each of the audio objects by applying a signal-to-noise ratio threshold to the total signal-to-noise ratio.
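The steps in this abstract (sum the energies, estimate bit depths, total the quantization noise, compare the total signal-to-noise ratio against a threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the patent: the uniform-quantizer noise model, importance defined as each object's share of total energy, the greedy one-bit-at-a-time update, and the names `quantization_noise_power` and `allocate_bit_depths`. Energies are assumed to be positive.

```python
import math


def quantization_noise_power(bits):
    # Assumed model: uniform quantizer over [-1, 1), noise power = step^2 / 12.
    step = 2.0 / (2 ** bits)
    return step * step / 12.0


def allocate_bit_depths(energies, snr_threshold_db, min_bits=4, max_bits=24):
    """Greedy sketch: raise bit depths until the total SNR meets the threshold."""
    total_energy = sum(energies)
    # Assumed importance metric: each object's share of the total energy.
    importance = [e / total_energy for e in energies]
    bits = [min_bits] * len(energies)  # initial estimated bit depths
    while True:
        total_noise = sum(quantization_noise_power(b) for b in bits)
        snr_db = 10.0 * math.log10(total_energy / total_noise)
        candidates = [i for i, b in enumerate(bits) if b < max_bits]
        if snr_db >= snr_threshold_db or not candidates:
            return bits, snr_db
        # Give one more bit to the object whose importance-weighted noise is largest.
        worst = max(
            candidates,
            key=lambda i: importance[i] * quantization_noise_power(bits[i]),
        )
        bits[worst] += 1


bits, snr_db = allocate_bit_depths([1.0, 0.1, 0.01], snr_threshold_db=60.0)
print(bits, round(snr_db, 1))
```

Under this assumed weighting, higher-energy objects never end up with fewer bits than lower-energy ones, which matches the intent of an energy-based importance metric.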
-
Publication No.: US11817111B2
Publication Date: 2023-11-14
Application No.: US17046284
Filing Date: 2019-04-10
Inventors: Roy M. Fejgin , Grant A. Davidson , Chih-Wei Wu , Vivek Kumar
IPC Classification: G10L19/022 , G06F3/16 , G06N3/084 , G06N3/048
CPC Classification: G10L19/022 , G06F3/16 , G06N3/048 , G06N3/084
Abstract: Computer-implemented methods for training a neural network, as well as for implementing audio encoders and decoders via trained neural networks, are provided. The neural network may receive an input audio signal, generate an encoded audio signal and decode the encoded audio signal. A loss function generating module may receive the decoded audio signal and a ground truth audio signal, and may generate a loss function value corresponding to the decoded audio signal. Generating the loss function value may involve applying a psychoacoustic model. The neural network may be trained based on the loss function value. The training may involve updating at least one weight of the neural network.
-
Publication No.: US09830916B2
Publication Date: 2017-11-28
Application No.: US14766371
Filing Date: 2014-01-22
Inventors: Vinay Melkote , Kuan-Chieh Yen , Grant A. Davidson , Matthew Fellers , Mark S. Vinton , Vivek Kumar
IPC Classification: G10L19/008 , G10L19/02 , H04S3/00 , H04S5/00 , H04L19/00 , H04L25/06 , G10L19/06 , G10L25/06
Abstract: Audio processing methods may involve receiving audio data corresponding to a plurality of audio channels. The audio data may include a frequency domain representation corresponding to filterbank coefficients of an audio encoding or processing system. A decorrelation process may be performed with the same filterbank coefficients used by the audio encoding or processing system. The decorrelation process may be performed without converting coefficients of the frequency domain representation to another frequency domain or time domain representation. The decorrelation process may involve selective or signal-adaptive decorrelation of specific channels and/or specific frequency bands. The decorrelation process may involve applying a decorrelation filter to a portion of the received audio data to produce filtered audio data. The decorrelation process may involve using a non-hierarchal mixer to combine a direct portion of the received audio data with the filtered audio data according to spatial parameters.
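The final mixing step, combining a direct portion with decorrelation-filtered data according to a spatial parameter, can be pictured with a small sketch. The power-preserving mixing law used here (`alpha` for the direct part, `sqrt(1 - alpha^2)` for the filtered part) is an assumption chosen for illustration, as are the function name and the use of a single `alpha` per band; the abstract does not specify the mixer's form.

```python
import math


def mix_direct_and_decorrelated(direct, filtered, alpha):
    """Combine direct and decorrelated coefficients for one frequency band.

    alpha is a hypothetical spatial parameter in [0, 1]: 1.0 keeps only the
    direct signal, 0.0 keeps only the decorrelated signal.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Assumed power-preserving weights: alpha^2 + beta^2 == 1.
    beta = math.sqrt(1.0 - alpha * alpha)
    return [alpha * d + beta * f for d, f in zip(direct, filtered)]


direct_band = [1.0, 2.0]      # filterbank coefficients of the direct portion
filtered_band = [0.5, -0.5]   # same coefficients after a decorrelation filter
print(mix_direct_and_decorrelated(direct_band, filtered_band, 0.6))
```

Because the mix operates directly on the filterbank coefficients, no conversion to another frequency-domain or time-domain representation is needed in this sketch.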
-
Publication No.: US20220335925A1
Publication Date: 2022-10-20
Application No.: US17636851
Filing Date: 2020-08-18
Inventors: Cong ZHOU , Xiaoyu LIU , Michael Getty HORGAN , Vivek Kumar
IPC Classification: G10L13/033 , G10L13/047
Abstract: Novel methods and systems for adapting a voice cloning synthesizer for a new speaker using real speech data are disclosed. Utterances from one or more target speakers are parameterized and are used to initialize an embedding vector for use with a voice synthesizer, by means of clustering the utterance data and determining the centroid of the data, using a speaker identification neural network, and/or by finding the closest stored embedded vector to the utterance data.
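Two of the initialization options named in this abstract, taking the centroid of the utterance data and finding the closest stored embedding, can be sketched as follows. This is an illustrative sketch only: it treats all utterance vectors as a single cluster, uses Euclidean distance, and the names `centroid`, `closest_stored_embedding`, and the example speaker keys are hypothetical.

```python
import math


def centroid(vectors):
    # Component-wise mean of equal-length utterance feature vectors.
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def closest_stored_embedding(query, stored):
    # Return the key of the stored embedding nearest to the query
    # (Euclidean distance is an assumption; other metrics are possible).
    return min(stored, key=lambda name: math.dist(query, stored[name]))


utterances = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]  # parameterized utterances
initial_embedding = centroid(utterances)

stored = {"speaker_a": [1.0, 1.5], "speaker_b": [5.0, 5.0]}
print(initial_embedding, closest_stored_embedding(initial_embedding, stored))
```

Either result could serve as the starting embedding vector before further adaptation of the synthesizer.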
-
Publication No.: US10140822B2
Publication Date: 2018-11-27
Application No.: US15747096
Filing Date: 2016-08-03
Abstract: Techniques for low bit rate parametric encoding of haptic-tactile signals. The techniques encompass a parametric encoding method. The parametric encoding method includes the steps of: for at least one frame of a plurality of frames of a source haptic-tactile signal, representing the source haptic-tactile signal in the frame as a set of parameters and according to a functional representation; and including the set of parameters in a bit stream that encodes the source haptic-tactile signal. The functional representation is based on one of a set of orthogonal functionals, or polynomial approximation. For example, the functional representation can be based on one of Chebyshev functionals of the first kind through order n, Chebyshev functionals of the second kind through order n, or k-th order polynomial approximation.
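As an illustration of representing one frame by Chebyshev parameters of the first kind through order n, the sketch below maps a uniformly sampled frame onto [-1, 1], resamples it at Chebyshev nodes by linear interpolation, and computes the coefficients from the discrete orthogonality relation. The resampling step, the node count, and the function names `encode_frame` and `decode_frame` are assumptions made for this example; the patent's actual parameter estimation and bit stream packing are not shown.

```python
import math


def encode_frame(frame, order, num_nodes=64):
    """Return Chebyshev (first kind) coefficients c_0..c_order for one frame."""
    n = len(frame)  # requires n >= 2 and order < num_nodes

    def sample(x):
        # Linear interpolation of the frame at x in [-1, 1].
        pos = (x + 1.0) / 2.0 * (n - 1)
        i = min(int(pos), n - 2)
        frac = pos - i
        return frame[i] * (1.0 - frac) + frame[i + 1] * frac

    thetas = [math.pi * (j + 0.5) / num_nodes for j in range(num_nodes)]
    values = [sample(math.cos(t)) for t in thetas]
    coeffs = []
    for k in range(order + 1):
        # T_k(cos(theta)) = cos(k * theta), so the sum is a cosine projection.
        c = 2.0 / num_nodes * sum(v * math.cos(k * t) for v, t in zip(values, thetas))
        coeffs.append(c / 2.0 if k == 0 else c)
    return coeffs


def decode_frame(coeffs, length):
    """Rebuild a frame of the given length from the coefficients."""
    out = []
    for i in range(length):
        x = max(-1.0, min(1.0, -1.0 + 2.0 * i / (length - 1)))
        theta = math.acos(x)
        out.append(sum(c * math.cos(k * theta) for k, c in enumerate(coeffs)))
    return out


frame = [(-1.0 + 2.0 * i / 200) ** 2 for i in range(201)]  # toy frame: x^2
params = encode_frame(frame, order=4)
restored = decode_frame(params, len(frame))
print([round(c, 3) for c in params])
```

Here a 201-sample frame is reduced to five parameters, which is the kind of compaction a low bit rate parametric scheme relies on; smooth frames need few coefficients, while sharper transients would need a higher order.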
-
Publication No.: US11929058B2
Publication Date: 2024-03-12
Application No.: US17636851
Filing Date: 2020-08-18
Inventors: Cong Zhou , Xiaoyu Liu , Michael Getty Horgan , Vivek Kumar
IPC Classification: G10L21/00 , G10L13/00 , G10L13/033 , G10L13/047 , G10L13/08 , G10L17/12
CPC Classification: G10L13/033 , G10L13/047
Abstract: Novel methods and systems for adapting a voice cloning synthesizer for a new speaker using real speech data are disclosed. Utterances from one or more target speakers are parameterized and are used to initialize an embedding vector for use with a voice synthesizer, by means of clustering the utterance data and determining the centroid of the data, using a speaker identification neural network, and/or by finding the closest stored embedded vector to the utterance data.
-
Publication No.: US11538455B2
Publication Date: 2022-12-27
Application No.: US16969950
Filing Date: 2019-02-14
Abstract: Computer-implemented methods for speech synthesis are provided. A speech synthesizer may be trained to generate synthesized audio data that corresponds to words uttered by a source speaker according to speech characteristics of a target speaker. The speech synthesizer may be trained by time-stamped phoneme sequences, pitch contour data and speaker identification data. The speech synthesizer may include a voice modeling neural network and a conditioning neural network.
-
Publication No.: US10979613B2
Publication Date: 2021-04-13
Application No.: US15785977
Filing Date: 2017-10-17
Inventors: Timo Kunkel , Cong Zhou , Vivek Kumar , Remi S. Audfray
Abstract: Methods, systems, and computer program products for automatically positioning a content capturing device are disclosed. A vehicle, e.g., a UAV, carries the content capturing device, e.g., a camcorder. The UAV can position the content capturing device at a best location for viewing a subject based on one or more audio or visual cues. The UAV can follow movement of the subject to achieve best audio or visual effect. In some implementations, a controller device carried by the subject can generate one or more signals for the UAV to follow. The controller device may be coupled to a microphone that records audio. The signals can be used to temporally synchronize video captured at the UAV and audio captured by the microphone.