VOLUME LEVELER CONTROLLER AND CONTROLLING METHOD
    21.
    Patent application
    Legal status: in force

    Publication No.: US20160049915A1

    Publication date: 2016-02-18

    Application No.: US14777271

    Filing date: 2014-03-17

    Abstract: A volume leveler controller and a controlling method are disclosed. In one embodiment, a volume leveler controller includes an audio content classifier for identifying the content type of an audio signal in real time, and an adjusting unit for adjusting a volume leveler in a continuous manner based on the content type as identified. The adjusting unit may be configured to positively correlate the dynamic gain of the volume leveler with informative content types of the audio signal, and negatively correlate the dynamic gain of the volume leveler with interfering content types of the audio signal.
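
    The correlation described in this abstract can be illustrated with a minimal sketch. The function names, the confidence inputs, and the weighting formula are assumptions made for illustration; the abstract does not specify how the adjusting unit computes the gain.

```python
def adjust_dynamic_gain(base_gain_db, informative_conf, interfering_conf):
    """Scale a leveler's dynamic gain by content-type confidences.

    All inputs are hypothetical: the confidences lie in [0, 1] and would
    come from an audio content classifier. The formula is an illustrative
    choice, not the one used in the patent application.
    """
    for c in (informative_conf, interfering_conf):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
    # Positive correlation with informative content (e.g. speech),
    # negative correlation with interfering content (e.g. noise).
    weight = informative_conf * (1.0 - interfering_conf)
    return base_gain_db * weight


def smooth(previous, target, alpha=0.9):
    """One-pole smoothing so the adjustment changes continuously over time."""
    return alpha * previous + (1.0 - alpha) * target
```

    Calling `smooth` once per classifier update keeps the applied gain from jumping when the identified content type changes abruptly.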

    Object Clustering for Rendering Object-Based Audio Content Based on Perceptual Criteria
    22.
    Patent application
    Legal status: in force

    Publication No.: US20150332680A1

    Publication date: 2015-11-19

    Application No.: US14654460

    Filing date: 2013-11-25

    Abstract: Embodiments are directed to a method of rendering object-based audio comprising determining an initial spatial position of objects having object audio data and associated metadata, determining a perceptual importance of the objects, and grouping the audio objects into a number of clusters based on the determined perceptual importance of the objects, such that a spatial error caused by moving an object from an initial spatial position to a second spatial position in a cluster is minimized for objects with a relatively high perceptual importance. The perceptual importance is based at least in part on a partial loudness of an object and content semantics of the object.
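
    One simple way to keep spatial error low for important objects is to let the most important objects define the cluster positions. The sketch below does that under stated assumptions: the dictionary keys, the function name, and the nearest-centroid assignment are hypothetical, and the importance score is taken as given instead of being derived from partial loudness and content semantics.

```python
import math


def cluster_objects(objects, num_clusters):
    """Group audio objects so high-importance objects keep their position.

    `objects` is a list of dicts with hypothetical keys:
      "pos": (x, y, z) initial spatial position
      "importance": perceptual importance score (higher = more important)
    Returns (centroids, assignment), where assignment[i] is the cluster
    index of objects[i].
    """
    if not 1 <= num_clusters <= len(objects):
        raise ValueError("num_clusters must be between 1 and len(objects)")
    # The most important objects become the cluster centroids, so moving
    # them into their own cluster introduces zero spatial error.
    ranked = sorted(range(len(objects)),
                    key=lambda i: objects[i]["importance"], reverse=True)
    centroids = [objects[i]["pos"] for i in ranked[:num_clusters]]
    # Every object joins the nearest centroid.
    assignment = [
        min(range(num_clusters),
            key=lambda c: math.dist(obj["pos"], centroids[c]))
        for obj in objects
    ]
    return centroids, assignment
```

    In this scheme only the less important objects are moved, which is one possible reading of "minimized for objects with a relatively high perceptual importance".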

    CONTROL OF A VOLUME LEVELING UNIT USING TWO-STAGE NOISE CLASSIFIER

    Publication No.: US20250166652A1

    Publication date: 2025-05-22

    Application No.: US18835248

    Filing date: 2023-02-06

    Abstract: A method for volume leveling of an audio signal using a volume leveling control signal is disclosed. The method comprises determining a noise reliability ratio w(n) as a ratio of noise-like frames over all frames in a current time segment, determining a PGC noise confidence score XPGN(n) indicating a likelihood that professionally generated content (PGC) noise is present in the time segment, and determining, for the time segment, whether the noise reliability ratio is above a predetermined threshold. When the noise reliability ratio is above the predetermined threshold, the volume leveling control signal is updated based on the PGC noise confidence score, and when the noise reliability ratio is below the predetermined threshold, the volume leveling control signal is left unchanged. Volume leveling is improved by preventing boosting of, for example, phone-recorded environmental noise in user-generated content (UGC), while keeping the original behavior for other types of content.
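
    The gating logic in this abstract can be sketched in a few lines. The function name, the default threshold of 0.5, and the choice to set the control signal equal to the confidence score are assumptions; the abstract only says the threshold is "predetermined" and that the update is "based on" the score.

```python
def update_control_signal(noise_like_flags, pgc_noise_confidence,
                          previous_control, threshold=0.5):
    """Update a volume leveling control signal for one time segment.

    noise_like_flags: per-frame booleans, True if the frame is noise-like.
    pgc_noise_confidence: segment score in [0, 1], i.e. XPGN(n).
    previous_control: control signal value from the previous segment.
    threshold: hypothetical value for the predetermined threshold.
    """
    if not noise_like_flags:
        return previous_control  # no frames, nothing to decide on
    # Noise reliability ratio w(n): noise-like frames over all frames.
    w = sum(1 for flag in noise_like_flags if flag) / len(noise_like_flags)
    if w > threshold:
        # Reliable segment: the control signal follows the PGC noise score
        # (direct assignment is an illustrative simplification).
        return pgc_noise_confidence
    # Unreliable segment: the control signal is left unchanged.
    return previous_control
```

    For example, a segment with three noise-like frames out of four gives w(n) = 0.75, so the control signal is updated; a segment with one out of four keeps the previous value.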

    METHOD AND APPARATUS FOR SPEECH SOURCE SEPARATION BASED ON A CONVOLUTIONAL NEURAL NETWORK

    Publication No.: US20220223144A1

    Publication date: 2022-07-14

    Application No.: US17611121

    Filing date: 2020-05-13

    Abstract: Described herein is a method for Convolutional Neural Network (CNN) based speech source separation, wherein the method includes the steps of: (a) providing multiple frames of a time-frequency transform of an original noisy speech signal; (b) inputting the time-frequency transform of said multiple frames into an aggregated multi-scale CNN having a plurality of parallel convolution paths; (c) extracting and outputting, by each parallel convolution path, features from the input time-frequency transform of said multiple frames; (d) obtaining an aggregated output of the outputs of the parallel convolution paths; and (e) generating an output mask for extracting speech from the original noisy speech signal based on the aggregated output. Described herein are further an apparatus for CNN based speech source separation as well as a respective computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
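
    The data flow of steps (b) through (e) can be illustrated with a deliberately small toy. It is not a trained network: each parallel path is a single 1-D filter applied along the frequency axis of one frame, the kernel weights are hand-picked placeholders, and averaging is assumed as the aggregation. All function names are hypothetical.

```python
import math


def conv1d_same(row, kernel):
    """Zero-padded 1-D cross-correlation with len(row) outputs (odd kernels)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(row):
                acc += weight * row[j]
        out.append(acc)
    return out


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multiscale_mask(frames, kernels):
    """Run parallel paths with different kernel sizes and build a mask.

    frames: list of magnitude-spectrum frames (lists of floats).
    kernels: one kernel per parallel path (placeholder weights).
    """
    mask = []
    for frame in frames:
        paths = [conv1d_same(frame, k) for k in kernels]         # step (c)
        agg = [sum(vals) / len(paths) for vals in zip(*paths)]   # step (d)
        mask.append([sigmoid(a) for a in agg])                   # step (e)
    return mask


def apply_mask(frames, mask):
    """Extract the speech estimate by element-wise masking."""
    return [[x * m for x, m in zip(f, r)] for f, r in zip(frames, mask)]
```

    A real system would stack several learned 2-D convolution layers per path and operate on multiple frames at once; the sketch only shows how differently sized receptive fields are combined into one mask.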

    PROCESSING OBJECT-BASED AUDIO SIGNALS
    28.
    Patent application

    Publication No.: US20190222951A1

    Publication date: 2019-07-18

    Application No.: US16368574

    Filing date: 2019-03-28

    Abstract: An audio processing system and method calculate, based on spatial metadata of each of a plurality of audio objects, a panning coefficient for each audio object in relation to each of a plurality of predefined channel coverage zones. The audio signal is converted into submixes in relation to the predefined channel coverage zones based on the calculated panning coefficients and the audio objects, each submix indicating a sum of components of the plurality of audio objects in relation to one of the predefined channel coverage zones. A submix gain is generated by applying audio processing to each of the submixes, and an object gain applied to each of the audio objects is controlled, the object gain being a function of the panning coefficients for each of the audio objects and the submix gains in relation to each of the predefined channel coverage zones.
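
    The chain from panning coefficients to object gains can be sketched as follows. The function names are hypothetical, a simple peak limiter stands in for the unspecified "audio processing", and the panning-weighted average is only one possible function of the panning coefficients and submix gains.

```python
def make_submixes(object_samples, panning):
    """Sum each object's contribution into every channel coverage zone.

    object_samples[o] is one sample value for object o;
    panning[o][z] is the panning coefficient of object o for zone z.
    """
    num_zones = len(panning[0])
    return [sum(s * p[z] for s, p in zip(object_samples, panning))
            for z in range(num_zones)]


def limiter_gain(submix_sample, ceiling=1.0):
    """Stand-in for the audio processing step: a simple peak limiter gain."""
    level = abs(submix_sample)
    return 1.0 if level <= ceiling else ceiling / level


def object_gains(panning, submix_gains):
    """Object gain as a panning-weighted average of the submix gains."""
    gains = []
    for p in panning:
        total = sum(p)
        if total == 0:
            gains.append(1.0)  # object reaches no zone; leave it untouched
        else:
            gains.append(sum(c * g for c, g in zip(p, submix_gains)) / total)
    return gains
```

    An object panned entirely into one zone inherits that zone's submix gain, while an object spread across zones receives a blend of their gains.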

    UPMIXING OF AUDIO SIGNALS
    30.
    Patent application

    Publication No.: US20180262856A1

    Publication date: 2018-09-13

    Application No.: US15538892

    Filing date: 2016-02-09

    Abstract: Example embodiments disclosed herein relate to upmixing of audio signals. A method of upmixing an audio signal is described. The method includes decomposing the audio signal into a diffuse signal and a direct signal; generating an audio bed at least in part based on the diffuse signal, the audio bed including a height channel; extracting an audio object from the direct signal; estimating metadata of the audio object, the metadata including height information of the audio object; and rendering the audio bed and the audio object as an upmixed audio signal, wherein the audio bed is rendered to a predefined position and the audio object is rendered according to the metadata. A corresponding system and computer program product are described as well.
