METHOD, APPARATUS AND SYSTEM FOR ENHANCING MULTI-CHANNEL AUDIO IN A DYNAMIC RANGE REDUCED DOMAIN

    Publication number: US20230178084A1

    Publication date: 2023-06-08

    Application number: US17921858

    Filing date: 2021-04-29

    Inventor: Arijit Biswas

    CPC classification number: G10L19/008 G06N3/08 G10L19/26 G10L25/30

    Abstract: Described herein is a method of generating, in a dynamic range reduced domain, an enhanced multi-channel audio signal from an audio bitstream including a multi-channel audio signal, wherein the multi-channel audio signal comprises two or more channels, and wherein the method includes jointly enhancing the two or more channels of the dynamic range reduced raw multi-channel audio signal using a multi-channel Generator of a Generative Adversarial Network setting. Described herein are further a method for training a multi-channel Generator in a dynamic range reduced domain in a Generative Adversarial Network setting, an apparatus for generating, in a dynamic range reduced domain, an enhanced multi-channel audio signal from an audio bitstream including a multi-channel audio signal, respective systems and a computer program product.

    Dialog enhancement complemented with frequency transposition

    Publication number: US10129659B2

    Publication date: 2018-11-13

    Application number: US15567270

    Filing date: 2016-05-04

    Inventor: Arijit Biswas

    Abstract: A method, a system and a computer program product are disclosed for enhancing an audio signal in relation to a hearing impairment. An input signal is obtained comprising input sub-band signals in a frequency range comprising a source range and a target range. The input sub-band signals in the source range are selectively transposed into transposed sub-band signals in the target range according to a predefined transposing rule. A masking threshold is determined based on a predefined perceptual model and perceptually relevant sub-band signals of the transposed sub-band signals in the target range exceeding the masking threshold are detected. Input sub-band signals in the target range are selectively replaced with corresponding detected perceptually relevant sub-band signals of the transposed sub-band signals in the target range.
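
    The selective replacement step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the dictionary-based transposing rule, and the per-band masking thresholds passed in as plain numbers are all assumptions (the abstract derives the threshold from a perceptual model, which is not modelled here).

```python
def enhance_with_transposition(subbands, transpose_rule, masking_threshold):
    """Replace target-range sub-bands with audible transposed sub-bands.

    subbands          -- per-band magnitudes for one frame (hypothetical layout)
    transpose_rule    -- dict: source band index -> target band index (assumed rule)
    masking_threshold -- per-band threshold, assumed to come from a perceptual model
    """
    out = list(subbands)
    for src, dst in transpose_rule.items():
        transposed = subbands[src]
        # Only transposed content that exceeds the masking threshold is
        # perceptually relevant, so only that content replaces the input band.
        if transposed > masking_threshold[dst]:
            out[dst] = transposed
    return out


# Toy frame: bands 4 and 5 form the source range, bands 1 and 2 the target range.
frame = [0.1, 0.2, 0.05, 0.02, 0.9, 0.01]
rule = {4: 1, 5: 2}
threshold = [0.0, 0.3, 0.3, 0.0, 0.0, 0.0]
enhanced = enhance_with_transposition(frame, rule, threshold)
```

    In this toy frame, band 4 is loud enough to replace band 1, while band 5 falls below the threshold and band 2 keeps its original value.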

    METHOD AND APPARATUS FOR UPDATING A NEURAL NETWORK

    Publication number: US20220156584A1

    Publication date: 2022-05-19

    Application number: US17438908

    Filing date: 2020-03-05

    Abstract: Described herein is a method of generating a media bitstream to transmit parameters for updating a neural network implemented in a decoder, wherein the method includes the steps of: (a) determining at least one set of parameters for updating the neural network; (b) encoding the at least one set of parameters and media data to generate the media bitstream; and (c) transmitting the media bitstream to the decoder for updating the neural network with the at least one set of parameters. Described herein are further a method for updating a neural network implemented in a decoder, an apparatus for generating a media bitstream to transmit parameters for updating a neural network implemented in a decoder, an apparatus for updating a neural network implemented in a decoder and computer program products comprising a computer-readable storage medium with instructions adapted to cause the device to carry out said methods when executed by a device having processing capability.
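
    Steps (a) to (c) can be illustrated with a toy container. The byte layout below (two little-endian counts, then float32 parameters, then media bytes) is purely hypothetical and is not the bitstream syntax of the patent or of any codec; it only shows parameters and media data travelling in one stream and the decoder applying the update.

```python
import struct

HEADER = "<II"  # hypothetical header: parameter count, media byte count


def encode_bitstream(params, media):
    """Step (b): pack the parameter set and the media data into one stream."""
    header = struct.pack(HEADER, len(params), len(media))
    payload = struct.pack(f"<{len(params)}f", *params)
    return header + payload + media


def decode_bitstream(bitstream):
    """Decoder side: split the stream back into parameters and media data."""
    n_params, n_media = struct.unpack_from(HEADER, bitstream, 0)
    offset = struct.calcsize(HEADER)
    params = list(struct.unpack_from(f"<{n_params}f", bitstream, offset))
    offset += 4 * n_params  # each float32 occupies 4 bytes
    media = bitstream[offset:offset + n_media]
    return params, media


# Step (a): a parameter set chosen for the update (toy values).
new_params = [0.5, -1.25, 2.0]
stream = encode_bitstream(new_params, b"audio-frame")

# Step (c): the decoder receives the stream and updates its network weights.
decoder_weights = [0.0, 0.0, 0.0]
update, media = decode_bitstream(stream)
decoder_weights = update
```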

    Low complexity dense transient events detection and coding

    Publication number: US11232804B2

    Publication date: 2022-01-25

    Application number: US16628235

    Filing date: 2018-07-03

    Abstract: The present disclosure relates to methods and apparatus for audio coding. A method of encoding a portion of an audio signal comprises determining whether the portion of the audio signal is likely to contain dense transient events, and if it is determined that the portion of the audio signal is likely to contain dense transient events, quantizing the portion of the audio signal using a quantization mode that applies a substantially constant signal-to-noise ratio over frequency for the portion of the audio signal. The present disclosure further relates to a method of detecting dense transient events in a portion of an audio signal.
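
    A rough sketch of the two ideas in this abstract follows. The abstract does not say how dense transients are detected, so the block-energy onset counter below is a stand-in assumption, as are all names and default values. The quantizer relies on the standard approximation that a uniform quantizer with step size s adds noise of power s²/12, so scaling the step with each band's RMS keeps the signal-to-noise ratio roughly equal across bands.

```python
import math


def likely_dense_transients(samples, n_blocks=16, jump=4.0, min_onsets=4):
    """Hypothetical detector: count blocks whose energy jumps sharply."""
    size = len(samples) // n_blocks
    energy = [
        sum(s * s for s in samples[i * size:(i + 1) * size]) + 1e-12
        for i in range(n_blocks)
    ]
    onsets = sum(1 for prev, cur in zip(energy, energy[1:]) if cur > jump * prev)
    return onsets >= min_onsets


def quantize_constant_snr(bands, snr_db=20.0):
    """Quantize each frequency band with a step proportional to its RMS."""
    quantized = []
    for band in bands:
        rms = math.sqrt(sum(c * c for c in band) / len(band)) + 1e-12
        # Noise power of a uniform quantizer is about step**2 / 12, so this
        # step targets the same SNR (snr_db) in every band.
        step = rms * math.sqrt(12.0) / (10.0 ** (snr_db / 20.0))
        quantized.append([round(c / step) * step for c in band])
    return quantized


# A click train (applause-like) versus a steady tone.
clicks = [1.0 if i % 200 == 150 else 0.0 for i in range(1600)]
tone = [math.sin(2 * math.pi * i / 20) for i in range(1600)]
clicks_dense = likely_dense_transients(clicks)
tone_dense = likely_dense_transients(tone)

coded = quantize_constant_snr([[1.0, -1.0, 1.0, -1.0], [0.01, -0.01, 0.01, -0.01]])
```

    An encoder along these lines would call the constant-SNR quantizer only for portions flagged by the detector and fall back to its usual perceptual bit allocation otherwise.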

    Decoder-provided time domain aliasing cancellation during lossy/lossless transitions

    Publication number: US10438597B2

    Publication date: 2019-10-08

    Application number: US16115795

    Filing date: 2018-08-29

    Inventor: Arijit Biswas

    Abstract: Systems and methods are described for switching between lossy coded time segments and a lossless stream of the same source audio. A decoder may receive lossy coded time segments that include audio encoded using frequency-domain lossy coding. The decoder may also receive a lossless stream, which the decoder plays back, that includes audio from the same source encoded using lossless coding. In response to receiving a determination that network bandwidth is constrained, the decoder may generate an aliasing cancellation component based on a previously-decoded frame of the lossless stream, which may be added to a lossy time segment at a transition frame. The sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by an encoding window. Audio playback of the lossy coded time segments may then be provided, beginning with the aliasing-canceled transition frame.

    Method and system for encoding audio data with adaptive low frequency compensation
    Type: Granted patent (in force)

    Publication number: US09275649B2

    Publication date: 2016-03-01

    Application number: US14325130

    Filing date: 2014-07-07

    CPC classification number: G10L19/028 G10L19/0204 G10L19/032 G10L19/265

    Abstract: A method for determining mantissa bit allocation of audio data values of frequency domain audio data to be encoded. The allocation method includes a step of determining masking values for the audio data values, including by performing adaptive low frequency compensation on the audio data of each frequency band of a set of low frequency bands of the audio data. The adaptive low frequency compensation includes steps of: performing tonality detection on the audio data to generate compensation control data indicative of whether each frequency band in the set of low frequency bands has prominent tonal content; and performing low frequency compensation on the audio data in each frequency band in the set of low frequency bands having prominent tonal content as indicated by the compensation control data, but not performing low frequency compensation on the audio data in any other frequency band in the set of low frequency bands.
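
    The control flow of the adaptive compensation can be sketched as below. Both the tonality test (peak-to-mean power ratio) and the compensation itself (lowering the masking value by a fixed number of decibels) are hypothetical placeholders, since the abstract does not specify either; only the per-band gating by compensation control data follows the text.

```python
def has_prominent_tone(band_power, peak_ratio=3.0):
    """Hypothetical tonality detector: one bin dominates the band's power."""
    mean = sum(band_power) / len(band_power)
    return max(band_power) > peak_ratio * mean


def masking_values(low_bands, base_mask_db, compensation_db=6.0):
    """Apply low frequency compensation only to bands flagged as tonal."""
    # Compensation control data: one flag per low frequency band.
    control = [has_prominent_tone(band) for band in low_bands]
    masks = [
        mask - compensation_db if tonal else mask
        for mask, tonal in zip(base_mask_db, control)
    ]
    return control, masks


low_bands = [
    [0.01, 0.01, 4.0, 0.01],  # strong single peak: tonal
    [1.0, 1.2, 0.9, 1.1],     # flat, noise-like: not tonal
]
control, masks = masking_values(low_bands, base_mask_db=[40.0, 40.0])
```

    In this sketch a lower masking value would leave more mantissa bits for the tonal band, while the noise-like band keeps its uncompensated masking value.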

    LOW COMPLEXITY REPETITION DETECTION IN MEDIA DATA
    Type: Patent application (published, under examination)

    Publication number: US20140330556A1

    Publication date: 2014-11-06

    Application number: US14360257

    Filing date: 2012-12-10

    CPC classification number: G10L19/00 G10H1/0008

    Abstract: Low complexity detection of a time-wise position of a representative segment in media data is described. A subset of offset values is located in a set of offset values in media data using a first type of one or more types of features, which are extractable from (e.g., derivable from components of) the media data. The subset of offset values comprises values that are selected from the set of offset values based on one or more selection criteria. A set of candidate seed time points is identified based on the subset of offset values using a second type of the one or more types of features.
