Audio processing method and audio processing apparatus
    11.
    Granted patent (status: in force)

    Publication number: US09282419B2

    Publication date: 2016-03-08

    Application number: US14365072

    Filing date: 2012-12-12

    CPC classification number: H04S5/00 G10L19/26 G10L21/0364 H04S7/302

    Abstract: An audio processing method and an audio processing apparatus are described. A mono-channel audio signal is transformed into a plurality of first subband signals. Proportions of a desired component and a noise component are estimated in each of the subband signals. Second subband signals corresponding respectively to a plurality of channels are generated from each of the first subband signals. Each of the second subband signals comprises a first component and a second component obtained by assigning a spatial hearing property and a perceptual hearing property different from the spatial hearing property to the desired component and the noise component in the corresponding first subband signal respectively, based on a multi-dimensional auditory presentation method. The second subband signals are transformed into signals for rendering with the multi-dimensional auditory presentation method. By assigning different hearing properties to desired sound and noise, the intelligibility of the audio signal can be improved.
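
The generation of the second subband signals can be illustrated with a minimal sketch. It is not the patented method: it assumes the estimated proportions are per-band energy fractions, and it stands in for both hearing properties with simple constant-power panning to two different azimuths. The function name `spatialize_subbands` and the parameters `desired_az` and `noise_az` are hypothetical.

```python
import numpy as np

def spatialize_subbands(subbands, desired_fraction,
                        desired_az=0.0, noise_az=np.pi / 3):
    """Split mono subband signals into desired and noise parts and pan them.

    subbands: array of shape (bands, frames), the first subband signals.
    desired_fraction: per-band energy fraction of the desired component
        in [0, 1] (assumed interpretation of "proportion").
    Returns an array of shape (2, bands, frames): left and right channels.
    """
    x = np.asarray(subbands)
    p = np.clip(np.asarray(desired_fraction, dtype=float), 0.0, 1.0)[:, None]
    desired = np.sqrt(p) * x        # first component
    noise = np.sqrt(1.0 - p) * x    # second component

    def pan(azimuth):
        # Constant-power pan: -pi/2 is hard left, 0 is centre, +pi/2 is hard right.
        theta = (azimuth + np.pi / 2) / 2
        return np.cos(theta), np.sin(theta)  # (left gain, right gain)

    d_left, d_right = pan(desired_az)
    n_left, n_right = pan(noise_az)
    left = d_left * desired + n_left * noise
    right = d_right * desired + n_right * noise
    return np.stack([left, right])
```

Placing the desired component in the centre and the noise off to one side is only one possible assignment; the abstract leaves the concrete hearing properties to the chosen presentation method.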

    HARMONICITY ESTIMATION, AUDIO CLASSIFICATION, PITCH DETERMINATION AND NOISE ESTIMATION
    12.
    Patent application (status: in force)

    Publication number: US20150081283A1

    Publication date: 2015-03-19

    Application number: US14384356

    Filing date: 2013-03-21

    CPC classification number: G10L25/78 G10L25/18 G10L25/81 G10L25/84

    Abstract: Embodiments are described for harmonicity estimation, audio classification, pitch determination and noise estimation. Measuring the harmonicity of an audio signal includes calculating a log amplitude spectrum of the audio signal. A first spectrum is derived by calculating each of its components as the sum of those components of the log amplitude spectrum whose frequencies, on a linear frequency scale, are odd multiples of that component's frequency. A second spectrum is derived by calculating each of its components as the sum of those components of the log amplitude spectrum whose frequencies, on a linear frequency scale, are even multiples of that component's frequency. A difference spectrum is derived by subtracting the first spectrum from the second spectrum. A measure of harmonicity is generated as a monotonically increasing function of the maximum component of the difference spectrum within a predetermined frequency range.
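
The steps in this abstract can be sketched for a single frame as follows. This is a minimal illustration, not the patented implementation. The Hann window, the search range `f_lo`/`f_hi`, the limit `max_multiple`, the averaging over an equal number of odd and even multiples, and the logistic mapping are all assumptions chosen for the example.

```python
import numpy as np

def harmonicity(frame, sample_rate, f_lo=50.0, f_hi=400.0, max_multiple=20):
    """Return a harmonicity measure in [0, 1] for one audio frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Log amplitude spectrum (small offset avoids log(0)).
    log_amp = np.log(np.abs(np.fft.rfft(frame * np.hanning(n))) + 1e-9)
    n_bins = len(log_amp)
    bin_hz = sample_rate / n

    # Candidate component frequencies, expressed as FFT bin indices.
    k_lo = max(1, int(np.ceil(f_lo / bin_hz)))
    k_hi = min(int(np.floor(f_hi / bin_hz)), (n_bins - 1) // 2)

    best = -np.inf
    for k in range(k_lo, k_hi + 1):
        multiples = np.arange(1, max_multiple + 1) * k
        multiples = multiples[multiples < n_bins]
        pairs = len(multiples) // 2  # same count of odd and even multiples
        first = log_amp[multiples[0:2 * pairs:2]].sum()   # odd multiples: k, 3k, ...
        second = log_amp[multiples[1:2 * pairs:2]].sum()  # even multiples: 2k, 4k, ...
        # Difference spectrum component: second minus first, averaged (assumption).
        best = max(best, (second - first) / pairs)

    # Logistic function as one example of a monotonically increasing mapping.
    return float(1.0 / (1.0 + np.exp(-best)))
```

For a harmonic sound with fundamental f0, the difference peaks near f0/2: the even multiples of f0/2 fall on the harmonics, while the odd multiples fall in the gaps between them.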

    METHOD AND APPARATUS FOR AUDIO PROCESSING USING A CONVOLUTIONAL NEURAL NETWORK ARCHITECTURE

    Publication number: US20230401429A1

    Publication date: 2023-12-14

    Application number: US18032322

    Filing date: 2021-10-19

    CPC classification number: G06N3/0464 G10L21/00

    Abstract: Systems, methods, and computer program products for audio processing based on convolutional neural network (CNN) are described. A first CNN architecture may comprise a contracting path of a U-net, a multi-scale CNN, and an expansive path of a U-net. The contracting path may comprise a first encoding layer and may be configured to generate an output representation of the contracting path. The multi-scale CNN may be configured to generate, based on the output representation of the contracting path, an intermediate representation. The multi-scale CNN may comprise at least two parallel convolution paths. The expansive path may comprise a first decoding layer and may be configured to generate a final representation based on the intermediate representation generated by the multi-scale CNN. Within a second CNN architecture, the first encoding layer may comprise a first multi-scale CNN with at least two parallel convolution paths, and the first decoding layer may comprise a second multi-scale CNN with at least two parallel convolution paths.
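
The first architecture in this abstract (contracting path, multi-scale CNN, expansive path) can be sketched in plain NumPy as below. It is only a rough illustration: the 1-D convolutions, the ReLU activations, the factor-2 resampling, the concatenation of the parallel paths, and the skip connection are assumptions typical of U-net style models, not details taken from the abstract. The names `conv1d_same`, `multi_scale_block` and `tiny_unet` are hypothetical, and the input is assumed to be at least as long as the widest kernel.

```python
import numpy as np

def conv1d_same(x, kernels):
    """x: (in_ch, T); kernels: (out_ch, in_ch, width). Returns (out_ch, T)."""
    out_ch, in_ch, _ = kernels.shape
    out = np.zeros((out_ch, x.shape[1]))
    for o in range(out_ch):
        for i in range(in_ch):
            out[o] += np.convolve(x[i], kernels[o, i], mode="same")
    return np.maximum(out, 0.0)  # ReLU activation (assumption)

def multi_scale_block(x, paths):
    """Run parallel convolution paths with different kernel widths and
    concatenate their outputs along the channel axis (assumed merge rule)."""
    return np.concatenate([conv1d_same(x, k) for k in paths], axis=0)

def tiny_unet(x, enc, paths, dec):
    skip = conv1d_same(x, enc)               # contracting path: one encoding layer
    down = skip[:, ::2]                      # downsample in time by 2
    mid = multi_scale_block(down, paths)     # intermediate representation
    up = np.repeat(mid, 2, axis=1)[:, :skip.shape[1]]  # upsample back to T
    merged = np.concatenate([up, skip], axis=0)        # skip connection (assumption)
    return conv1d_same(merged, dec)          # expansive path: one decoding layer
```

With four encoder channels and two paths of three output channels each, the decoder kernel needs ten input channels (six from the multi-scale block plus four from the skip connection).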

    Audio Processing Method and Audio Processing Apparatus
    17.
    Patent application (status: in force)

    Publication number: US20150071446A1

    Publication date: 2015-03-12

    Application number: US14365072

    Filing date: 2012-12-12

    CPC classification number: H04S5/00 G10L19/26 G10L21/0364 H04S7/302

    Abstract: An audio processing method and an audio processing apparatus are described. A mono-channel audio signal is transformed into a plurality of first subband signals. Proportions of a desired component and a noise component are estimated in each of the subband signals. Second subband signals corresponding respectively to a plurality of channels are generated from each of the first subband signals. Each of the second subband signals comprises a first component and a second component obtained by assigning a spatial hearing property and a perceptual hearing property different from the spatial hearing property to the desired component and the noise component in the corresponding first subband signal respectively, based on a multi-dimensional auditory presentation method. The second subband signals are transformed into signals for rendering with the multi-dimensional auditory presentation method. By assigning different hearing properties to desired sound and noise, the intelligibility of the audio signal can be improved.

    METHOD AND SYSTEM FOR SIGNAL TRANSMISSION CONTROL
    18.
    Patent application (status: in force)

    Publication number: US20150032446A1

    Publication date: 2015-01-29

    Application number: US14382667

    Filing date: 2013-03-21

    CPC classification number: G10L25/84 G10L25/78 G10L2025/783

    Abstract: An audio signal with a temporal sequence of blocks or frames is received or accessed. Features are determined that characterize, in aggregate, the sequential audio blocks/frames that have been processed recently, relative to current time. The feature determination exceeds a specificity criterion and is delayed, relative to the recently processed audio blocks/frames. A voice activity indication is detected in the audio signal. The voice activity detection (VAD) is based on a decision that exceeds a preset sensitivity threshold, is computed over a brief time period relative to the block/frame duration, and relates to current block/frame features. The VAD and the recent feature determination are combined with state related information, which is based on a history of previous feature determinations that are compiled from multiple features, determined over a time prior to the recent feature determination time period. Decisions to commence or terminate the audio signal, or related gains, are outputted based on the combination.

    Steering of binauralization of audio

    Publication number: US11895479B2

    Publication date: 2024-02-06

    Application number: US17637446

    Filing date: 2020-08-19

    CPC classification number: H04S7/30 H04S2420/01

    Abstract: A method for steering binauralization of audio is provided. The method comprises the steps of: receiving (410) an audio input signal; calculating (430) a confidence value indicating a likelihood that a current audio frame of the audio input signal comprises binauralized audio; determining (450) a state signal based on the confidence value; determining (460) a steering signal based on the confidence value, the state signal and an energy value of the audio frame; and generating (470) an audio output signal with steered binauralization by processing the audio input signal according to the steering signal.

    METHOD AND DEVICE FOR PROCESSING A BINAURAL RECORDING

    Publication number: US20230360662A1

    Publication date: 2023-11-09

    Application number: US18026281

    Filing date: 2021-09-15

    CPC classification number: G10L21/0208 H04S1/007 G10L2021/02166

    Abstract: The present invention relates to a method and device for processing a first and a second audio signal representing an input binaural audio signal acquired by a binaural recording device. The present invention further relates to a method for rendering a binaural audio signal on a speaker system. The method for processing a binaural signal comprises extracting audio information from the first audio signal, computing band gains for reducing noise in the first audio signal, and applying the band gains to respective frequency bands of the first audio signal in accordance with a dynamic scaling factor to provide a first output audio signal, wherein the dynamic scaling factor has a value between zero and one and is selected so as to reduce quality degradation for the first audio signal.
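
One plausible reading of applying band gains "in accordance with a dynamic scaling factor" is a linear blend between no processing and full noise reduction. The sketch below assumes exactly that; the blend formula and the function name `apply_scaled_band_gains` are illustrative assumptions and are not taken from the patent.

```python
import numpy as np

def apply_scaled_band_gains(band_signal, band_gains, scale):
    """Apply per-band noise-reduction gains, softened by a scaling factor.

    band_signal: array of shape (bands, frames).
    band_gains: array of shape (bands,), one gain per frequency band.
    scale: dynamic scaling factor in [0, 1]; 0 leaves the signal untouched,
        1 applies the full band gains (assumed linear interpolation).
    """
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must lie between zero and one")
    gains = np.asarray(band_gains, dtype=float)
    effective = 1.0 - scale * (1.0 - gains)
    return np.asarray(band_signal) * effective[:, None]
```

Under this reading, a smaller scaling factor trades some noise reduction for fewer processing artifacts, which matches the stated goal of reducing quality degradation.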
