EFFICIENT FREQUENCY-BASED AUDIO RESAMPLING FOR USING NEURAL NETWORKS

    Publication number: US20240203443A1

    Publication date: 2024-06-20

    Application number: US18068187

    Application date: 2022-12-19

    CPC classification number: G10L21/14 G10L21/0232 G10L25/30

    Abstract: Systems and methods described relate to the enhancement of audio, such as through machine learning-based audio super-resolution processing. An efficient resampling approach can be used for audio data received at a lower frequency than is needed for an audio enhancement neural network. This audio data can be converted into the frequency domain, and once in the frequency domain (e.g., represented using a spectrogram) this lower frequency data can be resampled to provide a frequency-based representation that is at the target input resolution for the neural network. To keep this resampling process lightweight, the upper frequency bands can be padded with zero value entries (or other such padding values). This resampled, higher frequency spectrogram can be provided as input to the neural network, which can perform an enhancement operation such as audio upsampling or super-resolution.
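
The zero-padding step described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the patent: the function name `pad_upper_bands`, the (frequency bins, frames) layout, and the assumption that the spectrogram comes from a one-sided FFT with an even window length are all hypothetical choices made for illustration.

```python
import numpy as np


def pad_upper_bands(spec, src_rate, target_rate, pad_value=0.0):
    """Extend a one-sided spectrogram to the bin count of a higher sample rate.

    spec is assumed to have shape (n_bins, n_frames), with
    n_bins = n_fft // 2 + 1 for an even FFT size n_fft (hypothetical layout).
    The existing bins are left untouched; only new upper-frequency rows
    filled with pad_value are appended.
    """
    if target_rate < src_rate:
        raise ValueError("target_rate must be >= src_rate")

    n_bins_src = spec.shape[0]
    n_fft_src = 2 * (n_bins_src - 1)

    # Keep the same frame duration, so the bin spacing in Hz is unchanged
    # and the FFT size scales with the sample rate.
    n_fft_tgt = round(n_fft_src * target_rate / src_rate)
    n_bins_tgt = n_fft_tgt // 2 + 1

    # Pad only along the frequency axis, and only at the high end.
    extra = n_bins_tgt - n_bins_src
    return np.pad(spec, ((0, extra), (0, 0)), constant_values=pad_value)


# Example: a 16 kHz spectrogram (n_fft = 256) resized for a 48 kHz model input.
low_rate_spec = np.ones((129, 10), dtype=complex)
resampled = pad_upper_bands(low_rate_spec, 16000, 48000)
print(resampled.shape)  # (385, 10)
```

Because the original bins are copied unchanged and no interpolation filter is run, the cost is a single array allocation, which is consistent with the abstract's description of the resampling as lightweight.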

    VIRTUAL AUDIO AUGMENTATION USING COMPUTER VISION

    Publication number: US20250126429A1

    Publication date: 2025-04-17

    Application number: US18379582

    Application date: 2023-10-12

    Abstract: Disclosed are apparatuses, systems, and techniques that provide virtual immersion sound experience and spatialization effects with an audio device supporting a low number of sound channels, according to at least one embodiment. The techniques include but are not limited to associating input audio channels of an audio stream with virtual speakers, identifying, using an optical sensor, positioning of a user's head relative to the virtual speakers, determining simulated sound intensities at one or more reference locations associated with the user's head, and generating, based on the simulated sound intensities, output audio signals configured for physical speakers.
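
The abstract does not say how the simulated sound intensities are computed. As a rough sketch only, the code below assumes a free-field inverse-square falloff from each virtual speaker to each reference location near the user's head, and mixes the input channels with the matching amplitude gains. The function names, the coordinate layout, and the falloff model are all hypothetical and are not taken from the patent; the head position is assumed to have already been estimated from the optical sensor.

```python
import numpy as np


def simulated_intensities(speaker_pos, ref_points, ref_distance=1.0):
    """Relative intensity of each virtual speaker at each reference location.

    speaker_pos: (n_speakers, 3) virtual speaker coordinates in metres.
    ref_points:  (n_refs, 3) reference locations, e.g. near each ear.
    Assumes inverse-square falloff, normalised to 1.0 at ref_distance.
    """
    diff = ref_points[:, None, :] - speaker_pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    dist = np.maximum(dist, 1e-3)  # avoid division by zero
    return (ref_distance / dist) ** 2  # shape (n_refs, n_speakers)


def mix_outputs(channels, intensities):
    """Mix input channels into one output signal per reference location.

    channels: (n_speakers, n_samples) audio, one row per virtual speaker.
    Amplitude gain is taken as the square root of the relative intensity.
    """
    gains = np.sqrt(intensities)
    return gains @ channels  # shape (n_refs, n_samples)


# Two virtual speakers to the left and right of a head at the origin.
speakers = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
ears = np.array([[-0.1, 0.0, 0.0], [0.1, 0.0, 0.0]])  # left, right
intensity = simulated_intensities(speakers, ears)
outputs = mix_outputs(np.ones((2, 4)), intensity)
```

In this toy setup the right-hand virtual speaker is closer to the right ear, so its contribution to the right output is larger than to the left; moving the `ears` coordinates as the head moves changes the gains accordingly.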
