AUDIO CODING USING MACHINE LEARNING BASED LINEAR FILTERS AND NON-LINEAR NEURAL SOURCES

    Publication No.: US20240428813A1

    Publication date: 2024-12-26

    Application No.: US18689053

    Filing date: 2022-10-10

    Abstract: Systems and techniques are described for coding audio signals. For example, a voice decoder can generate, using a first neural network, an excitation signal for at least one sample of an audio signal at least in part by performing a non-linear operation based on one or more inputs to the first neural network, the excitation signal being configured to excite a learned linear filter. The voice decoder can further generate, using the learned linear filter and the excitation signal, at least one sample of a reconstructed audio signal. For example, a second neural network can be used to generate coefficients for one or more learned linear filters, which receive as input the excitation signal generated by the first neural network trained to perform the non-linear operation.
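A minimal sketch of the signal path this abstract describes, assuming a single `tanh` layer as a stand-in for the first neural network and fixed FIR taps as a stand-in for the learned linear filter. The function names, weights, and coefficients are hypothetical and are not taken from the patent.

```python
import math

def excitation(features, weights):
    """Stand-in for the first neural network: one tanh unit per frame (non-linear)."""
    return [math.tanh(sum(w * f for w, f in zip(weights, frame))) for frame in features]

def linear_filter(excitation_signal, coeffs):
    """FIR filter y[n] = sum_k coeffs[k] * e[n-k]; coeffs stand in for learned taps."""
    out = []
    for n in range(len(excitation_signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * excitation_signal[n - k]
        out.append(acc)
    return out

# Hypothetical per-sample features and weights, chosen only for illustration.
features = [[0.0, 0.0], [1.0, -1.0], [0.5, 0.5], [0.0, 0.0]]
weights = [1.0, 1.0]

e = excitation(features, weights)   # non-linear excitation signal
y = linear_filter(e, [1.0, 0.5])    # reconstructed samples
```

In the abstract a second neural network produces the filter coefficients; here they are hard-coded to keep the sketch short.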

    EFFICIENT PACKET-LOSS PROTECTED DATA ENCODING AND/OR DECODING

    Publication No.: US20250090963A1

    Publication date: 2025-03-20

    Application No.: US18294490

    Filing date: 2022-09-08

    Abstract: A device includes a memory and one or more processors coupled to the memory and configured to execute instructions from the memory. Execution of the instructions causes the one or more processors to combine two or more data portions to generate input data for a decoder network. A first data portion of the two or more data portions is based on a first encoding of a data sample by a multiple description coding network and content of a second data portion of the two or more data portions depends on whether data based on a second encoding of the data sample by the multiple description coding network is available. Execution of the instructions also causes the one or more processors to obtain, from the decoder network, output data based on the input data and to generate a representation of the data sample based on the output data.
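The way the decoder input is assembled can be sketched as follows. This assumes, purely for illustration, that a zero-valued filler replaces the second description when it is lost; the abstract only says that the content of the second portion depends on availability. `build_decoder_input` is a hypothetical helper name.

```python
def build_decoder_input(first_desc, second_desc, dim):
    """Concatenate two data portions for the decoder network.

    If the second description did not arrive (None), substitute a zero
    filler of length `dim`. The zero filler is an assumption of this sketch.
    """
    if second_desc is not None:
        second_portion = list(second_desc)
    else:
        second_portion = [0.0] * dim
    return list(first_desc) + second_portion

both_received = build_decoder_input([1.0, 2.0], [3.0, 4.0], dim=2)
second_lost = build_decoder_input([1.0, 2.0], None, dim=2)
```

Keeping the input length constant in both cases lets one decoder network handle full and partial reception.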

    SYSTEMS AND METHODS FOR MULTI-BAND AUDIO CODING

    Publication No.: US20240371384A1

    Publication date: 2024-11-07

    Application No.: US18689052

    Filing date: 2022-10-10

    Abstract: Systems and techniques are described for audio coding. An audio system receives feature(s) corresponding to an audio signal, for example from an encoder and/or a speech synthesis engine. The audio system generates an excitation signal, such as a harmonic signal and/or a noise signal, based on the feature(s). The audio system uses a filterbank to generate band-specific signals from the excitation signal. The band-specific signals correspond to frequency bands. The audio system inputs the feature(s) into a machine learning (ML) filter estimator to generate parameter(s) associated with linear filter(s). The audio system inputs the feature(s) into a voicing estimator to generate gain value(s). The audio system generates an output audio signal based on modification of the band-specific signals, application of the linear filter(s) according to the parameter(s), and amplification using gain amplifier(s) according to the gain value(s).
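A rough sketch of the band-split and per-band gain stages, assuming a two-band filterbank built from a two-tap moving average (low band) and its complement (high band). The gain values are hard-coded here, whereas the abstract derives them from a voicing estimator, and the ML-estimated linear filter stage is omitted. All names are hypothetical.

```python
def split_bands(x):
    """Two-band filterbank: low = 2-tap moving average, high = x - low."""
    low = [(x[n] + (x[n - 1] if n > 0 else 0.0)) / 2 for n in range(len(x))]
    high = [x[n] - low[n] for n in range(len(x))]
    return low, high

def synthesize(x, gains):
    """Scale each band by its gain and sum the bands into one output signal."""
    low, high = split_bands(x)
    return [gains[0] * l + gains[1] * h for l, h in zip(low, high)]

excitation = [1.0, -1.0, 1.0, -1.0]          # toy excitation signal
unity = synthesize(excitation, (1.0, 1.0))   # unit gains reproduce the input
low_only = synthesize(excitation, (1.0, 0.0))
```

Because the high band is defined as the complement of the low band, unit gains return the excitation unchanged, which makes the effect of each gain easy to check.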

    BUNDLED MULTI-RATE FEEDBACK AUTOENCODER

    Publication No.: US20250104723A1

    Publication date: 2025-03-27

    Application No.: US18728154

    Filing date: 2023-01-23

    Abstract: A method includes generating an input data state for each data sample in a time series of data samples of a portion of an audio data stream. The method also includes providing at least one input data state to a first bottleneck and at least one other input data state to a second bottleneck. The first bottleneck is associated with a first bitrate and the second bottleneck is associated with a second bitrate. The method further includes generating a first encoded frame based on a first output data state from the first bottleneck and a second encoded frame based on a second output data state from the second bottleneck. The first encoded frame and the second encoded frame are bundled in a packet.
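The bundling step can be illustrated with a toy example, assuming each bottleneck is a uniform quantizer whose number of levels sets its bitrate. Real bottlenecks in a feedback autoencoder are learned; `bottleneck`, `bundle`, and the level counts below are hypothetical.

```python
def bottleneck(state, levels):
    """Quantize values in [-1, 1] to `levels` uniform steps (more levels = higher bitrate)."""
    step = 2.0 / (levels - 1)
    return [round((v + 1.0) / step) for v in state]

def bundle(states, t):
    """Encode the state at time t at the first rate and the state at t-1 at the second rate."""
    first_frame = bottleneck(states[t], levels=5)       # first bottleneck, higher bitrate
    second_frame = bottleneck(states[t - 1], levels=3)  # second bottleneck, lower bitrate
    return {"first": first_frame, "second": second_frame}

# Hypothetical input data states for two consecutive samples.
states = [[0.2, -0.6], [1.0, 0.0]]
packet = bundle(states, t=1)
```

Sending a coarse copy of an earlier state alongside the current frame is one plausible use of such a bundle; the abstract itself only states that the two encoded frames share a packet.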

    DIFFUSION-BASED DATA COMPRESSION

    Publication No.: US20240121398A1

    Publication date: 2024-04-11

    Application No.: US18458006

    Filing date: 2023-08-29

    CPC classification number: H04N19/137 H04N19/147 H04N19/162

    Abstract: Systems and techniques are described for processing image data using a residual model that can be configured with an adjustable number of sampling steps. For example, a process can include obtaining a latent representation of an image and processing, using a decoder of a machine learning model, the latent representation of the image to generate an initial reconstructed image. The process can further include processing, using the residual model, the initial reconstructed image and noise data to generate a plurality of predictions of a residual over a number of sampling steps. The residual represents a difference between the image and the initial reconstructed image. The process can include obtaining, from the plurality of predictions of the residual, a final residual representing the difference between the image and the initial reconstructed image. The process can further include combining the initial reconstructed image and the final residual to generate a final reconstructed image.
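The sampling loop in this abstract can be sketched with a toy stand-in for the residual model. The assumption here is that the "trained" model pulls the noisy estimate toward a fixed target (10% of the initial reconstruction); a real residual model would be a learned denoiser. Function names and the target rule are hypothetical.

```python
def residual_model(initial, noisy_residual, step, num_steps):
    """Hypothetical stand-in for a trained residual model.

    Moves the current estimate toward a fixed toy target, taking a larger
    relative step as fewer sampling steps remain.
    """
    target = [0.1 * v for v in initial]   # toy "learned" residual
    alpha = 1.0 / (num_steps - step)      # equals 1.0 on the last step
    return [r + alpha * (t - r) for r, t in zip(noisy_residual, target)]

def decode(initial, noise, num_steps):
    """Refine a residual from noise over `num_steps`, then add it to the initial image."""
    residual = list(noise)
    predictions = []
    for step in range(num_steps):
        residual = residual_model(initial, residual, step, num_steps)
        predictions.append(residual)
    final_residual = predictions[-1]
    final_image = [i + r for i, r in zip(initial, final_residual)]
    return final_image, predictions

final_image, predictions = decode([1.0, 2.0], [0.5, -0.5], num_steps=4)
```

Changing `num_steps` changes how many intermediate predictions are produced, which mirrors the adjustable number of sampling steps mentioned in the abstract.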

    VIDEO COMPRESSION USING RECURRENT-BASED MACHINE LEARNING SYSTEMS

    Publication No.: US20210281867A1

    Publication date: 2021-09-09

    Application No.: US17091570

    Filing date: 2020-11-06

    Abstract: Techniques are described herein for coding video content using recurrent-based machine learning tools. A device can include a neural network system including encoder and decoder portions. The encoder portion can generate output data for a current time step of operation of the neural network system based on an input video frame for the current time step, reconstructed motion estimation data from a previous time step of operation, reconstructed residual data from the previous time step of operation, and recurrent state data from at least one recurrent layer of the decoder portion of the neural network system from the previous time step of operation. The decoder portion of the neural network system can generate, based on the output data and recurrent state data from the previous time step of operation, a reconstructed video frame for the current time step of operation.
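The feedback of decoder state into the encoder can be illustrated with a heavily simplified loop. This sketch assumes the recurrent state is just the previous reconstructed frame and the encoder output is a rounded frame difference; motion estimation and the neural networks themselves are omitted. All names are hypothetical.

```python
def encode(frame, decoder_state):
    """Toy encoder: quantized difference between the frame and the decoder's previous state."""
    return [round(f - s) for f, s in zip(frame, decoder_state)]

def decode(code, decoder_state):
    """Toy decoder: add the code to the previous state; the result is also the new state."""
    recon = [s + c for s, c in zip(decoder_state, code)]
    return recon, recon

# Hypothetical two-pixel "frames" for three time steps.
frames = [[10.0, 20.0], [12.0, 19.0], [15.0, 18.0]]
state = [0.0, 0.0]   # initial recurrent state
recons = []
for frame in frames:
    code = encode(frame, state)          # encoder sees the previous decoder state
    recon, state = decode(code, state)   # decoder updates its recurrent state
    recons.append(recon)
```

Because the encoder works from the same state the decoder holds, both sides stay synchronized across time steps, which is the property the recurrent design relies on.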
