NEURAL NETWORK WITH TRANSFORMER BASED VIDEO CODING TOOL

    Publication No.: US20250119556A1

    Publication date: 2025-04-10

    Application No.: US18889977

    Filing date: 2024-09-19

    Abstract: A method of processing video data includes receiving a picture; and filtering a current block of the picture, through a neural network and based on local correlations of proximate samples and distant, non-local correlations of non-proximate samples relative to the current block, to generate a filtered current block. The neural network comprises one or more backbone blocks and one or more transformer blocks. Each of the one or more transformer blocks is associated with a backbone block of the one or more backbone blocks. At least one of the backbone blocks is configured to capture the local correlations, relative to the current block and the proximate samples of the current block, and at least one of the transformer blocks is configured to generate features, based on applying an attention mechanism, that capture the distant, non-local correlations, relative to the current block and the non-proximate samples, in the picture for processing.
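The pairing of a local backbone stage with a non-local attention stage described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the "block" is a 1-D list of samples, the backbone is a fixed 3-tap average standing in for learned convolutions, and the attention uses unlearned scalar dot products. It is not the filter claimed in the application.

```python
import math
from typing import List


def backbone_block(x: List[float]) -> List[float]:
    """Local stage: 3-tap average over each sample and its immediate neighbours."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[max(i - 1, 0)]        # replicate edge samples at the borders
        right = x[min(i + 1, n - 1)]
        out.append((left + x[i] + right) / 3.0)
    return out


def transformer_block(feat: List[float]) -> List[float]:
    """Non-local stage: scalar dot-product self-attention across all positions."""
    out = []
    for q in feat:
        scores = [q * k for k in feat]           # every position attends to every other
        m = max(scores)                          # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, feat)) / total)
    return out


def filter_block(samples: List[float]) -> List[float]:
    """Combine local and non-local features (equal weighting is an assumption)."""
    local = backbone_block(samples)
    non_local = transformer_block(local)         # transformer block paired with the backbone block
    return [0.5 * (a + b) for a, b in zip(local, non_local)]
```

In this sketch the backbone only ever sees a sample's two neighbours, while the attention stage lets a sample at one end of the block draw on samples at the other end, which is the local versus non-local distinction the abstract draws.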

    CONVENTIONAL AND NEURAL NETWORK CODECS FOR RANDOM ACCESS VIDEO CODING

    Publication No.: US20250016339A1

    Publication date: 2025-01-09

    Application No.: US18744171

    Filing date: 2024-06-14

    Abstract: An example device for decoding video data includes a processing system comprising one or more processors implemented in circuitry and configured to: determine that a first temporal layer identifier of a first picture of the video data is included in a first set of temporal layers; in response to the first temporal layer identifier being included in the first set of temporal layers, decode blocks of the first picture on a block by block basis; determine that a second temporal layer identifier of a second picture of the video data is included in a second set of temporal layers, the second set of temporal layers being higher than the first set of temporal layers; and in response to the second temporal layer identifier being included in the second set of temporal layers, execute a neural network-based video decoder to decode the second picture.
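The decoder selection described in this abstract amounts to a dispatch on the picture's temporal layer identifier. The sketch below is a minimal illustration under stated assumptions: the layer sets `{0, 1}` and `{2, 3}`, the function names, and the callable decoder stand-ins are all hypothetical, since the abstract only states that the second set is higher than the first.

```python
from typing import Any, Callable, FrozenSet, List

# Hypothetical split of temporal layers; the abstract does not give concrete values.
CONVENTIONAL_LAYERS: FrozenSet[int] = frozenset({0, 1})
NEURAL_LAYERS: FrozenSet[int] = frozenset({2, 3})


def select_decoder(temporal_id: int) -> str:
    """Return which decoding path a picture takes, based on its temporal layer ID."""
    if temporal_id in CONVENTIONAL_LAYERS:
        return "block_based"
    if temporal_id in NEURAL_LAYERS:
        return "neural_network"
    raise ValueError(f"temporal layer {temporal_id} is not assigned to a decoder")


def decode_picture(
    temporal_id: int,
    blocks: List[Any],
    block_decoder: Callable[[Any], Any],
    nn_decoder: Callable[[List[Any]], Any],
) -> Any:
    """Decode lower-layer pictures block by block, higher-layer pictures with the NN decoder."""
    if select_decoder(temporal_id) == "block_based":
        return [block_decoder(b) for b in blocks]   # conventional path: one block at a time
    return nn_decoder(blocks)                       # neural path: whole picture at once
```

A caller would pass its own decoders, for example `decode_picture(0, coded_blocks, my_block_decoder, my_nn_decoder)`, where both decoder arguments are placeholders for real implementations.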
