METHODS, SYSTEMS, AND MEDIA FOR COMPUTER VISION USING 2D CONVOLUTION OF 4D VIDEO DATA TENSORS

    Publication No.: WO2023061465A1

    Publication Date: 2023-04-20

    Application No.: PCT/CN2022/125299

    Filing Date: 2022-10-14

    Abstract: Methods, systems, and media for computer vision using 2D convolution of 4D video data tensors are described. 3D convolution operations performed on 5D input tensors are simulated by performing 2D convolution of 4D tensors instead. A convolution block of a CNN performs two parallel operations: a spatial processing branch performs spatial feature extraction on a 4D tensor using 2D convolution, whereas a temporal processing branch performs temporal feature extraction on a different 4D tensor using 2D convolution. The output tensors of the spatial processing branch and the temporal processing branch are combined to generate an output tensor of the convolution block. The convolution block may include additional operations such as reshaping and/or further convolution operations to generate identically-sized output tensors for each branch, thereby eliminating the need for post-processing of the branches' output tensors prior to combining them.
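
The two-branch block described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify tensor layouts, kernel sizes, or the combining operation, so everything concrete here is an assumption made for illustration: the (B, C, T, H, W) input layout, folding T into the batch axis for the spatial branch, flattening H and W into one axis for the temporal branch, the (k_t, 1) temporal kernel, and element-wise addition as the combination. The names `conv2d_same` and `conv_block` are hypothetical helpers, not terms from the publication.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive stride-1 2D cross-correlation with zero 'same' padding.

    x: (N, C_in, H, W); w: (C_out, C_in, kh, kw) with odd kh and kw.
    """
    n, _, h, wd = x.shape
    c_out, _, kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)))
    out = np.zeros((n, c_out, h, wd), dtype=np.result_type(x, w))
    for i in range(kh):
        for j in range(kw):
            patch = xp[:, :, i:i + h, j:j + wd]  # shifted view, (N, C_in, H, W)
            out += np.einsum("nchw,oc->nohw", patch, w[:, :, i, j])
    return out


def conv_block(x, w_spatial, w_temporal):
    """Assumed two-branch block: 2D convolutions on two different 4D views.

    x:          (B, C, T, H, W) video tensor (assumed layout)
    w_spatial:  (C_out, C, k, k)   kernel applied over (H, W)
    w_temporal: (C_out, C, k_t, 1) kernel applied over T only
    """
    b, c, t, h, w = x.shape

    # Spatial branch: fold T into the batch axis -> 4D tensor (B*T, C, H, W).
    xs = x.transpose(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
    ys = conv2d_same(xs, w_spatial)
    c_out = ys.shape[1]
    ys = ys.reshape(b, t, c_out, h, w).transpose(0, 2, 1, 3, 4)

    # Temporal branch: flatten H and W -> a different 4D tensor (B, C, T, H*W).
    xt = x.reshape(b, c, t, h * w)
    yt = conv2d_same(xt, w_temporal).reshape(b, c_out, t, h, w)

    # Both branches already share the shape (B, C_out, T, H, W), so they can be
    # combined directly; addition is an assumed choice of combination.
    return ys + yt


rng = np.random.default_rng(0)
video = rng.standard_normal((2, 3, 4, 5, 6))    # B=2, C=3, T=4, H=5, W=6
w_s = rng.standard_normal((8, 3, 3, 3))         # spatial 3x3 kernel, C_out=8
w_t = rng.standard_normal((8, 3, 3, 1))         # temporal 3x1 kernel, C_out=8
features = conv_block(video, w_s, w_t)
print(features.shape)
```

Because each branch reshapes its result back to the same 5D shape before the sum, no separate resizing step is needed between the branches and the combination, which mirrors the "identically-sized output tensors" remark in the abstract.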
