DEEP LEARNING METHOD FOR MULTIPLE OBJECT TRACKING FROM VIDEO

    Publication No.: US20240144489A1

    Publication date: 2024-05-02

    Application No.: US18480127

    Filing date: 2023-10-03

    Applicant: VIETTEL GROUP

    IPC classification: G06T7/246 G06V10/82 G06V20/40

    Abstract: A method for multi-object tracking from video. The method includes the following steps: (1) capturing frames from the streaming source and preprocessing the data; (2) extracting video features using one of three options: a 3D-CNN backbone followed by a Transformer Encoder, a Video Transformer Encoder, or a 2D-CNN Encoder that takes a stack of frames as input and is followed by a Transformer Encoder; (3) performing multi-object tracking with a new end-to-end multi-task deep learning model named JDAT (Joint Detection Association Transformer), then post-processing and updating the tracking state with a Temporal Aggregation Module (TAM). The deep learning models in steps 2 and 3 are trained simultaneously end-to-end with a loss function that is accumulated over multiple timesteps (Collective Average Loss, CAL). The model can also first be pretrained on a weakly labeled image dataset in a self-supervised manner, then fine-tuned on supervised video datasets with full tracking labels.
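
    The abstract says only that the training loss is accumulated over multiple timesteps and does not give its exact form. As a minimal sketch, assuming the Collective Average Loss is the plain mean of per-timestep losses across a clip, the accumulation could look like the following. The function name `collective_average_loss` and the sample values are hypothetical and do not come from the patent.

```python
from typing import Sequence


def collective_average_loss(per_step_losses: Sequence[float]) -> float:
    """Mean of per-timestep losses over one clip.

    Assumption: this unweighted mean is one plausible reading of
    "accumulated over multiple timesteps"; the patent abstract does
    not specify the weighting or the per-step loss terms.
    """
    if not per_step_losses:
        raise ValueError("need at least one timestep loss")
    return sum(per_step_losses) / len(per_step_losses)


# Hypothetical per-timestep losses for a three-frame clip.
clip_losses = [0.9, 0.6, 0.3]
cal = collective_average_loss(clip_losses)
```

    In a real training loop each entry would be the combined detection and association loss for one timestep, and the averaged value would be backpropagated once per clip so that the feature extractor and the tracker are updated together.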