Publication number: US20240177486A1
Publication date: 2024-05-30
Application number: US18057691
Filing date: 2022-11-21
Applicant: Lemon Inc., Beijing Zitiao Network Technology Co., Ltd.
Inventor: Xiaojie JIN, Sen PEI
IPC: G06V20/40, G06V10/62, G06V10/774, G06V10/776, G06V10/80, G06V10/82
CPC classification number: G06V20/46, G06V10/62, G06V10/774, G06V10/776, G06V10/806, G06V10/82, G06V20/41
Abstract: Systems and methods are provided that include a processor executing a video classifying program to receive an input video; sample video frames from the input video; extract frame-wise spatial features from the video frames using a convolutional neural network; extract a frame-wise temporal feature for each video frame; aggregate the frame-wise spatial features and the frame-wise temporal feature for each video frame to provide a temporal context to the frame-wise spatial features; input the aggregated frame-wise spatial features and the frame-wise temporal feature for each video frame into a transformer encoder to obtain temporal-aware feature representations of the video frames; input the feature representations into a feedforward network model to obtain feedforward-transformed features; calculate a parameter by inputting each feedforward-transformed feature and a set of highlight prototypes into a function comparing the feedforward-transformed features to the set of highlight prototypes; and classify the video frames as highlights based on the calculated parameter.
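
The final two steps of the abstract (comparing each feedforward-transformed feature to a set of highlight prototypes, then classifying frames based on the resulting parameter) might be sketched as follows. The abstract does not specify the comparison function or the decision rule, so this sketch assumes cosine similarity, takes the maximum over prototypes as the per-frame parameter, and applies a fixed threshold. The names `cosine_similarity`, `highlight_scores`, `classify_highlights`, and `threshold` are hypothetical and do not come from the publication. The feature vectors are assumed to have already passed through the CNN, transformer encoder, and feedforward network.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def highlight_scores(frame_features: Sequence[Vector],
                     prototypes: Sequence[Vector]) -> List[float]:
    """Per-frame parameter: best similarity to any highlight prototype.

    Assumption: the "function comparing features to prototypes" is taken
    to be max cosine similarity; the abstract leaves it unspecified.
    """
    if not prototypes:
        raise ValueError("at least one highlight prototype is required")
    return [max(cosine_similarity(f, p) for p in prototypes)
            for f in frame_features]


def classify_highlights(frame_features: Sequence[Vector],
                        prototypes: Sequence[Vector],
                        threshold: float = 0.8) -> List[bool]:
    """Mark a frame as a highlight when its score reaches the threshold.

    Assumption: a fixed threshold; the value 0.8 is illustrative only.
    """
    scores = highlight_scores(frame_features, prototypes)
    return [s >= threshold for s in scores]


# Toy example: three 2-D "frame features" and one prototype.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prototypes = [[1.0, 0.0]]
labels = classify_highlights(frames, prototypes, threshold=0.8)
```

In this toy example only the first frame points in the same direction as the prototype, so it is the only one expected to clear the threshold. A learned prototype set and a different comparison function (for example, a distance-based one) would slot into `highlight_scores` without changing the classification step.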