Detecting objects in a video using attention models
Abstract:
The present disclosure describes techniques for detecting objects in a video. The techniques comprise extracting features from each frame of the video; generating a first attentive feature by applying a first attention model to at least some of the features extracted from any particular frame among the frames of the video, wherein the first attention model identifies correlations between a plurality of locations in the particular frame by computing relationships between any two locations among the plurality of locations; generating a second attentive feature by applying a second attention model to at least one pair of features at different levels selected from the features extracted from the particular frame, wherein the second attention model identifies a correlation between at least one pair of locations corresponding to the at least one pair of features; and generating a representation of an object included in the particular frame.
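The two attention steps can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes plain scaled dot-product attention, assumes each feature level is a flat list of per-location vectors sharing one channel width, and uses hypothetical names (`attend`, `first_attention`, `second_attention`, `represent`). The abstract does not say how the final representation is formed, so the per-location concatenation in `represent` is purely an illustrative assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, sources):
    """Scaled dot-product attention (an assumed choice of attention model).

    Every query location is compared with every source location, and the
    output at that location is the similarity-weighted average of the sources.
    """
    dim = len(sources[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, s) / math.sqrt(dim) for s in sources])
        out.append([sum(w * s[c] for w, s in zip(weights, sources))
                    for c in range(dim)])
    return out


def first_attention(level):
    """Relate any two locations within one feature level of a frame."""
    return attend(level, level)


def second_attention(level_a, level_b):
    """Relate locations across a pair of feature levels of the same frame."""
    return attend(level_a, level_b)


def represent(level_a, level_b):
    """Hypothetical fusion: concatenate both attentive features per location."""
    intra = first_attention(level_a)
    cross = second_attention(level_a, level_b)
    return [i + c for i, c in zip(intra, cross)]


# Toy frame: a fine level with 4 locations and a coarse level with 2 locations,
# each location described by a 2-channel feature vector.
fine = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
coarse = [[1.0, 1.0], [0.0, 2.0]]
representation = represent(fine, coarse)
```

In this sketch the only difference between the two models is where the compared locations come from: the first draws both from one level, the second draws one from each level of the pair. A real detector would typically add learned projections and a detection head on top of the resulting representation.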