SPATIO-TEMPORAL INTERACTIONS FOR VIDEO UNDERSTANDING

    Publication number: US20210081672A1

    Publication date: 2021-03-18

    Application number: US17016240

    Filing date: 2020-09-09

    Abstract: Aspects of the present disclosure describe systems, methods, and structures including a network that recognizes action(s) from learned relationship(s) between various objects in video(s). Interaction(s) of objects over space and time are learned from a series of frames of the video. Object-like representations are learned directly from various 2D CNN layers by capturing the 2D CNN channels, resizing them to an appropriate dimension, and then providing them to a transformer network that learns higher-order relationship(s) between them. To learn object-like representations effectively, we 1) combine channels from the first and last convolutional layers of the 2D CNN, and 2) optionally cluster the channel (feature map) representations so that channels representing the same object type are grouped together.
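The abstract describes the pipeline only at a high level, so the following NumPy sketch illustrates one possible reading of it and is not the patented implementation. Every name in it (`resize_nearest`, `channel_tokens`, `cluster_channels`, `self_attention`, `classify_clip`) is hypothetical, and several choices are assumptions not stated in the source: channels are resized by nearest-neighbour sampling, the optional clustering step is plain k-means whose cluster means replace the grouped channels, and the transformer is reduced to a single-head self-attention layer with a residual connection and untrained random weights.

```python
import numpy as np


def resize_nearest(fmaps: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of a (C, H, W) stack of feature maps to (C, size, size).

    Assumption: the source only says channels are resized "to an appropriate dimension".
    """
    _, h, w = fmaps.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return fmaps[:, rows][:, :, cols]


def channel_tokens(first: np.ndarray, last: np.ndarray, size: int) -> np.ndarray:
    """Flatten every channel of the first and last conv layers into one token each."""
    a = resize_nearest(first, size).reshape(first.shape[0], -1)
    b = resize_nearest(last, size).reshape(last.shape[0], -1)
    return np.concatenate([a, b], axis=0)  # (C_first + C_last, size * size)


def cluster_channels(tokens: np.ndarray, k: int, iters: int = 10, seed: int = 0) -> np.ndarray:
    """Assumed grouping step: k-means over channel tokens, returning the k cluster means."""
    rng = np.random.default_rng(seed)
    centroids = tokens[rng.choice(len(tokens), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = tokens[labels == j]
            if len(members):  # keep the old centroid if a cluster empties out
                centroids[j] = members.mean(axis=0)
    return centroids


def self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ v


def classify_clip(frames_first, frames_last, params: dict, size: int = 7, n_clusters=None) -> np.ndarray:
    """Hypothetical pipeline: per-frame channel tokens -> optional clustering -> attention -> linear head."""
    per_frame = []
    for first, last in zip(frames_first, frames_last):
        tokens = channel_tokens(first, last, size)
        if n_clusters is not None:
            tokens = cluster_channels(tokens, n_clusters)
        per_frame.append(tokens)
    # Attending over tokens from all frames at once lets channels interact across space and time.
    x = np.concatenate(per_frame, axis=0)
    x = self_attention(x, params["wq"], params["wk"], params["wv"])
    return x.mean(axis=0) @ params["w_out"]  # one logit per action class


# Usage with random stand-in feature maps and untrained random weights (shapes only).
rng = np.random.default_rng(0)
T, SIZE, N_ACTIONS = 4, 7, 5
D = SIZE * SIZE
frames_first = rng.standard_normal((T, 16, 56, 56))  # stand-in for early-layer feature maps
frames_last = rng.standard_normal((T, 32, 7, 7))     # stand-in for last-layer feature maps
params = {name: 0.1 * rng.standard_normal((D, D)) for name in ("wq", "wk", "wv")}
params["w_out"] = 0.1 * rng.standard_normal((D, N_ACTIONS))
logits = classify_clip(frames_first, frames_last, params, size=SIZE, n_clusters=8)
print(logits.shape)  # (5,)
```

Treating each resized channel as a token is what lets a generic attention layer relate "object-like" feature maps to one another; in a practical system the weights would be learned end to end, and the transformer would likely use several heads and layers rather than the single untrained layer shown here.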
