Publication No.: US20210081672A1
Publication Date: 2021-03-18
Application No.: US17016240
Filing Date: 2020-09-09
Applicant: NEC Laboratories America, Inc.
Inventor: Asim KADAV, Farley LAI, Chhavi SHARMA
Abstract: Aspects of the present disclosure describe systems, methods and structures including a network that recognizes action(s) from learned relationship(s) between various objects in video(s). Interaction(s) of objects over space and time are learned from a series of frames of the video. Object-like representations are learned directly from various 2D CNN layers by capturing the 2D CNN channels, resizing them to an appropriate dimension, and then providing them to a transformer network that learns higher-order relationship(s) between them. To effectively learn object-like representations, we 1) combine channels from a first and a last convolutional layer in the 2D CNN, and 2) optionally cluster the channel (feature map) representations so that channels representing the same object type are grouped together.
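
The data flow described in the abstract (capture channels from a first and a last convolutional layer, resize them to a common dimension, and pass them as tokens to a transformer) can be illustrated with a minimal pure-Python sketch. This is not the implementation of the disclosure: the function names `resize_nearest`, `channels_to_tokens` and `self_attention` are hypothetical, nearest-neighbour resizing is an assumed choice, the attention step uses identity projections in place of a learned transformer, and the optional clustering step is omitted.

```python
import math

def resize_nearest(fmap, out_h, out_w):
    """Resize one channel (a 2D list) to out_h x out_w by nearest-neighbour sampling.
    Assumption: the abstract only says 'an appropriate dimension'; the method is ours."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def channels_to_tokens(layers, out_h, out_w):
    """Flatten every channel of every selected layer into one token vector."""
    tokens = []
    for channels in layers:  # e.g. [first_layer_channels, last_layer_channels]
        for fmap in channels:
            resized = resize_nearest(fmap, out_h, out_w)
            tokens.append([v for row in resized for v in row])
    return tokens

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over channel tokens.
    Assumption: identity query/key/value projections stand in for learned weights."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

# Toy stand-ins for CNN activations: 2 channels of 4x4 (first layer)
# and 3 channels of 2x2 (last layer). Real feature maps would come from a 2D CNN.
first_layer = [[[float(r * 4 + c) for c in range(4)] for r in range(4)] for _ in range(2)]
last_layer = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(3)]

tokens = channels_to_tokens([first_layer, last_layer], out_h=2, out_w=2)
attended = self_attention(tokens)  # 5 tokens, each of length 4
```

Treating each channel as one token is what lets the attention step relate "object-like" feature maps to one another; in a full system the tokens from several frames would be processed together so that relationships over time can also be learned.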