Publication No.: US20190297326A1
Publication Date: 2019-09-26
Application No.: US16360853
Filing Date: 2019-03-21
Applicant: NVIDIA Corporation
Inventor: Fitsum A. Reda, Guilin Liu, Kevin Shih, Robert Kirby, Jonathan Barker, David Tarjan, Andrew Tao, Bryan Catanzaro
IPC: H04N19/139, G06N3/08, G06N20/10, G06N3/04, G06N20/20, H04N19/587, H04N19/132, H04N19/172
Abstract: A neural network architecture is disclosed for performing video frame prediction using a sequence of video frames and corresponding pairwise optical flows. The neural network processes the sequence of video frames and optical flows utilizing three-dimensional convolution operations, where time (or multiple video frames in the sequence of video frames) provides the third dimension in addition to the two-dimensional pixel space of the video frames. The neural network generates a set of parameters used to predict a next video frame in the sequence of video frames by sampling a previous video frame utilizing spatially-displaced convolution operations. In one embodiment, the set of parameters includes a displacement vector and at least one convolution kernel per pixel. Generating a pixel value in the next video frame includes applying the convolution kernel to a corresponding patch of pixels in the previous video frame based on the displacement vector.
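The sampling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the disclosed implementation. It assumes the network has already produced, for every pixel, one displacement vector and one N x N kernel, and it makes two simplifications that the abstract does not specify: displacements are rounded to the nearest integer pixel, and patches that fall outside the frame are filled by clamping to the nearest edge pixel. The function name `sdc_predict` and the array layouts are hypothetical.

```python
import numpy as np


def sdc_predict(prev_frame, displacement, kernels):
    """Predict the next frame by spatially-displaced convolution (sketch).

    prev_frame:   (H, W) array, the previous video frame (single channel).
    displacement: (H, W, 2) array, per-pixel (dx, dy) displacement vector.
    kernels:      (H, W, N, N) array, one N x N kernel per pixel, N odd.
    """
    h, w = prev_frame.shape
    n = kernels.shape[2]
    r = n // 2  # patch radius
    next_frame = np.zeros((h, w), dtype=np.float64)

    for y in range(h):
        for x in range(w):
            # Assumption: round the displacement to an integer offset.
            cx = x + int(np.rint(displacement[y, x, 0]))
            cy = y + int(np.rint(displacement[y, x, 1]))

            # Assumption: clamp patch coordinates to the frame borders.
            ys = np.clip(np.arange(cy - r, cy + r + 1), 0, h - 1)
            xs = np.clip(np.arange(cx - r, cx + r + 1), 0, w - 1)

            # Patch of the previous frame centered at the displaced location.
            patch = prev_frame[np.ix_(ys, xs)]

            # Apply this pixel's own kernel to the displaced patch.
            next_frame[y, x] = np.sum(kernels[y, x] * patch)

    return next_frame
```

With zero displacement and a kernel whose only nonzero entry is a 1 at its center, the sketch returns the previous frame unchanged. With a nonzero displacement and the same kernel it reduces to plain motion-vector sampling. The per-pixel kernel is what lets a prediction blend neighboring pixels around the displaced location.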