Invention Grant
- Patent Title: Translating video to language using adaptive spatiotemporal convolution feature representation with dynamic abstraction
- Application No.: US15794758
- Application Date: 2017-10-26
- Publication No.: US10366292B2
- Publication Date: 2019-07-30
- Inventor: Renqiang Min, Yunchen Pu
- Applicant: NEC Laboratories America, Inc.
- Applicant Address: JP
- Assignee: NEC Corporation
- Current Assignee: NEC Corporation
- Current Assignee Address: JP
- Agent: Joseph Kolodka
- Main IPC: G06K9/00
- IPC: G06K9/00; G06K9/46; G06N3/04; G06K9/66; H04N5/278; G06K9/62; H04N21/218; H04N21/234; H04N21/488; G06K9/72; H04N7/18

Abstract:
A system is provided for video captioning. The system includes a processor. The processor is configured to apply a three-dimensional Convolutional Neural Network (C3D) to image frames of a video sequence to obtain, for the video sequence, (i) intermediate feature representations across L convolutional layers and (ii) top-layer features. The processor is further configured to produce a first word of an output caption for the video sequence by applying the top-layer features to a Long Short Term Memory (LSTM). The processor is further configured to produce subsequent words of the output caption by (i) dynamically performing spatiotemporal attention and layer attention using the intermediate feature representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the output caption, and a hidden state of the LSTM. The system further includes a display device for displaying the output caption to a user.
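
The context-vector step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how attention scores are computed, so dot-product scoring against the LSTM hidden state is an assumption, as is the premise that the features of all L layers have already been projected to the hidden state's dimension. The names `context_vector`, `layer_features`, and `hidden` are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    # Weighted sum of equal-length vectors.
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)]

def context_vector(layer_features, hidden):
    """Combine intermediate features into one context vector.

    layer_features: one entry per convolutional layer; each entry is a
        list of feature vectors, one per spatiotemporal location.
    hidden: current LSTM hidden state (same dimension as the features).
    Assumption: dot-product scoring; the source does not specify it.
    """
    per_layer = []
    for locations in layer_features:
        # Spatiotemporal attention: weight locations within one layer.
        alpha = softmax([dot(loc, hidden) for loc in locations])
        per_layer.append(weighted_sum(alpha, locations))
    # Layer attention: weight the per-layer summaries.
    beta = softmax([dot(summary, hidden) for summary in per_layer])
    return weighted_sum(beta, per_layer), beta

# Toy input: two layers with 2 and 3 locations, feature dimension 2.
layer_features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]],
]
hidden = [0.3, -0.1]
context, beta = context_vector(layer_features, hidden)
```

In a full captioning loop, `context` would be passed to the LSTM together with the previous word and the hidden state to produce the next word, and the attention weights would be recomputed at every step because `hidden` changes.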