Translating video to language using adaptive spatiotemporal convolution feature representation with dynamic abstraction

Invention Grant

US10366292B2 Translating video to language using adaptive spatiotemporal convolution feature representation with dynamic abstraction (In force)

Patent Title: Translating video to language using adaptive spatiotemporal convolution feature representation with dynamic abstraction
Application No.: US15794758

Application Date: 2017-10-26
Publication No.: US10366292B2

Publication Date: 2019-07-30
Inventor: Renqiang Min, Yunchen Pu
Applicant: NEC Laboratories America, Inc.
Applicant Address: JP
Assignee: NEC Corporation
Current Assignee: NEC Corporation
Current Assignee Address: JP
Agent: Joseph Kolodka
Main IPC: G06K9/00
IPC: G06K9/00; G06K9/46; G06N3/04; G06K9/66; H04N5/278; G06K9/62; H04N21/218; H04N21/234; H04N21/488; G06K9/72; H04N7/18

Abstract:

A system is provided for video captioning. The system includes a processor. The processor is configured to apply a three-dimensional Convolutional Neural Network (C3D) to image frames of a video sequence to obtain, for the video sequence, (i) intermediate feature representations across L convolutional layers and (ii) top-layer features. The processor is further configured to produce a first word of an output caption for the video sequence by applying the top-layer features to a Long Short Term Memory (LSTM). The processor is further configured to produce subsequent words of the output caption by (i) dynamically performing spatiotemporal attention and layer attention using the intermediate feature representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the output caption, and a hidden state of the LSTM. The system further includes a display device for displaying the output caption to a user.
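
The two-stage attention described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring functions, so the version below makes assumptions: it uses plain dot-product scores against the LSTM hidden state, and it assumes the features from all L layers have already been projected to the same dimension as the hidden state. The names `context_vector`, `layer_features`, and `hidden` are hypothetical, and the C3D network and the LSTM itself are left out.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    # Convex combination of equally sized vectors.
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def context_vector(layer_features, hidden):
    """Form a context vector from per-layer features and the hidden state.

    layer_features: one entry per convolutional layer; each entry is a list
    of feature vectors, one per spatiotemporal location (assumed to be
    already projected to the dimension of `hidden`).
    """
    layer_summaries = []
    for locations in layer_features:
        # Spatiotemporal attention: weight the locations within one layer.
        alpha = softmax([dot(f, hidden) for f in locations])
        layer_summaries.append(weighted_sum(alpha, locations))
    # Layer attention: weight the per-layer summaries (assumed dot-product score).
    beta = softmax([dot(s, hidden) for s in layer_summaries])
    return weighted_sum(beta, layer_summaries), beta

# Toy input: two layers with two and three locations, feature dimension 2.
features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]],
]
context, beta = context_vector(features, hidden=[1.0, 0.0])
```

In the system the abstract describes, a context vector of this kind would be passed to the LSTM together with the previous word and the hidden state to produce the next word of the caption.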

Public/Granted literature

US20180121734A1 TRANSLATING VIDEO TO LANGUAGE USING ADAPTIVE SPATIOTEMPORAL CONVOLUTION FEATURE REPRESENTATION WITH DYNAMIC ABSTRACTION Public/Granted day: 2018-05-03

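IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)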