Invention Application
- Patent Title: Jointly Modeling Embedding and Translation to Bridge Video and Language
- Application No.: US14946988
- Application Date: 2015-11-20
- Publication No.: US20170150235A1
- Publication Date: 2017-05-25
- Inventor: Tao Mei, Ting Yao, Yong Rui
- Applicant: Microsoft Technology Licensing, LLC
- Main IPC: H04N21/8405
- IPC: H04N21/8405 ; G06K9/00 ; G06N3/08 ; G06F17/27

Abstract:
Video description generation using neural network training based on relevance and coherence is described. In some examples, long short-term memory with visual-semantic embedding (LSTM-E) can maximize the probability of generating the next word given the previous words and the visual content, and can create a visual-semantic embedding space for enforcing the relationship between the semantics of an entire sentence and the visual content. LSTM-E can include 2-D and/or 3-D deep convolutional neural networks for learning a powerful video representation, a deep recurrent neural network for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.
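The abstract's joint objective (word-by-word generation plus a visual-semantic embedding constraint) can be illustrated with a minimal, framework-free sketch. The function names, the squared-distance relevance term, the negative-log-likelihood coherence term, and the mixing weight `lam` are illustrative assumptions, not the claimed implementation:

```python
import math

def relevance_loss(video_emb, sent_emb):
    # Relevance term (assumed): squared Euclidean distance between the video
    # and sentence vectors in the shared visual-semantic embedding space.
    return sum((v - s) ** 2 for v, s in zip(video_emb, sent_emb))

def coherence_loss(word_probs):
    # Coherence term (assumed): negative log-likelihood of each ground-truth
    # word given its predecessors and the visual content, as produced by the
    # recurrent decoder.
    return -sum(math.log(p) for p in word_probs)

def lstm_e_loss(video_emb, sent_emb, word_probs, lam=0.7):
    # Hypothetical joint objective: a weighted sum of the embedding
    # (relevance) term and the sentence-generation (coherence) term.
    return (1 - lam) * relevance_loss(video_emb, sent_emb) \
        + lam * coherence_loss(word_probs)
```

A perfectly aligned embedding and a decoder that assigns probability 1 to every ground-truth word would drive both terms, and hence the joint loss, to zero.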
Public/Granted literature
- US09807473B2 Jointly modeling embedding and translation to bridge video and language Public/Granted day: 2017-10-31