Invention Grant
- Patent Title: Jointly modeling embedding and translation to bridge video and language
- Application No.: US14946988
- Application Date: 2015-11-20
- Publication No.: US09807473B2
- Publication Date: 2017-10-31
- Inventor: Tao Mei, Ting Yao, Yong Rui
- Applicant: Microsoft Technology Licensing, LLC
- Applicant Address: US WA Redmond
- Assignee: Microsoft Technology Licensing, LLC
- Current Assignee: Microsoft Technology Licensing, LLC
- Current Assignee Address: US WA Redmond
- Main IPC: H04N5/445
- IPC: H04N5/445; H04N21/8405; G06F17/27; G06K9/00; G06N3/08

Abstract:
Video description generation using neural network training based on relevance and coherence is described. In some examples, long short-term memory with visual-semantic embedding (LSTM-E) can maximize the probability of generating the next word given the previous words and the visual content, and can create a visual-semantic embedding space that enforces the relationship between the semantics of an entire sentence and the visual content. LSTM-E can include 2-D and/or 3-D deep convolutional neural networks for learning a powerful video representation, a deep recurrent neural network for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.
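The two training signals named in the abstract, relevance and coherence, can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: relevance is taken as the squared distance between a video embedding and a sentence embedding in the shared space, coherence as the negative log-likelihood of each reference word given the previous words and the video, and the two are mixed by a hypothetical `tradeoff` weight. The function names are likewise hypothetical and are not taken from the patent.

```python
import math


def relevance_loss(video_emb, sentence_emb):
    """Squared Euclidean distance between two embeddings (assumed form)."""
    return sum((v - s) ** 2 for v, s in zip(video_emb, sentence_emb))


def coherence_loss(next_word_probs):
    """Negative log-likelihood of the reference words (assumed form).

    next_word_probs[t] is the probability a sentence generator assigned
    to the correct word at step t, given the previous words and the video.
    """
    return -sum(math.log(p) for p in next_word_probs)


def joint_loss(video_emb, sentence_emb, next_word_probs, tradeoff=0.5):
    """Weighted sum of both terms; `tradeoff` is a hypothetical parameter."""
    return ((1.0 - tradeoff) * relevance_loss(video_emb, sentence_emb)
            + tradeoff * coherence_loss(next_word_probs))


if __name__ == "__main__":
    video = [0.2, 0.7, 0.1]        # toy video embedding
    sentence = [0.1, 0.9, 0.0]     # toy sentence embedding
    word_probs = [0.6, 0.8, 0.9]   # toy per-word probabilities
    print(joint_loss(video, sentence, word_probs))
```

In a real system the embeddings would come from the convolutional networks and the sentence model, and the per-word probabilities from the recurrent network; here they are toy lists so the arithmetic is easy to follow.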
Public/Granted literature
- US20170150235A1 Jointly Modeling Embedding and Translation to Bridge Video and Language (Public/Granted day: 2017-05-25)