- Patent title: Jointly modeling embedding and translation to bridge video and language
- Application No.: US14946988
- Filing date: 2015-11-20
- Publication No.: US09807473B2
- Publication date: 2017-10-31
- Inventors: Tao Mei, Ting Yao, Yong Rui
- Applicant: Microsoft Technology Licensing, LLC
- Applicant address: US WA Redmond
- Patentee: Microsoft Technology Licensing, LLC
- Current patentee: Microsoft Technology Licensing, LLC
- Current patentee address: US WA Redmond
- Main classification: H04N5/445
- IPC classification: H04N5/445; H04N21/8405; G06F17/27; G06K9/00; G06N3/08
Abstract:
Video description generation using neural network training based on relevance and coherence is described. In some examples, long short-term memory with visual-semantic embedding (LSTM-E) can maximize the probability of generating the next word given the previous words and the visual content, and can create a visual-semantic embedding space that enforces the relationship between the semantics of an entire sentence and the visual content. LSTM-E can include 2-D and/or 3-D deep convolutional neural networks for learning powerful video representations, a deep recurrent neural network for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.
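The abstract names two training signals: coherence (the probability of each next word given the previous words and the video) and relevance (agreement between the whole sentence and the video in a shared embedding space). The following is a minimal sketch of how such a joint objective could be computed. The linear projections `t_v` and `t_s`, the squared-distance relevance term, the weighted sum, and the trade-off weight `lam` are all illustrative assumptions, not the formulation claimed in the patent.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix m by vector v (plain Python, no dependencies)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relevance_loss(video_feat: Sequence[float], sent_feat: Sequence[float],
                   t_v: Matrix, t_s: Matrix) -> float:
    """Squared Euclidean distance between video and sentence features after
    projecting both into a shared embedding space.
    Assumption: hypothetical linear projections t_v (video) and t_s (sentence)."""
    pv = matvec(t_v, video_feat)
    ps = matvec(t_s, sent_feat)
    return sum((a - b) ** 2 for a, b in zip(pv, ps))


def coherence_loss(word_probs: Sequence[float]) -> float:
    """Negative log-likelihood of the ground-truth sentence, where word_probs[t]
    is the model's probability of the t-th ground-truth word given the video
    and the previous words (e.g. from an LSTM decoder's softmax)."""
    if any(p <= 0.0 or p > 1.0 for p in word_probs):
        raise ValueError("word probabilities must lie in (0, 1]")
    return -sum(math.log(p) for p in word_probs)


def lstm_e_loss(video_feat: Sequence[float], sent_feat: Sequence[float],
                t_v: Matrix, t_s: Matrix, word_probs: Sequence[float],
                lam: float = 0.7) -> float:
    """Joint objective as a weighted sum of relevance and coherence.
    Assumption: the weighting scheme and the default lam are hypothetical."""
    return ((1.0 - lam) * relevance_loss(video_feat, sent_feat, t_v, t_s)
            + lam * coherence_loss(word_probs))
```

Minimizing the coherence term alone trains an ordinary caption generator word by word; the relevance term adds a sentence-level constraint, so a caption that is locally fluent but semantically far from the video is still penalized.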