Video description method, system and device

Published invention application

CN110019952A Video description method, system and device (status: in force)


Patent title: Video description method, system and device (视频描述方法、系统及装置)
Application number: CN201710940199.X

Filing date: 2017-09-30
Publication (announcement) number: CN110019952A

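Publication (announcement) date: 2019-07-16
Inventors: Cai Haijun (蔡海军), Chen Yuanlin (陈院林), Wang Liang (王亮), Wang Wei (王威)
Applicants: Huawei Technologies Co., Ltd. (华为技术有限公司), Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Applicant address: Huawei Headquarters Office Building, Bantian, Longgang District, Shenzhen, Guangdong Province
Patent holders: Huawei Technologies Co., Ltd., Institute of Automation, Chinese Academy of Sciences
Current patent holders: Huawei Technologies Co., Ltd., Institute of Automation, Chinese Academy of Sciences
Current patent holder address: Huawei Headquarters Office Building, Bantian, Longgang District, Shenzhen, Guangdong Province
Agency: 广州三环专利商标代理有限公司 (Guangzhou Sanhuan patent and trademark agency)
Agents: Hao Chuanxin (郝传鑫); Xiong Yongqiang (熊永强)
Main classification: G06F16/738
IPC classifications: G06F16/738 ; G06K9/00 ; G06K9/62 ; G06N3/04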

Abstract:

Embodiments of the present invention provide a video description method, system and device. In the method, a video encoder based on a convolutional neural network (CNN) extracts a visual feature representation of the video frame at the current time step of the video to be described; the visual feature representation is written into the visual memory at the current time step; attribute information is read from the attribute memory at the current time step according to the visual memory and the text memory at that time step; and a text decoder based on a long short-term memory (LSTM) network generates the predicted word from the word at the previous time step and the attribute information read at the current time step. The embodiment thus adopts a multimodal description approach, which helps make video description more flexible.
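
The four steps in the abstract (encode, write to visual memory, read from attribute memory, decode a word) can be illustrated with a minimal sketch. Everything concrete here is an assumption and not taken from the patent: the memories are plain lists of vectors, the attribute read is a soft-attention read keyed by the sum of the averaged visual and text memories, and the LSTM decoder is replaced by a simple dot-product scorer to keep the example short. The names `read_attributes` and `describe_step` are hypothetical, and `frame_feature` stands in for the output of a CNN encoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_slots(memory):
    """Average all slots of a memory (a list of equal-length vectors)."""
    return [sum(col) / len(memory) for col in zip(*memory)]


def read_attributes(visual_memory, text_memory, attribute_memory):
    """Assumed read rule: soft attention over attribute slots, where the
    query is the sum of the visual and text memory summaries."""
    query = [v + t for v, t in
             zip(mean_slots(visual_memory), mean_slots(text_memory))]
    weights = softmax([dot(query, slot) for slot in attribute_memory])
    dim = len(attribute_memory[0])
    read = [sum(w * slot[i] for w, slot in zip(weights, attribute_memory))
            for i in range(dim)]
    return read, weights


def describe_step(frame_feature, prev_word_vec, visual_memory, text_memory,
                  attribute_memory, word_vecs):
    """One time step: write, read, then predict the next word."""
    # Step 2 of the abstract: write the frame's visual feature to visual memory.
    visual_memory.append(frame_feature)
    # Step 3: read attribute information using visual and text memory.
    attr, weights = read_attributes(visual_memory, text_memory, attribute_memory)
    # Step 4 (stand-in for the LSTM decoder): score each word against the
    # previous word vector combined with the attribute read.
    context = [p + a for p, a in zip(prev_word_vec, attr)]
    words = list(word_vecs)
    probs = softmax([dot(context, word_vecs[w]) for w in words])
    word = words[max(range(len(words)), key=probs.__getitem__)]
    # Record the emitted word so later reads can condition on it.
    text_memory.append(word_vecs[word])
    return word, weights
```

Calling `describe_step` once per frame, and passing the vector of the returned word as `prev_word_vec` on the next call, yields one word per time step. A real system would replace the scorer with a trained LSTM and learn the memory read and write operations rather than fixing them by hand.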

Publication/grant documents

CN110019952B Video description method, system and device. Publication/grant date: 2023-04-18


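IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F16/00	Information retrieval; database structures; file system structures
G06F16/70	. of video data
G06F16/73	.. Querying
G06F16/738	... Presentation of query results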