Method and System for Generating Video Description Sentences

Invention publication

Patent title: Method and System for Generating Video Description Sentences (original: 一种视频描述语句生成方法及系统)
Application number: CN201610270084.X

Filing date: 2016-04-27
Publication number: CN105894043A

Publication date: 2016-08-24
Inventors: 郭大山, 刘幸偕, 方向忠, 阮志强, 徐宁, 张芩, 方大为, 江勤勇, 吴泳江, 吴轶峰, 祝晓清, 孙哲, 孔申勇, 高原, 杨哲峰
Applicants: 上海高智科技发展有限公司, 上海高智特种车有限公司, 上海高智通信研究院有限公司, 上海高智网络股份有限公司
Applicant address: 上海市徐汇区钦江路283号
Patentees: 上海高智科技发展有限公司, 上海高智特种车有限公司, 上海高智通信研究院有限公司, 上海高智网络股份有限公司
Current patentees: 上海高智科技发展有限公司, 上海高智特种车有限公司, 上海高智通信研究院有限公司, 上海高智网络股份有限公司
Current patentee address: 上海市徐汇区钦江路283号
Patent agency: 上海光华专利事务所
Patent agent: 王再朝
Main classification: G06K9/62
IPC classification: G06K9/62

Abstract:

The invention provides a method and system for generating video description sentences. An acquired video stream is processed by a convolutional neural network to obtain a feature vector for each frame of the video stream. From these feature vectors, the correlation between adjacent frames is computed, and on that basis all frames of the video stream are clustered, dividing them into multiple classes. The clustering result is then input to a recurrent neural network to obtain the description sentence corresponding to the video stream. By adding the clustering step, the invention can better describe local information at different temporal granularities and avoids the loss of local information that is unavoidable with equal-interval sampling; that is, processing speed is increased while the loss of local information is reduced. The invention therefore effectively overcomes the shortcomings of the prior art and has high industrial utility.
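The middle step of the abstract (adjacent-frame correlation followed by clustering) can be illustrated with a minimal Python sketch. The abstract does not say which correlation measure or clustering rule the patent uses, so everything here is an assumption made for illustration: cosine similarity as the correlation, a fixed `threshold` that starts a new class whenever adjacent frames are too dissimilar, and mean pooling to produce one vector per class. The function names are hypothetical, the per-frame feature vectors are taken as given rather than computed by a convolutional neural network, and the recurrent network that would consume the pooled vectors is omitted.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed correlation measure between two frame feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as uncorrelated with anything
    return dot / (norm_a * norm_b)


def cluster_adjacent_frames(features: List[Vector],
                            threshold: float = 0.8) -> List[List[int]]:
    """Group consecutive frame indices into temporally contiguous classes.

    A new class starts whenever the similarity between a frame and its
    predecessor falls below `threshold` (a hypothetical rule; the patent
    abstract does not specify one).
    """
    if not features:
        return []
    clusters: List[List[int]] = [[0]]
    for i in range(1, len(features)):
        if cosine_similarity(features[i - 1], features[i]) >= threshold:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


def pool_clusters(features: List[Vector],
                  clusters: List[List[int]]) -> List[List[float]]:
    """Average the feature vectors in each class (assumed pooling step)."""
    pooled: List[List[float]] = []
    for indices in clusters:
        dim = len(features[indices[0]])
        pooled.append([
            sum(features[i][d] for i in indices) / len(indices)
            for d in range(dim)
        ])
    return pooled


# Toy 2-D "features" for five frames: two similar frames, then three others.
frame_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.0, 1.0]]
frame_clusters = cluster_adjacent_frames(frame_features, threshold=0.8)
cluster_vectors = pool_clusters(frame_features, frame_clusters)
```

With this rule the number of vectors passed downstream depends on how often the content changes rather than on a fixed sampling interval, which is the contrast with equal-interval sampling that the abstract draws.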

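IPC classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)
G06K9/62	. Methods or arrangements for recognition using electronic means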