UUV Real-Time Collision Avoidance Planning Method Based on Deep Double-Q Network Reinforcement Learning

Invention Publication

CN110716575A (status: pending, under substantive examination)

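Patent title: UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning
Application number: CN201910934538.2
Filing date: 2019-09-29
Publication number: CN110716575A
Publication date: 2020-01-21
Inventors: Wang Hongjian (王宏健), Yuan Jianya (袁建亚), Yan Zheping (严浙平), He Juyi (贺巨义), Liu Chaowei (刘超伟), Niu Shaoyuan (牛韶源)
Applicant: Harbin Engineering University (哈尔滨工程大学)
Applicant address: Intellectual Property Office, Science and Technology Department, Harbin Engineering University, No. 145 Nantong Street, Nangang District, Harbin, Heilongjiang Province
Patentee: Harbin Engineering University
Current patentee: Harbin Engineering University
Current patentee address: Intellectual Property Office, Science and Technology Department, Harbin Engineering University, No. 145 Nantong Street, Nangang District, Harbin, Heilongjiang Province
Main classification: G05D1/06
IPC classification: G05D1/06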

Abstract:

The invention belongs to the technical field of UUV (unmanned underwater vehicle) control and specifically relates to a real-time UUV collision avoidance planning method based on deep double-Q network reinforcement learning. It provides an autonomous collision avoidance planning method suited to the working environment and perception characteristics of a UUV. Through continuous trial-and-error interaction between the UUV and its environment, reward and penalty signals derived from successful and failed experience are used to keep improving the UUV's policy, giving the UUV the ability to learn by itself. The method gives the network this self-learning ability for local collision avoidance planning in complex environments and realizes an end-to-end model: it learns the mapping from states to actions directly from raw data, combining deep learning with reinforcement learning to solve the collision avoidance planning problem. According to the invention, because deep reinforcement learning is used, the policy does not become impossible to execute when the path is overly complex; in practical applications this shortens the project development cycle and makes the implementation simpler, more efficient, and more robust.
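The abstract names the deep double-Q network but this record does not give the update rule, the network architecture, the state encoding, or the reward values. As a rough illustration only, the following sketch shows the generic Double DQN learning target, in which the online network selects the next action and a separate target network evaluates it. The function name `double_dqn_target`, the discount factor, and all numeric values are assumptions made for this example and are not taken from the patent.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Generic Double DQN target for a single transition (illustrative only).

    q_online_next: Q-values of the online network for the next state.
    q_target_next: Q-values of the target network for the same next state.
    """
    if done:
        # Terminal transition (e.g. a collision or reaching the goal):
        # there is no future return to bootstrap from.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * q_target_next[best_action]


# Hypothetical numbers: three discrete heading commands, a small step penalty.
q_online_next = [0.2, 1.5, 0.7]
q_target_next = [0.3, 1.0, 2.0]
y = double_dqn_target(-1.0, False, q_online_next, q_target_next, gamma=0.9)
print(y)  # approximately -0.1: the online network picks action 1, valued at 1.0
```

With the same numbers, a single-network DQN target would take the maximum of `q_target_next` (2.0) and give roughly 0.8. Decoupling selection from evaluation is what reduces that kind of overestimation.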

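IPC classification:

G	Physics
G05	Controlling; regulating
G05D	Systems for controlling or regulating non-electric variables (continuous casting of metals: B22D11/16; valves per se: F16K; sensing non-electric variables: see the relevant subclasses of G01; regulating electric or magnetic variables: G05F)
G05D1/00	Control of position, course, altitude, or attitude of land, water, air, or space vehicles, e.g. automatic pilot (radio navigation systems or analogous systems using other waves: G01S)
G05D1/04	.Control of altitude or depth
G05D1/06	..Rate of change of altitude or depth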