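Fixed-wing UAV (unmanned aerial vehicle) cluster control method based on deep reinforcement learning

Invention publication

CN110502034A Fixed-wing UAV (unmanned aerial vehicle) cluster control method based on deep reinforcement learning (status: in force)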

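Patent title: Fixed-wing UAV (unmanned aerial vehicle) cluster control method based on deep reinforcement learning
Patent title (original Chinese): 一种基于深度强化学习的固定翼无人机群集控制方法
Application number: CN201910832120.0

Filing date: 2019-09-04
Publication (announcement) number: CN110502034A

Publication (announcement) date: 2019-11-26
Inventors: Yan Chao (闫超), Xiang Xiaojia (相晓嘉), Wang Chang (王菖), Niu Yifeng (牛轶峰), Yin Dong (尹栋), Wu Lizhen (吴立珍), Chen Ziye (陈紫叶)
Applicant: National University of Defense Technology, Chinese People's Liberation Army
Applicant address: No. 47 Yanwachi Zheng Street, Kaifu District, Changsha, Hunan Province
Patentee: National University of Defense Technology, Chinese People's Liberation Army
Current patentee: National University of Defense Technology, Chinese People's Liberation Army
Current patentee address: No. 47 Yanwachi Zheng Street, Kaifu District, Changsha, Hunan Province
Agency: Hunan Zhaohong Patent Office (湖南兆弘专利事务所)
Agent: Zhou Changqing (周长清)
Main classification: G05D1/10
IPC classification: G05D1/10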

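Abstract:

The invention discloses a fixed-wing UAV cluster control method based on deep reinforcement learning, comprising the following steps. Step S1, offline training phase: a stochastic UAV dynamics model is established, and an action is selected after the Q function has been evaluated by a dueling double Q-network; the dueling double Q-network is a D3QN network. Step S2, online execution phase: the dueling double Q-network is constructed and the trained network model is loaded; the network model and the action selection policy run on the onboard computer of the wing plane, the roll action of the lead plane is given by the operator, and the autopilots of the lead plane and the wing plane each carry out closed-loop control according to their respective roll actions, repeating these steps until the flight mission is completed. The invention offers strong real-time performance and adaptability, and the policy trained in simulation can be transferred to the real environment.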

Abstract (English):

The invention discloses a fixed-wing UAV (unmanned aerial vehicle) cluster control method based on deep reinforcement learning. The method comprises the following steps: S1) offline training phase: establishing a stochastic UAV dynamical model and, after Q function evaluation based on a dueling double deep Q-network, performing action selection, wherein the dueling double deep Q-network is a D3QN network; and S2) online execution phase: constructing the dueling double deep Q-network and loading a trained network model, the network model and an action selection strategy running on an onboard computer of a wing plane, the roll action of a lead plane being given by an operator, and the autopilots of the lead plane and the wing plane realizing closed-loop control according to their respective roll actions, the steps being repeated until the flight mission is completed. The method has the advantages of high real-time performance and adaptability, and the strategy obtained by training in simulation can be transferred to the real environment.
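As a rough illustration of the Q function evaluation and action selection named in the abstract, the Python sketch below shows the two ideas that "dueling double Q-network" (D3QN) usually refers to: a dueling aggregation that builds Q values from a state value and per-action advantages, and a double-Q target in which the online network picks the next action while a separate target network scores it. None of this is taken from the patent text. The three-value roll action set, the function names, and the hand-written numbers standing in for network outputs are assumptions made purely for illustration.

```python
from typing import List, Sequence

# Hypothetical discrete action set: roll commands in degrees (an assumption,
# not a value stated in the abstract).
ROLL_ACTIONS: List[float] = [-15.0, 0.0, 15.0]


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the action with the largest Q value (first index on ties)."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def double_q_target(reward: float, gamma: float,
                    next_q_online: Sequence[float],
                    next_q_target: Sequence[float],
                    done: bool) -> float:
    """Double-Q target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = greedy_action(next_q_online)
    return reward + gamma * next_q_target[best]


# Stand-ins for network outputs (made-up numbers).
q = dueling_q_values(1.0, [0.5, -0.5, 0.0])
print(q)                               # [1.5, 0.5, 1.0]
print(ROLL_ACTIONS[greedy_action(q)])  # -15.0
print(double_q_target(1.0, 0.5, [0.25, 0.75, 0.125], [5.0, 2.0, 9.0], False))  # 2.0
```

Under these assumptions, the online phase would reduce to evaluating the loaded network on the wing plane's current state and passing `ROLL_ACTIONS[greedy_action(q)]` to its autopilot at each control step; `double_q_target` would only be needed during offline training.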

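Publication/grant documents

CN110502034B Fixed-wing UAV (unmanned aerial vehicle) cluster control method based on deep reinforcement learning. Publication/grant date: 2022-08-09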

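IPC classification:

G	Physics
G05	Controlling; regulating
G05D	Systems for controlling or regulating non-electric variables (continuous casting of metals, see B22D11/16; valves per se, see F16K; sensing non-electric variables, see the relevant subclasses of G01; regulating electric or magnetic variables, see G05F)
G05D1/00	Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. automatic pilot (radio navigation systems or analogous systems using other waves, see G01S)
G05D1/10	. Simultaneous control of position or course in three dimensions (G05D1/12 takes precedence)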