Thesis Information

Chinese Title:

 Research on Multi-UAV Cooperative Obstacle Avoidance Algorithms

Name:

 李瑜 (Li Yu)

Student ID:

 20181214293

Confidentiality Level:

 Public

Thesis Language:

 chi (Chinese)

Discipline Code:

 085401

Discipline:

 Engineering - Electronic Information - Next-Generation Electronic Information Technology (including Quantum Technology)

Student Type:

 Master's

Degree:

 Master of Electronic Information

University:

 西安电子科技大学 (Xidian University)

Department:

 广州研究院 (Guangzhou Institute)

Major:

 Electronic Information

Research Direction:

 Reinforcement Learning

First Supervisor:

 张文博 (Zhang Wenbo)

First Supervisor's Affiliation:

 西安电子科技大学 (Xidian University)

Second Supervisor:

 常超 (Chang Chao)

Completion Date:

 2023-04-10

Defense Date:

 2023-05-30

English Title:

 Multi-UAV Cooperative Obstacle Avoidance Algorithm Research

Chinese Keywords:

 improved artificial potential field; reinforcement learning; multi-sensor; UAV; obstacle avoidance

English Keywords:

 Improved artificial potential field; Reinforcement learning; Multi-sensor; Unmanned aerial vehicle; Obstacle avoidance

Chinese Abstract:

With the continuous upgrading of sensors and communication links, UAVs are steadily becoming smaller, smarter and more swarm-oriented, and it is an inevitable trend for multiple UAVs to complete missions cooperatively and autonomously. However, UAV swarms face high risks when operating at low altitude: because the mission environment is unknown and complex, a UAV that fails to detect an obstacle and adjust in time is likely to crash. Since a UAV swarm operates as a group, collision avoidance between the UAVs themselves during the mission must also be considered. This thesis studies the application of classical control algorithms and reinforcement learning control algorithms to multi-UAV cooperative obstacle avoidance, aiming to solve both obstacle avoidance and inter-aircraft collision avoidance in flight. The main contributions are summarised as follows:

The shortcomings of the traditional artificial potential field method for multi-UAV obstacle avoidance are analysed theoretically and in simulation. First, an obstacle repulsion decay rule centred on the target point is proposed, ensuring that the repulsive force on a UAV gradually decays as it approaches the target and thereby solving the goal-unreachable problem. Second, the update range of the flight-path angle is limited to suppress local oscillation of the trajectory, optimising the UAV's flight path. Finally, repulsive and attractive fields based on the inter-UAV safety distance are established to solve the inter-aircraft collision problem.

To address the problem that a UAV cannot effectively use environmental information to guide the decisions of a reinforcement learning network in real flight, a multi-dimensional information fusion network framework is proposed, combining three-dimensional image information, one-dimensional lidar information and the UAV's own state. A relatively realistic obstacle avoidance environment and a reward function system are designed according to the situations a UAV may encounter during real flight. A flight scenario is then built with ROS and the Gazebo simulator, and reinforcement learning obstacle avoidance experiments are carried out in which the UAV successfully reaches the target point. Training with the multi-feature fusion method is compared against lidar-only and camera-only methods; the results show that the policy network trained with the proposed fusion method guides the UAV toward more valuable actions.

The obstacle avoidance model above is then extended to multi-UAV tasks by adding inter-aircraft collision avoidance. To remedy the shortcomings of the distance-based inter-aircraft avoidance method, an improved-distance method is proposed, along with a range-based method. Both are trained with multi-agent reinforcement learning and compared: the improved-distance method keeps the UAVs more tightly spaced, while the range-based method is more effective for large-scale swarms. Gazebo simulation results show that the proposed algorithm framework can effectively accomplish multi-UAV cooperative obstacle avoidance.

English Abstract:

With the continuous upgrading of sensors and communication links, UAVs will gradually become miniaturised, intelligent and clustered, and it is an inevitable trend for multiple UAVs to complete missions cooperatively and autonomously. However, UAV swarms face high risks when carrying out missions at low altitude: because the mission environment is unknown and complex, a UAV that fails to detect obstacles and make corresponding adjustments in time is likely to crash. Since a UAV swarm operates as a group, it is also important to consider how the UAVs avoid colliding with one another during the mission. This thesis investigates the application of classical control algorithms and reinforcement learning control algorithms to cooperative obstacle avoidance for multiple UAVs, aiming to solve both obstacle avoidance and inter-aircraft collision avoidance in flight. The main contributions are summarised as follows:

 

Theoretical analysis and simulation address the shortcomings of the traditional artificial potential field method applied to multi-UAV obstacle avoidance. Firstly, an obstacle repulsion decay rule centred on the target point is proposed, ensuring that the repulsive force on the UAV gradually decays as it approaches the target point and thereby solving the goal-unreachable problem. Secondly, the update range of the flight-path angle is limited to suppress local oscillation of the trajectory, optimising the flight path of the UAV. Finally, repulsive and attractive fields based on the safe distance between UAVs are established to solve the inter-aircraft collision problem of multiple UAVs.
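The target-centred repulsion decay can be sketched as below. This is a minimal illustration of the general idea rather than the thesis's actual formulation; the gain `eta`, influence radius `d0` and decay exponent `n` are assumed values chosen for the example.

```python
import numpy as np

def repulsive_force(pos, obstacle, goal, d0=5.0, eta=1.0, n=2):
    """Repulsive force with target-centred decay (illustrative sketch).

    Classic APF repulsion is scaled by the distance to the goal raised
    to the power n, so the repulsion vanishes as the UAV reaches the
    goal; this removes the goal-unreachable failure mode where an
    obstacle near the target pushes the UAV away indefinitely.
    """
    to_obs = pos - obstacle
    d = np.linalg.norm(to_obs)
    if d >= d0:
        return np.zeros_like(pos)  # obstacle outside its influence radius
    d_goal = np.linalg.norm(goal - pos)
    mag = eta * (1.0 / d - 1.0 / d0) / d**2 * d_goal**n
    return mag * to_obs / d  # push the UAV directly away from the obstacle
```

With the obstacle between the UAV and a distant goal, the force points away from the obstacle; at the goal itself the decay factor drives the repulsion to zero.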

 

To address the problem that UAVs cannot effectively use environmental information to guide the reinforcement learning network's decisions during real flight, a multi-dimensional information fusion network framework is proposed, combining three-dimensional image information, one-dimensional lidar information and the UAV's own state. A relatively realistic obstacle avoidance environment and a reward function system are designed according to the situations a UAV may encounter during real flight. Finally, a UAV flight scenario is built with ROS and the Gazebo simulator, and reinforcement learning obstacle avoidance experiments are carried out in which the UAV successfully reaches the target point. Training with the multi-feature fusion method is compared against lidar-only and camera-only methods; the experimental results show that the policy network trained with the proposed multi-feature fusion method guides the UAV to more valuable actions.
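The fusion step of the framework can be sketched as follows. In the thesis the image stream passes through a convolutional network and the fused vector feeds the policy network; here, purely as an assumed simplification, each stream is normalised and concatenated into a single observation vector, and all shapes are illustrative.

```python
import numpy as np

def fuse_observations(img_feat, lidar, state):
    """Fuse three observation streams into one policy input (sketch).

    img_feat : 2-D array of image features (stand-in for CNN output)
    lidar    : 1-D array of range readings
    state    : 1-D array of the UAV's own state (pose, velocity, ...)
    """
    img = img_feat.ravel() / (np.linalg.norm(img_feat) + 1e-8)  # unit-norm image features
    rng = lidar / (lidar.max() + 1e-8)                          # ranges scaled to [0, 1]
    return np.concatenate([img, rng, state])                    # single fused vector
```

Concatenation is the simplest fusion choice; it lets the downstream network weight the streams itself, at the cost of fixed input dimensions for every sensor.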

 

The UAV obstacle avoidance model proposed above is optimised and extended to multi-UAV tasks by adding inter-aircraft collision avoidance to the model. To address the shortcomings of the distance-based inter-aircraft avoidance method, an improved-distance method is proposed, together with a range-based method. Both are trained with multi-agent reinforcement learning and compared: the improved-distance method keeps the UAVs more tightly spaced, while the range-based method is more effective for large-scale UAV swarms. The Gazebo simulation results show that the algorithm framework proposed in this thesis can effectively accomplish the task of multi-UAV cooperative obstacle avoidance.
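The distinction between the two inter-aircraft methods can be illustrated with a range-based penalty term in the reward. This is a hedged sketch, not the thesis's reward function: the thresholds `d_safe` and `d_warn`, the gain `k` and the collision penalty are assumed values.

```python
def inter_uav_penalty(d, d_safe=2.0, d_warn=4.0, k=1.0):
    """Range-based inter-UAV avoidance penalty (illustrative sketch).

    Only neighbours inside the warning range contribute, so in a large
    swarm most pairs add nothing to the reward, which is what makes a
    range-based rule scale better than penalising every pairwise
    distance.
    """
    if d >= d_warn:
        return 0.0        # neighbour outside the warning range: no penalty
    if d <= d_safe:
        return -10.0      # inside the safety radius: hard collision penalty
    # linear ramp as the neighbour closes from d_warn down to d_safe
    return -k * (d_warn - d) / (d_warn - d_safe)
```

A distance-based rule would instead penalise (or reward) every pairwise separation, which tends to pull the formation tighter but grows quadratically in cost with swarm size.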

CLC Number:

 V24

Catalogue Number:

 59108

Open Access Date:

 2023-12-30
