Thesis Information

Chinese title:

 Knowledge-Driven Reinforcement Learning and Its Application in Traffic Congestion Control

Name:

 陈越 (Chen Yue)

Student ID:

 19011210100

Confidentiality level:

 Public

Thesis language:

 Chinese

Discipline code:

 081001

Discipline:

 Engineering - Information and Communication Engineering - Communication and Information Systems

Student type:

 Master's

Degree:

 Master of Engineering

University:

 Xidian University

School:

 School of Telecommunications Engineering

Major:

 Information and Communication Engineering

Research interests:

 Reinforcement learning; intelligent transportation systems

First advisor:

 李长乐 (Li Changle)

First advisor's institution:

 Xidian University

Completion date:

 2022-04-15

Defense date:

 2022-05-28

English title:

 Knowledge Assisted Reinforcement Learning and Its Application in Traffic Congestion Control

Chinese keywords:

 knowledge embedding; reinforcement learning; traffic congestion control; traffic signal control

English keywords:

 knowledge embedding; reinforcement learning; congestion control; traffic signal control

Chinese abstract:

With the continuous development of the social economy, urban residents' travel demand keeps growing, and so does car ownership. By the end of 2021, China had 393 million motor vehicles, of which more than 300 million were automobiles, and 479 million licensed drivers; more than 30 million motor vehicles are newly registered and more than 20 million drivers newly licensed each year. Meanwhile, several dozen large cities together account for more than 10% of the national car ownership, with Beijing alone exceeding 6 million cars. In the face of such a huge number of vehicles, the construction and upgrading of urban transportation infrastructure cannot keep pace, so urban traffic congestion keeps worsening, especially in large cities.

Under these circumstances, traffic congestion control technology is particularly important. At the current stage, traffic flow consists mainly of human-driven vehicles: vehicle behavior depends chiefly on driver decisions, and traffic information cannot reach drivers in real time, so vehicle-level microscopic congestion control has very limited effect. Congestion control must therefore still be approached from the perspective of macroscopic traffic flow control. Since traffic signal equipment is deployed at every intersection of the urban road network, traffic signal control has become one of the most important means of congestion control. Traffic signal control is an important and challenging real-world problem whose main goal is to minimize the travel time of all vehicles by coordinating their movements at intersections. Traditional traffic signal control applies static rules to intersections based on mathematical assumptions. Even though abundant traffic data, powerful computing resources, and many advanced intelligent transportation technologies are now available, signal control at intersections remains at a relatively primitive stage and fails to combine these technologies effectively. It still has the following shortcomings: 1) real-world traffic states are complex and changeable, and mathematical models can hardly describe or fully account for all the relevant factors, so the actual control deviates from the real situation; 2) most traffic signal control remains at the isolated-intersection stage, and the lack of effective cooperation among multiple intersections leads to poor network-wide performance.

Traffic signal control based on reinforcement learning can regulate traffic flow dynamically, learning the real traffic state from traffic data collected in real time and thereby avoiding the deviations that model-based methods may produce in real environments; multiple reinforcement learning agents can exchange information and learn jointly, enabling effective cooperation. Reinforcement learning therefore has great potential for large-scale urban traffic signal control. However, current reinforcement learning still suffers from the credit assignment problem among multiple agents and from low data efficiency during learning.

This thesis studies multi-agent reinforcement learning algorithms suited to traffic signal control. To address credit assignment among multiple agents, it introduces mean field theory and an entropy-regularization-based credit assignment method into the temporal-difference reinforcement learning framework, mitigating the curse of dimensionality in large-scale traffic signal control and balancing the learning processes of the agents so that they remain consistent and achieve better performance. In addition, online reinforcement learning suffers from low data efficiency: the algorithm must interact with the environment extensively to collect large numbers of sample trajectories for training, and the resulting time cost hinders deployment and model transfer. To solve this problem, the thesis introduces a meta-learning method that uses heterogeneous traffic flow data collected in the road network, together with a knowledge embedding model, to learn traffic congestion knowledge that assists reinforcement learning decision-making, improving the data efficiency of reinforcement learning. Compared with traditional traffic signal control methods, the proposed approach shows clear advantages in metrics such as average travel delay and average vehicle queue length at intersections; compared with conventional multi-agent reinforcement learning methods, it shows clear advantages in convergence speed, sample efficiency, and performance.

English abstract:

With the continuous development of the social economy, the travel demand of urban residents continues to increase, and so does car ownership. By the end of 2021, the number of motor vehicles in China had reached 393 million (including more than 300 million automobiles) and the number of licensed drivers had reached 479 million; more than 30 million motor vehicles are newly registered and more than 20 million drivers newly licensed each year. On the other hand, several dozen large cities together account for more than 10% of the national car ownership, with Beijing alone exceeding 6 million cars. In the face of such a huge number of cars, the construction and upgrading of urban transportation infrastructure cannot keep pace, resulting in increasingly serious urban traffic congestion.

In this context, traffic congestion control technology is particularly important. At the current stage, the vehicles in the traffic flow are mainly human-driven, which means that vehicle behavior depends chiefly on driver decision-making and traffic information cannot be transmitted to drivers in real time. This severely limits the effectiveness of vehicle-level microscopic congestion control methods. Therefore, traffic congestion control must still be approached from the perspective of macroscopic traffic flow control. Traffic signal control equipment is deployed at every intersection of the urban traffic network, which makes traffic signal control one of the most important means of congestion control. Traffic signal control is an important and challenging real-world problem whose main goal is to minimize the travel time of all vehicles by coordinating their movements at intersections. Traditional traffic signal control applies static rules to intersections based on mathematical assumptions. Even though abundant traffic data, powerful computing resources, and many advanced intelligent transportation technologies are available, signal control at intersections remains at a relatively primitive stage and fails to combine these technologies effectively. It still has the following shortcomings: 1) real-world traffic states are complex and changeable, and mathematical models can hardly describe or fully account for the relevant factors, so the actual control deviates from the real situation; 2) most traffic signal control schemes remain at the isolated-intersection stage, and the lack of effective cooperation among multiple intersections leads to poor global control performance.

 

Traffic signal control based on reinforcement learning can control traffic flow dynamically and learn the real traffic state from traffic flow data collected in real time, thereby avoiding the deviations that model-based methods may produce in real environments. Information exchange and joint learning among multiple reinforcement learning agents enable them to cooperate effectively. Therefore, reinforcement learning methods have great potential in large-scale urban traffic signal control. However, current reinforcement learning still suffers from the credit assignment problem among multiple agents and from low data efficiency in the learning process.
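To make the paradigm concrete, a traffic-signal agent of the kind described above can be sketched as a minimal tabular Q-learning controller. This is an illustrative sketch only, not the algorithm proposed in the thesis: the class name, the discretized state, the phase set, and the reward (e.g. negative total queue length) are all assumptions.

```python
import random
from collections import defaultdict

class SignalAgent:
    """Illustrative tabular Q-learning agent for one intersection."""

    def __init__(self, phases, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.phases = phases          # candidate signal phases (actions)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # exploration rate
        self.q = defaultdict(float)   # Q-table keyed by (state, phase)

    def act(self, state):
        # epsilon-greedy phase selection over the Q-values of this state
        if random.random() < self.epsilon:
            return random.choice(self.phases)
        return max(self.phases, key=lambda p: self.q[(state, p)])

    def learn(self, state, phase, reward, next_state):
        # temporal-difference (Q-learning) update toward the bootstrapped target
        best_next = max(self.q[(next_state, p)] for p in self.phases)
        td_target = reward + self.gamma * best_next
        self.q[(state, phase)] += self.alpha * (td_target - self.q[(state, phase)])
```

In use, the agent would observe a discretized traffic state each control interval, pick a phase with `act`, and call `learn` with a congestion-based reward such as the negative queue length.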

 

This thesis mainly studies multi-agent reinforcement learning algorithms suitable for traffic signal control. To solve the credit assignment problem among multiple agents, the thesis introduces mean field theory and an entropy-regularization-based credit assignment method into the temporal-difference reinforcement learning framework, mitigating the curse of dimensionality in large-scale traffic signal control and balancing the learning process of each agent, which gives the agents the consistency needed to reach better performance. In addition, online reinforcement learning suffers from low data efficiency: the algorithm must interact with the environment to obtain a large number of sample trajectories for training, and the resulting time overhead hinders its application and model transfer in large-scale traffic signal control. To solve this problem, the thesis introduces a meta-learning method combined with a knowledge embedding model to assist the decision-making of reinforcement learning and improve its data efficiency. Compared with traditional traffic signal control methods, the method proposed in this thesis has clear advantages in performance metrics such as average travel delay and average vehicle queue length at intersections; compared with other multi-agent reinforcement learning baselines, it achieves better convergence speed and data efficiency.
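The two ingredients named above can be sketched in isolation. A mean-field approximation lets each agent condition on the average of its neighbors' actions instead of the exponentially large joint action, and entropy regularization (here in its common softmax/Boltzmann form) keeps each agent's action distribution from collapsing prematurely. The function names and interfaces below are illustrative assumptions, not the thesis's implementation.

```python
import math

def mean_field_action(neighbor_actions, n_actions):
    """Summarize neighbors' discrete actions by their empirical mean:
    a one-hot average over n_actions, so an agent conditions on one
    fixed-size vector rather than the joint action of all neighbors."""
    counts = [0] * n_actions
    for a in neighbor_actions:
        counts[a] += 1
    n = max(len(neighbor_actions), 1)
    return [c / n for c in counts]

def softmax_policy(q_values, temperature=1.0):
    """Entropy-regularized (Boltzmann) action distribution: a higher
    temperature yields a higher-entropy, more exploratory policy."""
    m = max(q_values)  # subtract max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]
```

In a mean-field Q-learning setup, each agent's Q-function would take `(state, own_action, mean_field_action(...))` as input, and the temperature would control how strongly entropy regularization balances the agents' learning.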

CLC number:

 U12

Collection number:

 52371

Release date:

 2023-03-18
