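Thesis information

Chinese title:

 FPGA Single Event Upset Detection and Recovery for CNN Reliability Analysis (面向CNN可靠性分析的FPGA单粒子翻转检测与恢复)

Name:

 Wu Xiangbing (武祥兵)

Student ID:

 20131213336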

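Confidentiality level:

 Public

Thesis language:

 chi

Discipline code:

 085504

Discipline name:

 Engineering - Mechanical - Aerospace Engineering

Student type:

 Master's

Degree:

 Master of Engineering

University:

 Xidian University (西安电子科技大学)

School:

 School of Aerospace Science and Technology (空间科学与技术学院)

Major:

 Mechanical (professional degree)

Research direction:

 FPGA, reliability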

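First advisor:

 Yan Yunyi (闫允一)

First advisor's affiliation:

 Xidian University (西安电子科技大学)

Second advisor:

 Gao Xiang (高翔)

Completion date:

 2023-05-10

Defense date:

 2023-05-23

English title: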

 FPGA Single Event Flip Detection for CNN Reliability Analysis Measurement and Recovery    

Chinese keywords (translated):

 FPGA ; single event upset ; key signal ; convolutional neural network

English keywords:

 FPGA ; SEU ; Key Signal ; CNN

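Chinese abstract (translated):

With their parallel pipelines and low-power design, SRAM-based FPGAs are well suited to accelerating convolutional neural networks (CNNs) and have become an important supporting technology for deep learning in intelligent space equipment. Compared with other types of hardware accelerators, however, SRAM-based FPGAs are more sensitive to single event upsets (SEUs). Research on SEU detection and recovery methods for FPGAs in convolutional neural network applications is therefore important for promoting the highly reliable use of deep learning network structures in the space radiation environment.

At present, most researchers detect SEUs in FPGAs by readback, which compares the configuration data read back from the device with the golden data. This thesis instead takes the circuit signal anomalies caused by SEUs as its starting point and proposes a key-signal-based method for SEU detection and recovery in FPGA circuits, which can be applied to SEU detection in CNN circuits. The main work of the thesis is as follows:

(1) To verify that SEUs cause abnormal changes in FPGA circuit signals, a key signal test system is designed that injects random faults into a real circuit and observes the data changes of each signal. The experimental results show that SEUs in the FPGA affect the values of circuit signals, which verifies the correctness of the key signal test system and the feasibility of the method presented later.

(2) Based on the internal logic of FPGA circuit modules, the connections between modules and the data transfer relationships between signals, three evaluation indexes are proposed (adjacent signal ratio, shortest path ratio and signal connection ratio) and combined into a value that characterizes the importance of each circuit signal; the key signals are selected according to this value. Simulated fault injection experiments verify the effectiveness of the key signals.

(3) Placement rules, anomaly detection rules and analysis methods based on key signals are proposed, and existing methods are used to recover from circuit anomalies. The cost of placing key signals in the circuit is evaluated, and a redundancy-based error-proofing structure is designed to reduce anomalies in key signal data during transmission. The key signal analysis method and recovery strategy are verified, and the results show that both are accurate. The experimental results also show that SEUs in FPGA circuits cause functional anomalies, data anomalies and fault masking.

(4) LeNet-5 is chosen as a representative convolutional neural network structure. After it is implemented on the FPGA, key signals are screened and placed, and the changes in key signal data are recorded under simulated fault injection. The fault types and their causes in the convolutional neural network circuit are analyzed, the SEU-sensitive structures in the LeNet-5 circuit are examined in detail, and design suggestions for the LeNet-5 hardware circuit are given. The experimental results show that most anomalies in the LeNet-5 circuit originate in the data path, with a relatively large number occurring in the convolution layer. The results also show that the convolution layer is the structure most sensitive to SEUs in the LeNet-5 circuit.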
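
For contrast with the key-signal approach, the readback baseline described in the abstract can be sketched in Python as a bitwise comparison between a readback image and a golden image. This is only an illustrative sketch: the function name `diff_frames`, the byte-level layout and the sample data are assumptions, not taken from the thesis or from any vendor tool.

```python
def diff_frames(readback: bytes, golden: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, bit_index) for every bit that differs.

    Hypothetical helper: real configuration memory is organised in
    device-specific frames, which this sketch does not model.
    """
    if len(readback) != len(golden):
        raise ValueError("readback and golden images must have equal length")
    flips = []
    for offset, (r, g) in enumerate(zip(readback, golden)):
        delta = r ^ g  # a set bit in delta marks a mismatching bit
        for bit in range(8):
            if (delta >> bit) & 1:
                flips.append((offset, bit))
    return flips


# Made-up golden image and a readback copy with one simulated upset.
golden = bytes([0b10100101, 0x00, 0xFF])
readback = bytearray(golden)
readback[1] ^= 0b00000100  # flip bit 2 of byte 1
upsets = diff_frames(bytes(readback), golden)
print(upsets)
```

An empty result means the readback image matches the golden image; each reported pair locates one flipped configuration bit in this toy layout.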

外文摘要:

The parallel pipelining and low power consumption of SRAM-based FPGAs make them well suited to accelerating convolutional neural networks (CNNs), and they have become an important supporting technology for deep learning in intelligent space equipment. Compared with other types of hardware accelerators, however, SRAM-based FPGAs are more sensitive to single event upsets (SEUs). Studying SEU detection and recovery methods for FPGAs in convolutional neural network applications is therefore of great significance for promoting the highly reliable use of deep learning network structures in the space radiation environment.

At present, most researchers detect SEUs in FPGAs by readback, which compares the configuration data read back from the device with the golden data. This thesis instead starts from the circuit signal anomalies caused by SEUs and proposes a key-signal-based method for detecting and recovering from SEUs in FPGA circuits, which can be used for SEU detection in CNN circuits. The main work of the thesis is as follows:

(1) To verify that SEUs cause abnormal changes in FPGA circuit signals, a key signal test system is designed that injects random faults into a real circuit and observes the data changes of each signal in the circuit. The experimental results show that SEUs in the FPGA affect the circuit signals. The experiment also verifies the correctness of the key signal test system and the feasibility of the method that follows.

(2) Based on the internal logic of FPGA circuit modules, the connections between modules and the data transfer relationships between signals, three evaluation indexes (adjacent signal ratio, shortest path ratio and signal connection ratio) are proposed and combined into a value that characterizes the importance of each circuit signal, and the key signals are selected accordingly. Simulated fault injection experiments verify the effectiveness of the key signals.

(3) Circuit placement rules, anomaly detection rules and analysis methods based on key signals are proposed, and existing methods are used to recover from circuit anomalies. The cost of placing key signals in the circuit is evaluated, and a redundancy-based error-proofing structure is designed to reduce anomalies in key signal data during transmission. The key signal analysis method and recovery strategy are verified, and the results show that both are accurate. The results also show that SEUs in FPGA circuits cause functional anomalies, data anomalies and fault masking.

(4) LeNet-5 is selected as a representative convolutional neural network structure. After it is implemented on the FPGA, key signals are screened and placed, and the changes in key signal data are recorded under simulated fault injection. The fault types and causes in the convolutional neural network circuit are analyzed, the SEU-sensitive structures in the LeNet-5 circuit are examined in detail, and design suggestions for the LeNet-5 hardware circuit are put forward. The results show that most anomalies in the LeNet-5 circuit originate in the data path, with a relatively large number occurring in the convolution layer. They also show that the convolution kernel calculation module is the structure most sensitive to SEUs in the LeNet-5 circuit.
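
The key signal selection in item (2) can be illustrated with a short Python sketch. The abstract does not say how the three indexes are computed or combined, so the sketch assumes they are already available per signal and normalised to [0, 1], and it uses an equal-weight average purely as a placeholder combination. The signal names, values and the `top_k` cutoff are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalMetrics:
    name: str
    adjacent_signal_ratio: float
    shortest_path_ratio: float
    signal_connection_ratio: float


def importance(m: SignalMetrics, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Combine the three indexes into one score (assumed: weighted sum)."""
    w_adj, w_path, w_conn = weights
    return (w_adj * m.adjacent_signal_ratio
            + w_path * m.shortest_path_ratio
            + w_conn * m.signal_connection_ratio)


def select_key_signals(metrics: list[SignalMetrics], top_k: int) -> list[str]:
    """Rank signals by importance and keep the top_k names."""
    ranked = sorted(metrics, key=importance, reverse=True)
    return [m.name for m in ranked[:top_k]]


# Made-up example values, not measurements from the thesis.
signals = [
    SignalMetrics("conv1_valid", 0.75, 0.5, 0.5),
    SignalMetrics("pool1_addr", 0.25, 0.25, 0.5),
    SignalMetrics("fc_out", 0.5, 0.5, 0.25),
    SignalMetrics("dbg_led", 0.0, 0.25, 0.0),
]
key_signals = select_key_signals(signals, top_k=2)
print(key_signals)
```

Passing a different `weights` tuple shows how the ranking shifts when one index is trusted more than the others; the weighting actually used in the thesis is not stated in the abstract.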

Chinese Library Classification number:

 V44

Collection number:

 56991

Release date:

 2023-12-12
