Thesis Information

Title (Chinese):

 Research and Design of an Image Semantic Segmentation Method Based on Convolutional Neural Networks

Name:

 曾昆 (Zeng Kun)

Student ID:

 17031211582    

Confidentiality Level:

 Public

Thesis Language:

 Chinese (chi)

Discipline Code:

 081203    

Discipline:

 Engineering - Computer Science and Technology (may confer Engineering or Science degrees) - Computer Application Technology

Student Type:

 Master's

Degree:

 Master of Engineering

University:

 Xidian University (西安电子科技大学)

Department:

 School of Computer Science and Technology

Major:

 Computer Science and Technology

Research Area:

 Image Processing

First Supervisor:

 裘雪红 (Qiu Xuehong)

First Supervisor's Affiliation:

 Xidian University (西安电子科技大学)

Completion Date:

 2020-03-30    

Defense Date:

 2020-05-23    

Title (English):

 Research and Design of Image Semantic Segmentation Based on Convolutional Neural Network    

Keywords (Chinese):

 image semantic segmentation ; deep convolutional neural network ; dilated convolution ; conditional random field

Keywords (English):

 image semantic segmentation ; deep convolutional neural network ; dilated convolution ; conditional random field    

Abstract (Chinese):

As a key component in building artificial intelligence vision systems, the field of computer vision has long attracted wide attention from researchers. Image semantic segmentation, a core technique for image understanding in computer vision, partitions an image by assigning a semantic category label to every pixel. Its results are applied in many fields, including environmental scene segmentation for autonomous driving, organ image segmentation in medical examination, and remote sensing image segmentation for national defense and security. In recent years, the large-scale application of deep learning has brought the Deep Convolutional Neural Network (DCNN) great success in computer vision; however, existing CNN-based image semantic segmentation methods still suffer from the loss of detail and context information.

Building on the study and analysis of existing CNN-based image semantic segmentation methods, and taking full account of the contribution of dilated convolution to enlarging the network's receptive field, this thesis proposes an improved image semantic segmentation method based on an encoder-decoder structure with conditional random field post-processing. First, the encoder uses a dense atrous spatial pyramid pooling structure to extract multi-scale context features over a larger receptive field and thereby capture high-level semantic information. Then, the decoder fuses low-level pixel detail information with the high-level semantic information through skip connections, and applies dense upsampling convolution to the fused features to accurately recover segmentation boundaries. Finally, a fully connected conditional random field is used as a post-processing algorithm to further improve edge segmentation accuracy, strengthening the overall segmentation result.
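The receptive-field enlargement that dilated (atrous) convolution provides can be illustrated with a minimal NumPy sketch; this is an illustrative toy, not the thesis's code. A 3x3 kernel with dilation rate r samples the input on a stride-r grid, so its nine weights cover a (2r+1)x(2r+1) area.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate=1):
    """2-D 'valid' convolution with a dilation (atrous) rate.

    A k x k kernel with rate r spans k + (k-1)*(r-1) input pixels per
    axis, enlarging the receptive field without adding parameters.
    """
    k = kernel.shape[0]
    span = k + (k - 1) * (rate - 1)          # effective kernel extent
    h, w = x.shape
    out = np.zeros((h - span + 1, w - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sample the input on a grid with stride = rate
            patch = x[i:i + span:rate, j:j + span:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3))
y1 = dilated_conv2d(x, k, rate=1)   # ordinary 3x3 conv -> 4x4 output
y2 = dilated_conv2d(x, k, rate=2)   # same 9 weights, 5x5 extent -> 2x2 output
```

With rate=2 the same nine weights see a 5x5 neighborhood, which is how the encoder gathers context on a larger receptive field without downsampling or extra parameters.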

To verify the effectiveness of the improved image semantic segmentation method designed in this thesis, the proposed algorithm is compared experimentally with current state-of-the-art image semantic segmentation models on the benchmark dataset PASCAL VOC 2012. The experimental data show that, compared with the other three segmentation methods, the proposed method achieves the best results on three widely accepted evaluation metrics for image semantic segmentation, improving both pixel recognition accuracy and object boundary segmentation precision. The results demonstrate that the proposed method can not only capture multi-scale high-level semantic information but also effectively exploit low-level image detail, while remaining robust for images of special scenes. In short, the segmentation method designed in this thesis, based on an encoder-decoder structure and a conditional random field, effectively improves segmentation accuracy.

Abstract (English):

As an important part of building artificial intelligence vision systems, the field of computer vision has long attracted wide attention from researchers. Image semantic segmentation, a key technology for image understanding in computer vision, partitions an image by assigning a semantic category label to each pixel. Its research results have been applied in many fields, including environmental scene segmentation in autonomous driving, organ image segmentation in medical detection, and remote sensing image segmentation in national security. In recent years, the large-scale application of deep learning has made the Deep Convolutional Neural Network (DCNN) a great success in computer vision. However, image semantic segmentation methods based on convolutional neural networks still suffer from the loss of detail and context information.

 

Based on the research and analysis of existing convolutional neural network image semantic segmentation methods, and fully considering the contribution of dilated convolution to enlarging the network's receptive field, this paper proposes an improved image semantic segmentation method based on an encoder-decoder structure and conditional random field post-processing. First, the encoder uses dense atrous spatial pyramid pooling to extract multi-scale context feature information over a larger receptive field to capture high-level semantic information. Then, the decoder uses skip connections to fuse low-level pixel detail information with high-level semantic information, and applies dense upsampling convolution to the fused information to accurately recover the segmentation boundary. Finally, a fully connected conditional random field is used as a post-processing algorithm to further improve the edge segmentation accuracy of the image, so that the overall segmentation effect is enhanced.
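Dense upsampling convolution, mentioned in the decoder step above, recovers full resolution by predicting r x r sub-pixel outputs per class with an ordinary convolution and then rearranging channels into space. A NumPy sketch of that rearrangement step follows; the function name and toy shapes are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def duc_reshape(features, r):
    """Rearrange a (C*r*r, h, w) feature map into (C, h*r, w*r).

    Dense upsampling convolution predicts r*r sub-pixel score maps per
    class at low resolution, then recovers full resolution with this
    deterministic channel-to-space reshape (a 'pixel shuffle') instead
    of bilinear interpolation or learned deconvolution.
    """
    c_r2, h, w = features.shape
    c = c_r2 // (r * r)
    # (C, r, r, h, w) -> (C, h, r, w, r) -> (C, h*r, w*r)
    out = features.reshape(c, r, r, h, w)
    out = out.transpose(0, 3, 1, 4, 2)
    return out.reshape(c, h * r, w * r)

# toy input: 2 classes, upsampling factor r=2, low resolution 3x3
low = np.arange(2 * 4 * 3 * 3).astype(float).reshape(2 * 4, 3, 3)
full = duc_reshape(low, r=2)        # -> shape (2, 6, 6)
```

Because the mapping is a pure reshape, every output pixel is predicted directly by the convolution rather than interpolated, which is why this decoding step can recover sharper segmentation boundaries.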

 

To verify the effectiveness of the improved image semantic segmentation method designed in this paper, the proposed algorithm is compared with current state-of-the-art image semantic segmentation models on the benchmark dataset PASCAL VOC 2012. The experimental data show that, compared with the other three segmentation methods, the proposed method achieves the best results on three widely recognized evaluation metrics for image semantic segmentation, with improvements in both pixel recognition accuracy and object boundary segmentation accuracy. The experimental results strongly demonstrate that the improved method can not only capture the multi-scale high-level semantic information of an image but also effectively use its low-level detail information, while remaining robust for images of special scenes. The segmentation method based on an encoder-decoder structure and a conditional random field designed in this paper effectively improves the accuracy of image segmentation.
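The abstract does not name the three evaluation metrics, but pixel accuracy and mean intersection-over-union (mIoU) are the standard choices for PASCAL VOC segmentation; the following NumPy sketch of both is illustrative and assumed, not the thesis's evaluation code.

```python
import numpy as np

def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return np.mean(pred == gt)

def mean_iou(pred, gt, num_classes):
    """Mean over classes of intersection-over-union between label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return np.mean(ious)

# toy 3x3 label maps with 3 classes
gt   = np.array([[0, 0, 1], [0, 1, 1], [2, 2, 2]])
pred = np.array([[0, 0, 1], [0, 0, 1], [2, 2, 1]])
pa   = pixel_accuracy(pred, gt)       # 7 of 9 pixels correct
miou = mean_iou(pred, gt, 3)
```

mIoU penalizes boundary errors more heavily than raw pixel accuracy, which is why it is the metric under which boundary-refinement steps such as CRF post-processing show the clearest gains.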

CLC Number:

 TP3    

Accession Number:

 45291    

Open Access Date:

 2020-12-19    
