Chinese title: | 基于视觉感知特征和深度网络集成的无参考图像质量评价方法 |
Name: | |
Student ID: | 18021211062 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 085208 |
Discipline name: | Engineering - Engineering - Electronics and Communication Engineering |
Student type: | Master's |
Degree: | Master of Engineering |
University: | Xidian University |
Department: | |
Major: | |
Research direction: | Electronics and Communication Engineering |
First supervisor: | |
First supervisor's affiliation: | |
Second supervisor: | |
Completion date: | 2021-08-21 |
Defense date: | 2021-05-25 |
English title: | No-Reference Image Quality Assessment Methods Based on Visual Perception Feature and Deep Network Integration |
Chinese keywords: | 无参考 ; 图像质量评价 ; 特征迁移学习 ; 增强集成学习 ; 视觉Transformer |
English keywords: | No-Reference ; Image Quality Assessment ; Feature Transfer Learning ; Enhanced Ensemble Learning ; Vision Transformer |
Chinese abstract: |
随着大数据和人工智能的发展,图像数据呈现爆炸式增长,其承载着丰富的信息。然而在图像获取、存储、传输、处理和显示等过程中都不可避免地会引入失真,导致视觉质量下降和语义信息缺失。因此需要设计高效、准确的图像质量评价方法,优化图像采集和处理系统,获取更高质量的图像。图像质量评价是图像处理、计算机视觉和人工智能领域中的热点研究问题,发挥着重要的基础作用。本文针对自然场景的无参考图像质量评价中场景复杂、真实失真图像难以准确评价、深度图像质量学习模型无法自适应进行特征筛选等问题,构建特征迁移网络、增强集成模型和多网络协同学习策略,设计无参考图像质量评价方法,提升主客观评价一致性,满足实际场景图像的质量评价需求。主要研究内容如下: |
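The feature-transfer idea mentioned above (reusing representations learned on a source task for quality regression) can be sketched in its simplest form: features from a pretrained backbone are treated as fixed inputs, and only a lightweight head is fit to map them to subjective scores. The sketch below is purely illustrative and is not the thesis's actual model; the ridge-regression head, the function names, and the synthetic feature matrix are all assumptions.

```python
import numpy as np


def fit_quality_head(features, mos, lam=1.0):
    """Fit a ridge-regression quality head on frozen, transferred features.

    features: (N, D) array, assumed to come from a pretrained backbone
    mos:      (N,) array of subjective mean opinion scores
    Returns weights w of shape (D + 1,); the last entry is the bias.
    """
    X = np.hstack([features, np.ones((len(features), 1))])  # append bias column
    d = X.shape[1]
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ mos)


def predict_quality(features, w):
    """Predict quality scores for new images from their transferred features."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ w
```

In practice the thesis learns the head (and parts of the backbone) with deep networks; the closed-form head above only illustrates where transferred features enter the pipeline.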
English abstract: |
With the development of big data and artificial intelligence, image data has grown explosively and carries a wealth of information. However, distortion is inevitably introduced during image acquisition, storage, transmission, processing and display, which degrades visual quality and causes loss of semantic information. Consequently, it is necessary to design efficient and accurate image quality assessment (IQA) methods, whose results can be used to optimize image acquisition and processing systems and thus obtain higher-quality images. IQA is a hot research topic that plays an important fundamental role in image processing, computer vision and artificial intelligence. To address the problems of no-reference image quality assessment (NR-IQA) in natural scenes, namely complex scene content, authentically distorted images that are difficult to evaluate accurately, and deep learning based quality models that cannot adaptively select features, this thesis constructs a feature transfer network, an enhanced ensemble model and a multi-network cooperative learning strategy, and designs NR-IQA methods that improve the consistency between subjective and objective evaluation and meet the quality assessment requirements of practical scenarios. The main research contents are as follows: |
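The "consistency of subjective and objective evaluation" referred to above is conventionally measured by rank and linear correlation (SROCC and PLCC) between a model's predicted scores and subjective mean opinion scores (MOS). A minimal illustration, not code from the thesis, with hypothetical score values:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def iqa_consistency(pred, mos):
    """Return (SROCC, PLCC) between objective predictions and subjective MOS."""
    pred = np.asarray(pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    srocc = spearmanr(pred, mos).correlation  # rank (monotonicity) agreement
    plcc = pearsonr(pred, mos)[0]             # linear agreement
    return srocc, plcc


# Hypothetical scores: the predictions preserve the MOS ordering exactly,
# so SROCC is 1.0 and PLCC is close to 1.0.
srocc, plcc = iqa_consistency([0.9, 0.7, 0.55, 0.3], [85, 70, 52, 31])
```

Higher SROCC/PLCC on benchmark databases is the standard evidence that an NR-IQA method agrees with human judgment.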
CLC number: | 11 |
Open access date: | 2022-02-19 |