View thesis information

Chinese title:

 HEVC中的DCT硬件架构设计与HLS实现研究 (Research on DCT Hardware Architecture Design and HLS Implementation in HEVC)

Name:

 Ju Xin (琚歆)

Student ID:

 1201120331    

Confidentiality level:

 Public

Thesis language:

 chi    

Discipline code:

 0810    

Discipline name:

 Information and Communication Engineering

University:

 Xidian University

School/Department:

 School of Telecommunications Engineering

Major:

 Communication and Information Systems

First advisor's name:

 Lei Jie (雷杰)

First advisor's affiliation:

 Xidian University

Completion date:

 2014-12-07    

Defense date:

 2014-12-07    

Foreign-language title:

 DCT Hardware Architecture Design of HEVC and Research on HLS Implementation    

Keywords (translated from Chinese):

 HEVC; DCT; HLS implementation

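Abstract (translated from Chinese):
In January 2013, ITU-T and ISO/IEC released a new video compression standard, HEVC. HEVC is the next-generation video compression standard after H.264. Compared with H.264, its compression efficiency is twice as high, and it addresses the difficulties that arise as video resolution rises, video data volume grows, and storage and transmission become harder. The DCT is used very widely in image and video compression. As a necessary step in compressing video and images, the DCT algorithm is one of the most active research topics in image and video coding. The DCT in the HEVC standard is a very important pre-compression step. Because HEVC predicts the video sequence in multiple directions and then selects the best prediction direction from compression feedback, the DCT stage has to be executed many times, so an efficient DCT implementation is very important. In addition, HEVC partitions the whole picture more flexibly, and its large blocks make the DCT of the corresponding size harder to compute, so research on hardware implementation of large-size DCT is especially urgent.
This thesis describes the DCT process in the HEVC video encoding flow and, targeting large-size DCT computation in HEVC, completes the design and HLS (High-level Synthesis) implementation of two DCT hardware architectures. The main results are the following two hardware architectures:
(1) A DCT hardware architecture and implementation based on matrix multiplication. The design is synthesized and optimized with HLS against three metrics: resource utilization, processing latency, and data throughput. The synthesis result reaches a data throughput of 5.56 Gsps, which meets the throughput needed for real-time 4K video transmission.
(2) A DCT hardware architecture and implementation based on the butterfly algorithm. Following the computation principle of the butterfly algorithm, the design is implemented with HLS, and the result is compared with and analyzed against DCT hardware implementations in published papers. HLS optimization targets processing latency and data throughput. The synthesis result reaches a data throughput of 6.77 Gsps, higher than the results reported in the existing literature, and the implementation can be applied to real-time 8K video compression.
This thesis focuses on the architecture design of the integer DCT in HEVC, its HLS implementation, and the key techniques involved. The design process resolves excessive resource utilization, excessive processing latency, and data throughput that fell short of the target. Using HLS for the hardware implementation avoids the long development cycle of traditional hardware design and makes it possible to iterate from a single software design to new hardware architectures for different scenarios. Finally, RTL functional simulation tests are completed for both HLS implementations. The proposed hardware architectures and HLS implementation method can be widely applied to real-time compression of 4K or 8K video.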
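The abstract contrasts a matrix-multiplication architecture with a butterfly-based one. As a rough illustration only (not the thesis's large-size hardware design), the sketch below applies both ideas to the 4-point HEVC core transform in one dimension. The function names are hypothetical, and the scaling shifts and rounding that the standard applies after each transform stage are left out.

```python
# 4-point HEVC core transform matrix (integer approximation of the DCT).
M4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

def dct4_matrix(x):
    """Direct form: one dot product per output coefficient (16 multiplies)."""
    return [sum(c * v for c, v in zip(row, x)) for row in M4]

def dct4_butterfly(x):
    """Butterfly form: exploit the even/odd symmetry of the matrix rows."""
    # Even part uses sums of mirrored inputs, odd part uses differences.
    e0, e1 = x[0] + x[3], x[1] + x[2]
    o0, o1 = x[0] - x[3], x[1] - x[2]
    return [
        64 * (e0 + e1),      # coefficient 0
        83 * o0 + 36 * o1,   # coefficient 1
        64 * (e0 - e1),      # coefficient 2
        36 * o0 - 83 * o1,   # coefficient 3
    ]

sample = [10, 20, 30, 40]
print(dct4_matrix(sample))
print(dct4_butterfly(sample))
```

Both functions return the same coefficients. The butterfly form needs 6 multiplications instead of 16, and the two multiplications by 64 can be replaced by shifts in hardware, which is the kind of saving a butterfly architecture exploits at larger transform sizes.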
Foreign-language abstract:
ITU-T and ISO/IEC released a new video compression standard, HEVC (High Efficiency Video Coding), in January 2013. HEVC is the successor to H.264 and doubles its compression efficiency, so it handles compression well under higher video resolutions, larger volumes of video data, and tougher storage and transmission conditions.
The DCT is widely used as an essential part of video and image compression and has become one of the most active research fields in video coding algorithms. In the HEVC standard, the DCT plays a very important role in the pre-compression process. During prediction, multiple directions are evaluated and the best one is selected based on feedback from the DCT stage, so the number of DCT executions grows dramatically. As a consequence, an inefficient DCT implementation may become the bottleneck of video compression and decrease the overall compression efficiency. In addition, HEVC has a more flexible block partitioning mechanism for the whole image, and the flexible block sizes complicate the DCT of the corresponding size, so research on DCT hardware implementation is particularly urgent.
This thesis describes the DCT process in HEVC and then presents the design and HLS implementation of two DCT hardware architectures for large transform sizes. Aimed at the requirements of real-time video transmission and compression, two hardware architectures are proposed:
(1) A hardware architecture based on matrix multiplication. HLS is used to synthesize and optimize the implementation with respect to resource utilization, processing latency, and data throughput, so that the synthesis result achieves a data throughput of 5.56 Gsps. This meets the data throughput required for real-time 4K video transmission.
(2) A hardware architecture based on the butterfly transform. The butterfly-based hardware structure is implemented and optimized with HLS. It achieves a data throughput of 6.77 Gsps, which is higher than that of the existing structure reported in the literature, and it also performs well in the 8K video compression scenario.
This thesis mainly presents the architecture design of the integer DCT in HEVC and its realization by HLS. Through detailed analysis, it overcomes high resource utilization, large processing latency, and data throughput that fell short of the target value. Development with HLS solves the problem of the long cycle of traditional hardware development. At the same time, the HLS tool makes it possible to derive different hardware structures for different scenarios from the same software design. At the end of the thesis, RTL functional simulation is completed to verify the design. The architectures and HLS implementation method proposed in this thesis can be widely applied to real-time compression and transmission of high-resolution video.
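To put the quoted throughput figures in context, the sketch below estimates the raw sample rate of 4K and 8K video. The frame rate (60 fps), the 4:2:0 chroma format (1.5 samples per pixel), and the 3840x2160 and 7680x4320 frame sizes are assumptions made here for illustration; the abstract does not state which video parameters the thesis used. The function name is hypothetical.

```python
def raw_sample_rate(width, height, fps=60, samples_per_pixel=1.5):
    """Samples per second for uncompressed video.

    fps=60 and samples_per_pixel=1.5 (4:2:0 chroma) are assumed values,
    not parameters taken from the thesis.
    """
    return width * height * samples_per_pixel * fps

# Throughput figures quoted in the abstract, in samples per second.
reported = {
    "4K (3840x2160)": (3840, 2160, 5.56e9),
    "8K (7680x4320)": (7680, 4320, 6.77e9),
}

for name, (w, h, throughput) in reported.items():
    rate = raw_sample_rate(w, h)
    # Ratio of transform throughput to the raw rate: roughly how many
    # transform passes per sample the architecture could sustain.
    print(f"{name}: raw {rate / 1e9:.3f} Gsps, ratio {throughput / rate:.2f}")
```

Under these assumptions the raw rates are about 0.75 Gsps for 4K and 2.99 Gsps for 8K, so both reported throughputs exceed the raw rate. The margin matters because, as the abstract notes, the encoder runs the DCT several times per block while searching for the best prediction.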
Chinese Library Classification number:

 11    

Library holding number:

 11-26834    

Public release date:

 2015-09-13    
