
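Chinese title:

 在线学习平台的人数流量分析与预测 (Analysis and Prediction of User Traffic on an Online Learning Platform)

Name:

 尹静怡

Student ID:

 17033211400

Confidentiality level:

 Public

Thesis language:

 chi

Discipline code:

 040110

Discipline name:

 Education - Education - Educational Technology (degrees in Education or Science may be conferred)

Student type:

 Master's

Degree:

 Master of Education

University:

 Xidian University (西安电子科技大学)

School:

 School of Computer Science and Technology

Major:

 Educational Technology

Research direction:

 E-learning theory, technology, and applications

First supervisor:

 詹海生

First supervisor's affiliation:

 Xidian University (西安电子科技大学)

Completion date:

 2020-05-01

Defense date:

 2020-05-24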

English title:

 Prediction and Analysis of the Number of People for the Online Examination Platform

Chinese keywords (translated):

 Time series ; distance education ; headcount prediction ; LSTM ; transfer learning

English keywords:

 Time Series ; Distance Education ; Population Prediction ; LSTM ; Transfer Learning

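Abstract (translated from the Chinese):

Education has always been a top priority in China, and with the development of artificial intelligence, distance education is growing rapidly. As more students choose distance education, online education platforms generate large volumes of log data. How to make good use of this data to improve platform utilization and to provide a theoretical basis for safe and reliable platform operation has become an active research topic. Using the log data of an online learning platform, this thesis derives the hourly number of online examinees and forecasts it, with the aim of improving system resource utilization and the quality of the examination service. Improving the accuracy of the headcount forecast is therefore especially important. The main work of the thesis is summarized as follows:

First, the log records of online examinees are extracted from the platform's log data and preprocessed into the real-time number of examinees in the system for each hour, providing a real dataset for the subsequent study.

Next, the examinee-count data is differenced to obtain a stationary time series and modeled with SARIMA. A residual test is used to verify the validity of the model, and grid search is used to optimize the SARIMA parameters. Comparative experiments on the real dataset show that the SARIMA model can effectively forecast future examinee numbers, but it is sensitive to parameter settings, and parameter selection is partly subjective, which affects the model's forecasting performance.

Then, a time window technique is used to convert the time series problem into a supervised regression problem. In view of the actual data, a single-layer LSTM is used so that the model does not become overly complex and overfit. Because a headcount forecast should not be negative, the ReLU activation function is used in the output layer. The LSTM hyperparameters are tuned experimentally to determine the final model, and early stopping is used during training to prevent overfitting caused by the small amount of data. Compared with the SARIMA model, the LSTM model improves accuracy on the test set by 35%.

Finally, in the setting of the online learning platform, the online examination data has the following characteristics: there are many examination cycles, each cycle contains little data, and the data across cycles is strongly correlated. These characteristics lead to a "cold start" problem and a small-training-set problem in forecasting the number of online examinees, and this thesis designs a separate transfer learning scheme for each. For the cold start problem, a pretrained model is used and only its final fully connected layer is trained, which improves model accuracy. For the small-training-set problem, the model is loaded from a pretrained model, which reduces model error. In experiments, the LSTM models with transfer learning improve performance on the test set by 60% and 18%, respectively, compared with LSTM models trained only on the target data.

In summary, the schemes designed in this thesis effectively solve the problem of forecasting the number of online examinees and offer practical guidance for the rational use of service resources and the improvement of service levels.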

English abstract:

Education has always been a top priority in China. With the development of artificial intelligence, distance education is growing vigorously. As the number of students choosing distance education increases, online education platforms produce large amounts of log data. How to use this data sensibly, so as to improve the utilization of distance education platforms and provide a theoretical basis for their safe and reliable operation, has become a hot research topic. Based on the log data of an online learning platform, this thesis compiles the hourly number of online examinees and forecasts it, with the goal of improving system resource utilization and the quality of the examination service. Improving the accuracy of the headcount forecast is therefore particularly important. The main contents of the thesis are summarized as follows:

 

Firstly, this thesis extracts the records of online examinees from the log data of the online learning platform and, after preprocessing, compiles the real-time number of examinees in the system for each hour, providing a real dataset for the subsequent research.
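
The preprocessing step can be pictured with a minimal sketch. The log format used here (one "timestamp,user_id" line per exam-related event) and the function name are assumptions made for illustration; the record does not describe the platform's actual log schema.

```python
from collections import defaultdict
from datetime import datetime


def hourly_examinee_counts(log_lines):
    """Count distinct examinees per hour from 'timestamp,user_id' lines.

    The line format is a hypothetical stand-in for the real platform logs.
    """
    users_by_hour = defaultdict(set)
    for line in log_lines:
        timestamp_text, user_id = line.strip().split(",")
        timestamp = datetime.strptime(timestamp_text, "%Y-%m-%d %H:%M:%S")
        # Truncate the timestamp to the start of its hour.
        hour_bucket = timestamp.replace(minute=0, second=0)
        users_by_hour[hour_bucket].add(user_id)
    return {hour: len(users) for hour, users in sorted(users_by_hour.items())}


sample_log = [
    "2020-01-06 09:05:10,u1",
    "2020-01-06 09:40:00,u2",
    "2020-01-06 09:55:30,u1",  # same user again in the same hour
    "2020-01-06 10:01:00,u3",
]
hourly_counts = hourly_examinee_counts(sample_log)
```

Using a set per hour means a user who produces several log events within one hour is counted once for that hour.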

 

Secondly, the online examinee-count data is transformed into a stationary time series by differencing and then modeled with SARIMA. The validity of the model is verified by a residual test, and the SARIMA parameters are optimized by grid search. Comparative experiments on the real dataset show that the SARIMA model can effectively predict future examinee numbers, but it is sensitive to parameter settings, and parameter selection involves some subjectivity, which affects the model's prediction performance.
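
As a rough illustration of the differencing step, the sketch below applies a seasonal difference followed by a first difference to a synthetic hourly series. The 24-hour season length and the synthetic data are assumptions; the record does not state which differencing orders or seasonal period the thesis used. In SARIMA terms these two operations correspond to the seasonal order D and the non-seasonal order d.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Synthetic hourly series: a linear trend plus a repeating 24-hour pattern.
daily_pattern = [0, 0, 0, 0, 0, 0, 1, 3, 8, 12, 10, 6,
                 4, 7, 11, 9, 6, 4, 5, 7, 6, 3, 1, 0]
series = [0.5 * t + daily_pattern[t % 24] for t in range(96)]

# Seasonal differencing (lag 24) removes the repeating daily pattern.
seasonal_diff = difference(series, lag=24)
# First differencing (lag 1) removes what is left of the linear trend.
stationary = difference(seasonal_diff, lag=1)
```

On this noise-free series the result is constant, which is the idealized version of the stationarity that differencing aims for on real data.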

 

Thirdly, a time window technique is used to transform the time series problem into a supervised regression problem and, in view of the actual data, a single-layer LSTM is used so that the model does not become overly complex and overfit. Because a headcount prediction should not be negative, the ReLU activation function is used in the output layer. The LSTM hyperparameters are tuned experimentally to determine the final model, and early stopping is used during training to prevent overfitting caused by the small amount of data. Compared with the SARIMA model, the LSTM model improves accuracy on the test set by 35%.
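
The time window idea can be sketched as follows: each run of consecutive hourly counts becomes one input sample, and the count that follows it becomes the regression target. The window size of 3 and the sample values are illustrative assumptions, not figures from the thesis.

```python
def make_windows(series, window_size):
    """Turn a series into (inputs, targets) pairs for supervised regression."""
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])  # past values
        targets.append(series[start + window_size])        # next value
    return inputs, targets


hourly_counts = [3, 5, 9, 14, 11, 6, 2]
inputs, targets = make_windows(hourly_counts, window_size=3)
```

Each input row would then be fed to the LSTM as a short sequence, with the matching target as the value to predict.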

 

Finally, in the setting of the online learning platform, the online examination data has the following characteristics: there are many examination cycles, each cycle contains little data, and the data across cycles is strongly correlated. These characteristics lead to a "cold start" problem and a small-training-set problem in predicting the number of online examinees, and this thesis designs a separate transfer learning scheme for each. For the cold start problem, a pretrained model is used and only its final fully connected layer is trained, which improves model accuracy. For the small-training-set problem, the model is loaded from a pretrained model, which reduces model error. In experiments, the LSTM models with transfer learning improve performance on the test set by 60% and 18%, respectively, compared with LSTM models trained only on the target data.
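
The "train only the last fully connected layer" scheme can be illustrated with a toy sketch. Everything here is an assumption made for illustration: a small feed-forward hidden layer stands in for the pretrained LSTM, and the parameter values, data, learning rate, and function names are hypothetical. The point is only that the pretrained parameters stay frozen while gradient descent updates the output layer.

```python
import math


def hidden_features(x, hidden_params):
    """Frozen 'pretrained' layer: its parameters are never updated below."""
    return [math.tanh(w * x + b) for w, b in hidden_params]


def predict(x, hidden_params, head):
    features = hidden_features(x, hidden_params)
    return sum(v * h for v, h in zip(head["weights"], features)) + head["bias"]


def mean_squared_error(data, hidden_params, head):
    return sum((predict(x, hidden_params, head) - y) ** 2 for x, y in data) / len(data)


def fine_tune_head(data, hidden_params, head, learning_rate=0.1, epochs=200):
    """Gradient descent on the output layer only; hidden_params stay fixed."""
    for _ in range(epochs):
        grad_w = [0.0] * len(head["weights"])
        grad_b = 0.0
        for x, y in data:
            features = hidden_features(x, hidden_params)
            error = predict(x, hidden_params, head) - y
            for j, h in enumerate(features):
                grad_w[j] += 2 * error * h / len(data)
            grad_b += 2 * error / len(data)
        head["weights"] = [w - learning_rate * g
                           for w, g in zip(head["weights"], grad_w)]
        head["bias"] -= learning_rate * grad_b
    return head


# Hypothetical pretrained parameters and a tiny dataset from a new exam cycle.
pretrained_hidden = [(1.0, 0.0), (-0.5, 0.3), (2.0, -1.0)]
frozen_copy = list(pretrained_hidden)
head = {"weights": [0.0, 0.0, 0.0], "bias": 0.0}
new_cycle_data = [(0.0, 0.2), (0.5, 0.6), (1.0, 0.9), (1.5, 1.0)]

loss_before = mean_squared_error(new_cycle_data, pretrained_hidden, head)
head = fine_tune_head(new_cycle_data, pretrained_hidden, head)
loss_after = mean_squared_error(new_cycle_data, pretrained_hidden, head)
```

Because only the output layer has free parameters, very little data is needed to fit it, which is why this kind of scheme suits a new cycle with almost no history.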

 

To sum up, the schemes designed in this thesis effectively solve the problem of predicting the number of online examinees and offer practical guidance for the rational use of service resources and the improvement of service levels.

Chinese Library Classification number:

 G40

Collection number:

 45582

Open-access date:

 2020-12-17
