Thesis information

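Chinese title:

 文献视角下的领域热点主题识别与学科交叉特性研究-以数字人文为例

Name:

 王若婷

Student ID:

 20061212371

Confidentiality level:

 Public

Thesis language:

 chi

Discipline code:

 1205

Discipline name:

 Management - Information Resource Management

Student type:

 Master's

Degree:

 Master of Management

University:

 Xidian University

School:

 School of Economics and Management

Major:

 Library, Information and Archives Management

Research direction:

 Utilization of network information resources

First supervisor's name:

 李慧

First supervisor's affiliation:

 School of Economics and Management

Completion date:

 2023-03-27

Defense date:

 2023-05-26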

Foreign-language title:

 Research on Identification of Hotspots and Interdisciplinary Characteristics in the Field from the Perspective of Literature: A Case Study of the Digital Humanities Field    

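Chinese keywords (translated):

 research hotspots ; two-mode network ; clustering algorithms ; digital humanities ; interdisciplinarity measurement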

Foreign-language keywords:

 research hotspots ; two-mode network ; clustering algorithm ; digital humanities ; interdisciplinary measurement    

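Chinese abstract (translated):

In recent years, driven by technologies such as artificial intelligence and cloud computing, a new round of scientific and technological revolution and industrial transformation has gathered pace. Countries compete with unprecedented intensity for the commanding heights of science and technology, and research output has grown explosively. Research on identifying the hot topics of a field helps scholars grasp disciplinary trends and frontier problems accurately, and provides a theoretical basis for research policy and talent training. Tackling major scientific problems and achieving disruptive innovation often requires integrating knowledge from several disciplines, so identifying how disciplines intersect and merge helps clarify the state of interdisciplinary development and reveals new growth points for disciplines. Taking digital humanities as its research object, this paper systematically explores a method for identifying the hot topics of a field based on a document-keyword two-mode network, and a method for studying the field's interdisciplinary characteristics. The main work is as follows: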

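Starting from the shortcomings of the three families of hotspot identification methods (co-occurrence, citation, and semantic), this paper proposes a method for identifying the hot topics of a field based on a document-keyword two-mode network. First, using scientific literature as the data source, it builds a document-keyword two-mode network and proposes a comprehensive keyword influence model that considers document influence, publication time, keyword position order, keyword frequency, and the association between documents and keywords. Second, it introduces the Node2vec network representation learning model at the keyword co-occurrence layer to obtain semantic vector representations of the nodes. Third, it uses the silhouette coefficient to evaluate four clustering algorithms (K-means, agglomerative hierarchical clustering, DBSCAN, and HDBSCAN) and determine the best topic identification algorithm. Finally, following the Pareto principle, it screens each topic's hot keywords by comprehensive keyword influence to complete the hotspot identification. The digital humanities topics identified in the experiment are highly consistent with the hot topics given by the field's top conferences, which indicates that the proposed framework can identify a field's hot topics effectively.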
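The clustering-evaluation step can be illustrated with a small sketch of the silhouette coefficient. This is not the thesis's implementation: the function name `silhouette_score`, the Euclidean distance, the toy two-dimensional `embeddings`, and the two candidate labelings are all assumptions made for illustration, and real Node2vec vectors would have many more dimensions.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # convention: a point in a singleton cluster scores 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical 2-D keyword embeddings and two candidate cluster labelings.
embeddings = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
candidates = {
    "labeling_a": [0, 0, 1, 1],  # groups nearby points together
    "labeling_b": [0, 1, 0, 1],  # mixes distant points
}
scores = {name: silhouette_score(embeddings, labels) for name, labels in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

A labeling whose score is closer to 1 separates the clusters more cleanly, so comparing the scores of several algorithms' outputs on the same vectors is one way to pick among them.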

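Second, to address the low accuracy that results from identifying interdisciplinary characteristics with single-dimension indicators, this paper proposes a method that measures a field's interdisciplinarity at both the macro and micro levels. At the macro level, it designs discipline diversity, persistence, and stability indicators to measure the depth and breadth of intersection among the disciplines in the field, uses a logistic curve to divide the field's development into stages, builds a discipline citation network from document citation relationships, and computes network analysis indicators for each stage to reveal the intersection structure among disciplines and how their roles change. At the micro level, based on the topics identified in Chapter 3, it builds a topic-hot keyword-document-discipline mapping table and computes discipline richness, balance, disparity, and comprehensive diversity indicators to characterize each topic's disciplinary diversity; builds a discipline co-citation network and computes two network cohesion indicators, average path length and network density, to reveal each topic's interdisciplinary structure; uses a community detection algorithm to partition each topic's disciplines into subgroups; and introduces a two-dimensional quadrant chart to analyze topic diversity and cohesion together, thereby measuring each topic's degree of interdisciplinarity. An analysis of digital humanities literature data shows that the proposed indicator system can measure interdisciplinarity both for the field as a whole and for each research topic, and can also use the diversity and cohesion characteristics to identify the interdisciplinary topics with the greatest research potential.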
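A rough sketch of how richness, balance, disparity, and a combined diversity value might be computed for one topic is given below. The abstract does not state the formulas, so everything here is an assumption: balance is taken as Shannon evenness, disparity as the mean pairwise distance, and the combined value as a Rao-Stirling-style sum over unordered discipline pairs (formulations that sum over ordered pairs give twice this value). The discipline names, counts, and distances are invented.

```python
import math
from itertools import combinations

def diversity_profile(counts, distance):
    """counts: {discipline: count}; distance: {frozenset({a, b}): value in [0, 1]}."""
    total = sum(counts.values())
    shares = {d: c / total for d, c in counts.items() if c > 0}

    # Richness: number of disciplines that actually appear.
    richness = len(shares)

    # Balance: Shannon entropy normalised by its maximum, ln(richness).
    entropy = -sum(p * math.log(p) for p in shares.values())
    balance = entropy / math.log(richness) if richness > 1 else 0.0

    # Disparity: mean distance over unordered pairs of disciplines.
    pairs = list(combinations(sorted(shares), 2))
    disparity = sum(distance[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0

    # Combined diversity: sum of p_i * p_j * d_ij over unordered pairs.
    rao_stirling = sum(shares[a] * shares[b] * distance[frozenset((a, b))] for a, b in pairs)

    return {
        "richness": richness,
        "balance": balance,
        "disparity": disparity,
        "rao_stirling": rao_stirling,
    }

# Hypothetical discipline counts for one topic and pairwise distances.
counts = {"Computer Science": 50, "History": 30, "Linguistics": 20}
distance = {
    frozenset(("Computer Science", "History")): 0.9,
    frozenset(("Computer Science", "Linguistics")): 0.6,
    frozenset(("History", "Linguistics")): 0.4,
}
profile = diversity_profile(counts, distance)
print(profile)
```

The combined value rises when a topic draws on more disciplines, spreads its references more evenly across them, or draws on disciplines that are further apart.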

Foreign-language abstract:

In recent years, driven by technologies such as artificial intelligence and cloud computing, a new round of scientific and technological revolution and industrial change has accelerated. Competition among countries for the commanding heights of science and technology is unprecedentedly fierce, and the output of scientific and technological achievements is growing explosively. Research on identifying hotspots in scientific literature can help researchers accurately grasp development trends and research frontiers, and can provide a theoretical basis for research policy and personnel training. Overcoming significant scientific problems and achieving disruptive innovation often requires integrating multidisciplinary knowledge. Identifying how disciplines intersect and integrate therefore helps clarify trends in interdisciplinary development and reveals new growth points for disciplines. Taking the digital humanities field as the research object, this paper systematically explores a hotspot identification method based on a document-keyword two-mode network and a research method for interdisciplinary characteristics. The main research work is as follows:

To make up for the shortcomings of co-occurrence, citation, and semantic methods, this paper proposes a hot topic detection method based on a document-keyword two-mode network. First, taking scientific literature as the data source, this study builds a document-keyword two-mode network and designs a comprehensive keyword influence model that considers literature influence, time factors, document citation relationships, keyword position order, keyword frequency, and the relationship between documents and keywords. Second, the nodes in the keyword co-occurrence network are mapped to vectors by the Node2vec algorithm. Third, the silhouette coefficient is used to evaluate four clustering algorithms (K-means, agglomerative hierarchical clustering, DBSCAN, and HDBSCAN) and select the most appropriate one. Finally, following the Pareto rule, hotspots are identified by applying comprehensive keyword influence to the clustering results. The digital humanities topics identified in the experiment are highly consistent with the hot topics given by the top conferences in the field, which indicates that the proposed framework can effectively identify a field's hot topics.
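The final screening step can be sketched as follows. The abstract does not say exactly how the Pareto rule is applied, so this sketch assumes the simplest reading: within each topic, keep the top 20% of keywords ranked by comprehensive influence. The function name `select_hot_keywords`, the `top_percent` parameter, and the scores are hypothetical.

```python
def select_hot_keywords(influence, top_percent=20):
    """Return the top `top_percent`% of keywords by influence (at least one)."""
    ranked = sorted(influence.items(), key=lambda item: item[1], reverse=True)
    # Integer ceiling of len * top_percent / 100, avoiding float rounding issues.
    keep = max(1, (len(ranked) * top_percent + 99) // 100)
    return [keyword for keyword, _ in ranked[:keep]]

# Hypothetical comprehensive influence scores for the keywords of one topic.
topic_influence = {
    "text mining": 0.91,
    "topic modeling": 0.84,
    "corpus": 0.40,
    "ocr": 0.35,
    "tei": 0.31,
    "stylometry": 0.27,
    "annotation": 0.22,
    "metadata": 0.18,
    "archives": 0.12,
    "crowdsourcing": 0.09,
}
hot_keywords = select_hot_keywords(topic_influence)
print(hot_keywords)
```

With ten keywords and a 20% cut-off, two keywords are kept. A cumulative-share reading of the rule (keep keywords until they account for 80% of total influence) would be an equally plausible variant.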

To address the low accuracy that results from identifying interdisciplinary features with single-dimensional indicators, this paper proposes a method for measuring interdisciplinarity at the macro and micro levels. At the macro level, this paper designs diversity, durability, and stability indicators for disciplines to measure the depth and breadth of the intersection of different disciplines in the field, uses the logistic curve to divide the development stages, constructs a discipline citation network from literature citation relationships, and calculates network analysis indicators for the different development stages to reveal the intersection structure among disciplines and the changes in their roles. At the micro level, based on the domain topics identified in Chapter 3, this paper establishes a topic-hot word-literature-discipline mapping table and calculates discipline richness, balance, disparity, and comprehensive diversity indicators to measure each topic's disciplinary diversity. A discipline co-citation network is constructed, and two network cohesion indicators, average path length and network density, are calculated to reveal the interdisciplinary structure within each topic. A community discovery algorithm divides each topic's disciplines into subgroups, and a two-dimensional quadrant chart is introduced to analyze the diversity and cohesion of the topics together, so as to measure each topic's degree of interdisciplinarity. The analysis of literature data in the digital humanities field shows that the proposed indicator system can not only measure interdisciplinarity for the field as a whole and for each research topic, but also identify the interdisciplinary topics with the most research potential through their diversity and cohesion characteristics.
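The two cohesion indicators named above, network density and average path length, can be computed on a small example as a sketch. The graph is treated as undirected, unweighted, connected, and free of self-loops, which is an assumption; the discipline names and co-citation links are invented, and the helper names are hypothetical.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def density(graph):
    """Share of possible links that are present: 2m / (n * (n - 1))."""
    n = len(graph)
    m = sum(len(neighbors) for neighbors in graph.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def bfs_distances(graph, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_path_length(graph):
    """Mean shortest-path length over all unordered node pairs."""
    nodes = sorted(graph)
    total, pairs = 0, 0
    for i, source in enumerate(nodes):
        dist = bfs_distances(graph, source)
        for target in nodes[i + 1:]:
            if target not in dist:
                raise ValueError("graph is disconnected")
            total += dist[target]
            pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical discipline co-citation links within one topic.
edges = [
    ("Computer Science", "Linguistics"),
    ("Linguistics", "History"),
    ("History", "Library Science"),
    ("Computer Science", "History"),
]
graph = build_graph(edges)
print(density(graph), average_path_length(graph))
```

A higher density together with a shorter average path length indicates that the disciplines within a topic are more tightly connected, which is the sense of cohesion that the quadrant analysis combines with diversity.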

Chinese Library Classification number:

 G35

Collection number:

 56708

Open date:

 2023-12-25    
