View thesis information

Chinese title:

 基于知识图谱结构信息与文本语义信息的科技文献推荐方法研究    

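Name:

 林松涛

Student ID:

 20061212377

Confidentiality level:

 Public

Thesis language:

 chi

Discipline code:

 125500

Discipline name:

 Management - Library and Information Studies* - Library and Information Studies

Student type:

 Master's

Degree:

 Master of Management

University:

 Xidian University

School:

 School of Economics and Management

Major:

 Library, Information and Archives Management

Research direction:

 Data quality, data integration

First supervisor name:

 宗威

First supervisor affiliation:

 Xidian University

Completion date:

 2023-05-24

Defense date:

 2023-05-27

Foreign-language title: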

 Research on Scientific and Technological Literature Recommendation Method Based on Knowledge Graph Structure Information and Text Semantic Information    

Chinese keywords:

 科技文献推荐 ; 知识图谱 ; 图嵌入 ; 高阶结构信息 ; 语义信息提取    

Foreign-language keywords:

 Scientific and Technical Literature Recommendation ; Knowledge Graph ; Graph Embedding ; High-order Structural Information ; Semantic Information Extraction    

Chinese abstract:

       随着科学研究工作的不断发展,科研活动产生的相应科技文献资源数量呈现出指数增长态势,海量文献资源在为科研活动提供有力支撑的同时也让科研人员面临着信息过载的难题,这使得搜寻与获取符合需求的文献资源需要花费大量时间与精力,科技文献推荐是解决文献资源过载问题的有效方法之一。但是,科技文献不同于电影、商品等传统推荐对象,其推荐过程存在推荐目的多样、数据来源丰富、目标用户特殊等特点,因此,如何为科研人员精准推荐符合其需求的文献资源是值得研究的科学问题。传统的科技文献推荐大多是基于协同过滤或基于内容的模式,常常存在数据稀疏、推荐结果缺乏解释性、信息挖掘不够充分等缺陷,针对这一状况,本文提出了基于知识图谱高阶结构信息与科技文献深层语义信息实现科技文献精准推荐的方法,主要研究工作如下:

       首先,针对传统基于协同过滤的推荐方法存在推荐依据单一、结构信息挖掘不充分等问题,提出了基于知识图谱提取高阶结构信息并进行推荐的科技文献推荐模型,该模型将由内向外扩散的水波模型与由外向内聚合的知识图谱卷积模型进行整合,在考虑用户对于文献关联属性看重程度的同时聚合图谱结构中的多跳邻居信息,最终生成包含用户偏好特征与文献特征的科技文献结构表征用于推荐。其次,对于当前基于内容的推荐方法无法深层提取语义内容、推荐效果不佳等问题,本文应用SBERT-PR语义提取模型提取科技文献文本深层语义信息,该模型通过应用基础BERT模型提取蕴含在科技文献标题、关键词、摘要中的深层语义信息,同时借助孪生网络共享训练参数,改善传统BERT模型对于语句级文本语义表征不佳的缺陷,最终生成包含科技文献深层语义信息的科技文献语义表征用于推荐。最后,在科技文献推荐过程中,本文将高阶结构信息与深层语义信息相结合,提出了融合结构表征与语义表征的科技文献推荐方法,通过计算基准文献与目标文献的综合表征相似程度,实现科技文献推荐。

       本文将所提出的方法在真实文献数据集DBLP与WOS上进行验证,实验结果表明,本文所提方法的推荐效果显著优于现有传统方法。此外,本文还进行了消融实验,证实了综合考虑结构与语义信息的科技文献推荐方法在推荐效果上明显优于基于单一信息源的推荐方法,为科研人员精准推荐科技文献提供了方法支撑与借鉴。

Foreign-language abstract:

As scientific research advances, the volume of scientific and technological literature is growing exponentially. This mass of literature strongly supports research activity, but it also confronts researchers with information overload: finding and obtaining literature that meets their needs takes considerable time and effort. Scientific and technological literature recommendation is one effective way to address this overload. Unlike traditional recommendation objects such as movies or products, however, literature recommendation is characterized by diverse recommendation purposes, rich data sources, and a specialized target user group, so how to accurately recommend literature that meets researchers' needs is a question worth studying. Traditional literature recommendation methods are mostly based on collaborative filtering or content-based filtering, and they often suffer from data sparsity, poorly explainable results, and insufficient information mining. To address these problems, this thesis proposes a method for accurate scientific and technological literature recommendation that draws on high-order structural information from a knowledge graph and deep semantic information from the literature itself. The main research work is as follows:

 

First, to address the limitations of traditional collaborative filtering methods, such as reliance on a single recommendation criterion and insufficient mining of structural information, the thesis proposes a recommendation model that extracts high-order structural information from a knowledge graph. The model integrates a ripple (water-wave) model, which propagates outward from the inside, with a knowledge graph convolution model, which aggregates inward from the outside. It aggregates multi-hop neighbor information from the graph structure while accounting for how much weight users place on the attributes associated with a paper, and it generates a structural representation of the literature that contains both user preference features and literature features. Second, because current content-based methods cannot extract deep semantic content and therefore recommend poorly, the thesis applies the SBERT-PR semantic extraction model to the text of scientific and technological literature. The model uses a base BERT model to extract the deep semantic information in titles, keywords, and abstracts, and it uses a siamese network with shared training parameters to remedy the weakness of the standard BERT model at sentence-level semantic representation. The result is a semantic representation of the literature that carries deep semantic information. Finally, the thesis combines high-order structural information with deep semantic information and proposes a recommendation method that fuses the structural and semantic representations: literature is recommended by computing the similarity between the comprehensive representations of the benchmark literature and the target literature.

 

The proposed method is validated on two real-world literature datasets, DBLP and WOS. The experimental results show that its recommendation performance is significantly better than that of existing traditional methods. In addition, ablation experiments confirm that a recommendation method that considers both structural and semantic information clearly outperforms methods based on a single information source. The work provides methodological support and a reference for accurately recommending scientific and technological literature to researchers.
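
The final step described in the abstract, ranking target literature by the similarity of fused structural and semantic representations to a benchmark paper, can be illustrated with a minimal sketch. The abstract does not state how the two representations are combined or which similarity measure is used, so the weighted concatenation, the cosine similarity, the `alpha` weight, the function names, and the toy vectors below are all assumptions made for illustration, not the thesis's implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(structural, semantic, alpha=0.5):
    """Assumed fusion: concatenate the two unit vectors, weighted by alpha.

    With sqrt weights, the cosine similarity of two fused vectors equals
    alpha * cos(structural) + (1 - alpha) * cos(semantic).
    """
    s = [math.sqrt(alpha) * x for x in l2_normalize(structural)]
    t = [math.sqrt(1.0 - alpha) * x for x in l2_normalize(semantic)]
    return s + t


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def recommend(benchmark_id, papers, alpha=0.5, top_k=2):
    """Rank all other papers by fused similarity to the benchmark paper."""
    fused = {pid: fuse(st, se, alpha) for pid, (st, se) in papers.items()}
    query = fused[benchmark_id]
    scores = [(pid, cosine(query, vec))
              for pid, vec in fused.items() if pid != benchmark_id]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Hypothetical toy data: paper id -> (structural vector, semantic vector).
papers = {
    "p0": ([1.0, 0.0], [1.0, 0.0]),  # benchmark paper
    "p1": ([1.0, 0.0], [0.0, 1.0]),  # same structure, different semantics
    "p2": ([0.0, 1.0], [0.0, 1.0]),  # differs on both
    "p3": ([1.0, 0.1], [1.0, 0.2]),  # close on both
}

result = recommend("p0", papers)
print(result)
```

In this toy setup, a paper that matches the benchmark on only one information source (`p1`) scores lower than one that is close on both (`p3`), which mirrors the motivation for combining the two sources. In practice the structural vectors would come from the knowledge graph model and the semantic vectors from the sentence encoder.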

Chinese Library Classification (CLC) number:

 G35    

Release date:

 2023-12-27    
