Thesis Information

Chinese Title:

Research on Chinese Named Entity Recognition Based on Pointer Networks

Name:

李繁

Student ID:

20071212553

Confidentiality Level:

Public

Thesis Language:

Chinese (chi)

Discipline Code:

0252

Discipline Name:

Economics - Applied Statistics*

Student Type:

Master's

Degree:

Master of Applied Statistics

University:

Xidian University (西安电子科技大学)

School:

School of Mathematics and Statistics

Major:

Applied Statistics

Research Direction:

Applied Statistics

First Supervisor:

李本崇

First Supervisor's Affiliation:

Xidian University (西安电子科技大学)

Second Supervisor:

蔡云龙

Completion Date:

2023-06-20

Defense Date:

2023-05-30

Foreign Title:

 Research of Chinese Named Entity Recognition Based on Pointer Network    

Chinese Keywords:

Named Entity Recognition; Transformer Model; Pointer Network; Adversarial Training; Learning Rate Adjustment

Foreign Keywords:

 Named Entity Recognition ; Transformer Model ; Pointer Network ; Adversarial Training ; Learning Rate Adjustment    

Chinese Abstract:

Artificial intelligence has now permeated every aspect of daily life. Natural language processing, one of its most active fields, continues to develop rapidly, and named entity recognition is one of its core tasks. This thesis studies the named entity recognition task, applying the pointer network idea to the problem of nested entity recognition, and introduces adversarial training and learning rate adjustment strategies to improve the generalization ability and robustness of the model, thereby supporting the further development of named entity recognition. The main work of the thesis is as follows:

(1) By analyzing the characteristics of the named entity recognition task and the limitation that the traditional Transformer encoder carries only absolute position information, this thesis proposes to fuse the relative position information of the text into the self-attention layer of the Transformer encoder, denoting the result the Reltransformer encoder. This encoder strengthens the model's understanding of the text and raises the recognition F1 score. Experimental results show that, with the same decoder, the F1 score of the model using the Reltransformer encoder is about 1.2% higher than that of the model using the traditional Transformer encoder.
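For illustration only, the following is a minimal single-head sketch (assuming a PyTorch implementation; the thesis does not give its code, and all names here are hypothetical) of how relative position information can enter a Transformer self-attention layer in the spirit of the Reltransformer encoder: a learned bias indexed by the relative distance between tokens is added to the attention scores.

```python
# Minimal single-head sketch of relative-position self-attention
# (illustrative only, not the thesis code).
import torch
import torch.nn as nn

class RelativeSelfAttention(nn.Module):
    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # one learned bias per relative distance in [-(max_len-1), max_len-1]
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))
        self.max_len = max_len
        self.scale = d_model ** -0.5

    def forward(self, x):                                   # x: (batch, seq, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale   # content term
        seq = x.size(1)
        pos = torch.arange(seq, device=x.device)
        rel = pos[None, :] - pos[:, None] + self.max_len - 1         # (seq, seq) relative distances
        scores = scores + self.rel_bias[rel]                          # relative-position term
        return torch.matmul(scores.softmax(dim=-1), v)
```

Real relative-position encoders such as Transformer-XL or TENER use multi-head attention and content-dependent position interactions; the scalar bias above only shows where the relative information enters the attention computation.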

(2) By analyzing the characteristics of the nested entity recognition task, this thesis proposes a "globalspan" (global span) decoding method. Global span decoding builds on the scoring scheme of the "globalpointer" (global pointer) idea from pointer networks and adds span features to the computation of the scoring function. This decoding method not only handles nested entity recognition effectively, but also reduces the computational complexity of the model and improves parameter utilization through parallel computation. Experimental results show that, with the same encoder, global span decoding improves the F1 score by about 0.9% over global pointer decoding, and by about 1.2% over span permutation decoding, while reducing training time to roughly 5/8 of the original.
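For concreteness, here is a sketch (assumed, not the thesis implementation) of the global span scoring idea: every (start, end) pair is scored for every entity type in one parallel tensor operation, with a hypothetical span-width feature added to the score.

```python
# Sketch of "global span" style scoring over all (start, end) pairs
# (illustrative only; the thesis's exact span features are not shown).
import torch
import torch.nn as nn

class GlobalSpanScorer(nn.Module):
    def __init__(self, d_model: int, num_types: int, head_dim: int = 64):
        super().__init__()
        self.start_proj = nn.Linear(d_model, num_types * head_dim)
        self.end_proj = nn.Linear(d_model, num_types * head_dim)
        self.num_types, self.head_dim = num_types, head_dim
        # hypothetical span feature: a learned bias per span-width bucket and type
        self.width_bias = nn.Embedding(128, num_types)

    def forward(self, h):                                   # h: (batch, seq, d_model)
        b, s, _ = h.shape
        qs = self.start_proj(h).view(b, s, self.num_types, self.head_dim)
        ke = self.end_proj(h).view(b, s, self.num_types, self.head_dim)
        # scores[b, t, i, j] = <start representation of token i, end representation of token j>
        scores = torch.einsum('bith,bjth->btij', qs, ke) / self.head_dim ** 0.5
        pos = torch.arange(s, device=h.device)
        width = (pos[None, :] - pos[:, None]).clamp(0, 127)           # span width j - i
        scores = scores + self.width_bias(width).permute(2, 0, 1)     # add span feature
        valid = torch.triu(torch.ones(s, s, dtype=torch.bool, device=h.device))
        return scores.masked_fill(~valid, float('-inf'))              # keep spans with start <= end
```

At inference time, every span whose score exceeds a threshold (zero under the usual global pointer convention) is predicted as an entity of the corresponding type, so overlapping and nested spans fall out of a single parallel scoring pass.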

(3) Building on the two points above, this thesis proposes a combined model that first encodes with Reltransformer and then decodes with "globalspan" global span decoding, and further optimizes it with adversarial training and a learning rate adjustment strategy to improve its generalization ability and robustness. First, the Fast Gradient Method of adversarial training is added to the model, and three adversarial factors (0.1, 0.3, and 0.5) are compared for their effect on the performance of the combined model, so that a suitable factor can be chosen for the characteristics of each dataset. Second, a warmup learning rate adjustment strategy is used to shape the learning rate trajectory, comparing a linear and a nonlinear decay in order to choose an appropriate learning rate decline for the model. Experimental results show that the optimized combined model improves the F1 score by about 3%, which verifies the feasibility and effectiveness of the combined model.
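The Fast Gradient Method mentioned here is a standard adversarial training routine; a typical PyTorch-style implementation (a common pattern, not taken from the thesis; `compute_loss` and `emb_name` are illustrative names) looks like the following, where `epsilon` plays the role of the adversarial factor (0.1, 0.3, or 0.5).

```python
# Standard FGM routine: perturb the embedding weights along the gradient
# direction, run a second backward pass, then restore the weights.
import torch

class FGM:
    def __init__(self, model, epsilon: float = 0.3, emb_name: str = 'embedding'):
        self.model, self.epsilon, self.emb_name = model, epsilon, emb_name
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Typical training step (hypothetical helper names):
#   loss = compute_loss(model, batch); loss.backward()       # normal pass
#   fgm.attack(); compute_loss(model, batch).backward()      # adversarial pass
#   fgm.restore(); optimizer.step(); optimizer.zero_grad()
```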

English Abstract:

Today, artificial intelligence has been integrated into all aspects of daily life. Natural language processing, one of its most active fields, is developing rapidly, and named entity recognition is a core task within it. This paper focuses on named entity recognition and applies the pointer network idea to the problem of nested entity recognition. Adversarial training and learning rate adjustment strategies are also added to the model to improve its generalization ability and robustness, supporting the further development of named entity recognition. The main work is summarized as follows:

(1) By analyzing the characteristics of the named entity recognition task, this paper proposes to integrate the relative position information of the text into the self-attention layer of the Transformer encoder, denoting the result the Reltransformer encoder, thus overcoming the limitation that the traditional Transformer encoder contains only absolute position information. The Reltransformer encoder improves the model's understanding of the text and raises the recognition F1 score. Experimental results show that, with the same decoder, the F1 score of the model using the Reltransformer encoder is about 1.2% higher than that of the model using the traditional Transformer encoder.

(2) By analyzing the characteristics of the nested entity recognition task, a "globalspan" decoding method is proposed. Global span decoding is based on the scoring scheme of the "globalpointer" idea from pointer networks, with span features added to the computation of the scoring function. This decoding method not only solves nested entity recognition effectively, but also reduces the computational complexity of the model and improves parameter utilization through parallel computation. Experimental results show that, with the same encoder, the F1 score with global span decoding is about 0.9% higher than with global pointer decoding and about 1.2% higher than with span permutation decoding, while the training time is reduced to about five-eighths of the original.

(3) On this basis, this paper proposes a combined model that first encodes with Reltransformer and then decodes with global span decoding. Adversarial training is added to the combined model and a learning rate adjustment strategy is adopted to optimize it, so as to improve its generalization ability and robustness. First, the Fast Gradient Method of adversarial training is added to the model, and three adversarial factors (0.1, 0.3, and 0.5) are compared for their effect on the performance of the combined model; the factor is chosen according to the characteristics of each dataset. Second, a warmup learning rate adjustment strategy is used to shape the learning rate, comparing a linear and a nonlinear decay so as to select an appropriate learning rate decline for the model. Experimental results show that the F1 score of the optimized combined model increases by about 3%, which verifies the feasibility and effectiveness of the combined model.
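Complementing the FGM sketch above, the warmup learning rate adjustment compared here can be illustrated with a small scheduling function (illustrative only; the exact decay shapes used in the thesis may differ): the learning rate rises linearly during a warmup phase and then declines either linearly or along a nonlinear (cosine) curve.

```python
# Illustrative warmup schedule with a linear or nonlinear (cosine) decline.
import math

def warmup_lr(step: int, total_steps: int, warmup_steps: int,
              base_lr: float, decay: str = 'linear') -> float:
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)              # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if decay == 'linear':
        return base_lr * (1.0 - progress)                         # linear decline
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))   # nonlinear (cosine) decline
```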

CLC Number:

O21

Accession Number:

56339

Open Access Date:

2023-12-23
