
Pre-training in Natural Language Processing

Published: 2020-02-18 17:08
In deep learning, pre-training a model can significantly improve its performance, and the technique is widely used.

In natural language processing, pre-trained language representations are typically applied to downstream tasks in one of two ways to improve model performance: the feature-based approach and the fine-tuning approach.
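As a rough illustration of the two modes, the sketch below contrasts them using the Hugging Face transformers library and PyTorch; the model name, task head, and input sentence are assumptions for illustration and do not come from the original text.

```python
# A rough sketch contrasting the two modes, assuming the Hugging Face
# `transformers` library and PyTorch; the model name, task head, and
# input sentence are illustrative assumptions.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Pre-training helps downstream tasks.", return_tensors="pt")
task_head = torch.nn.Linear(768, 2)  # small task-specific classifier

# 1) Feature-based: the pre-trained encoder is frozen and only supplies
#    input features; just the task head is trained on downstream data.
with torch.no_grad():
    features = encoder(**inputs).last_hidden_state    # (1, seq_len, 768)
logits_feature_based = task_head(features[:, 0])       # [CLS] vector as sentence feature

# 2) Fine-tuning: the same encoder stays trainable, so downstream-task
#    gradients update every pre-trained weight along with the task head.
logits_fine_tuned = task_head(encoder(**inputs).last_hidden_state[:, 0])
```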

Feature-based pre-training methods aim to learn general-purpose representations of words; some of these methods use neural networks [48-50], while others use other models [51-53]. A further group of feature-learning pre-trained models targets specific classes of natural language processing problems, including those in [54-57].
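For the feature-based pattern with static word vectors in the spirit of word2vec [49] or GloVe [50], a minimal sketch might look like the following; the vocabulary size, embedding dimension, and placeholder vectors are illustrative assumptions.

```python
# Minimal sketch: using pre-trained static word vectors as fixed features.
# `pretrained_vectors` stands in for a matrix loaded from word2vec/GloVe;
# the vocabulary size, dimension, and toy sentence are assumptions.
import torch
import torch.nn as nn

vocab_size, dim = 10000, 300
pretrained_vectors = torch.randn(vocab_size, dim)   # placeholder for real vectors

# freeze=True keeps the representations fixed; only the task model learns.
embedding = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)
classifier = nn.Linear(dim, 2)

token_ids = torch.tensor([[12, 48, 301, 7]])        # a toy tokenized sentence
sentence_vector = embedding(token_ids).mean(dim=1)  # average-pooled features
logits = classifier(sentence_vector)
```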

The fine-tuning approach relies on pre-trained language models that must be fine-tuned before they can be applied to downstream tasks [58-60]. For example, after a modest amount of task-specific fine-tuning, the pre-trained OpenAI GPT model [60] achieved the best results at the time on the GLUE benchmark [61]. BERT (Bidirectional Encoder Representations from Transformers) [38] is trained to predict randomly masked words given the complete sentence, which makes it bidirectional, and its performance demonstrates how important bidirectionality is for representation learning in natural language processing.
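A minimal sketch of this masked-word prediction, assuming the Hugging Face transformers library and an English BERT checkpoint (both assumptions for illustration), might look like this:

```python
# Minimal sketch of BERT's masked-word prediction, assuming the Hugging Face
# `transformers` library; the sentence and model name are illustrative.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                  # (1, seq_len, vocab_size)

# Pick the most likely token at the masked position; both the left context
# ("The capital of France is") and the right context (".") inform it.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = logits[0, mask_index].argmax().item()
print(tokenizer.decode([predicted_id]))
```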

Conventional approaches such as OpenAI GPT [60] are unidirectional language models, in which each word's representation can depend only on the words preceding it. ELMo [62] uses bidirectional long short-term memory networks, but it is built by combining two LSTMs running in opposite directions, whereas in BERT the representation of every word is computed from the complete context on both sides.
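One way to picture the difference is through the attention pattern each model allows; the following schematic PyTorch sketch illustrates the idea only and is not code from any of the cited systems.

```python
# Schematic sketch: unidirectional (causal) vs. bidirectional attention masks.
import torch

seq_len = 5

# Unidirectional (GPT-style): position i may attend only to positions <= i,
# so each word's representation depends only on the words before it.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Bidirectional (BERT-style): every position attends to the full sentence.
full_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

print(causal_mask.int())
# tensor([[1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])
```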

At the same time, after fine-tuning, BERT achieved the best results at the time on a range of sentence-level and token-level natural language processing tasks, which shows that pre-trained language representations remove the need to design a separate model architecture for each task.
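As an illustration of how little task-specific machinery is needed, the sketch below fine-tunes the same pre-trained encoder for a sentence-level classification task with the Hugging Face transformers library; the toy data, label set, and learning rate are assumptions for illustration.

```python
# Minimal sketch of fine-tuning a pre-trained encoder for a sentence-level
# task, assuming the Hugging Face `transformers` library; the toy batch,
# labels, and learning rate are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch = tokenizer(["a great movie", "a dull movie"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

model.train()
outputs = model(**batch, labels=labels)   # classification loss from the task head
outputs.loss.backward()                   # gradients flow into all BERT weights
optimizer.step()
```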

 

References:

[38] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.

[47] Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning?[J]. Journal of Machine Learning Research, 2010, 11(Feb): 625-660.

[48] Collobert R, Weston J. A unified architecture for natural language processing: Deep neural networks with multitask learning[C]//Proceedings of the 25th International Conference on Machine Learning. ACM, 2008: 160-167.

[49] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality[C]//Advances in Neural Information Processing Systems. 2013: 3111-3119.

[50] Pennington J, Socher R, Manning C. GloVe: Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1532-1543.

[51] Brown P F, Pietra V J D, Souza P V D, et al. Class-based n-gram models of natural language[J]. Computational Linguistics, 1992, 18(4): 467-479.

[52] Ando R K, Zhang T. A framework for learning predictive structures from multiple tasks and unlabeled data[J]. Journal of Machine Learning Research, 2005, 6(Nov): 1817-1853.

[53] Blitzer J, McDonald R, Pereira F. Domain adaptation with structural correspondence learning[C]//Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2006: 120-128.

[54] Peters M, Neumann M, Iyyer M, et al. Deep contextualized word representations[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018: 2227-2237.

[55] Rajpurkar P, Zhang J, Lopyrev K, et al. SQuAD: 100,000+ questions for machine comprehension of text[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016: 2383-2392.

[56] Socher R, Perelygin A, Wu J, et al. Recursive deep models for semantic compositionality over a sentiment treebank[C]//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013: 1631-1642.

[57] Tjong Kim Sang E F, De Meulder F. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition[C]//Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4. Association for Computational Linguistics, 2003: 142-147.

[58] Dai A M, Le Q V. Semi-supervised sequence learning[C]//Advances in Neural Information Processing Systems. 2015: 3079-3087.

[59] Howard J, Ruder S. Universal language model fine-tuning for text classification[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 328-339.

[60] Radford A, Narasimhan K, Salimans T, et al. Improving language understanding with unsupervised learning[R]. Technical report, OpenAI, 2018.

[61] Wang A, Singh A, Michael J, et al. GLUE: A multi-task benchmark and analysis platform for natural language understanding[C]//Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018: 353-355.

[62] Peters M, Ammar W, Bhagavatula C, et al. Semi-supervised sequence tagging with bidirectional language models[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017: 1756-1765.

 
