
Linguistic Feature Injection for Efficient Natural Language Processing

Giannini F.; Maggini M.
2023

Abstract

Transformers have become one of the most effective neural approaches to a wide range of Natural Language Processing tasks. However, following the common trend in modern deep architectures, their scale has grown so quickly that training such models from scratch is no longer a realistic option for many organizations. Indeed, despite their strong performance, Transformers have the general drawback of requiring huge amounts of training data, computational resources and energy to be optimized successfully. For this reason, more recent architectures such as Bidirectional Encoder Representations from Transformers (BERT) rely on unlabeled data to pre-train the model, which is later fine-tuned for a specific downstream task using a relatively small amount of training data. In a similar fashion, this paper considers a plug-and-play framework that can inject multiple syntactic features, such as Part-of-Speech tags or Dependency Parsing relations, into any pre-trained Transformer. This novel approach tackles sequence-to-sequence labeling tasks by exploiting: (i) the (more abundant) training data that is also used to learn the syntactic features, and (ii) the language data used to pre-train the Transformer. The experimental results show that our approach improves over the baseline performance of the underlying model on different datasets, thus demonstrating the effectiveness of syntactic language information as a form of semantic regularization. In addition, we show that our architecture has a substantial efficiency advantage over pure large language models. Indeed, by using a model of limited size whose input is enriched with syntactic information, we obtain a significant reduction in CO2 emissions without decreasing prediction performance.
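As a concrete illustration of the feature-injection idea described in the abstract (not the authors' actual architecture), the following PyTorch sketch adds embeddings of externally predicted Part-of-Speech tags to the token embeddings before a small Transformer encoder used for token-level labeling. All class names, layer sizes, and id spaces below are illustrative assumptions.

# Minimal sketch of syntactic feature injection for sequence labeling.
# Assumed setup: POS tags are predicted by an external tagger and passed
# in alongside the token ids; sizes and the tag inventory are placeholders.
import torch
import torch.nn as nn


class SyntaxInjectedTagger(nn.Module):
    def __init__(self, vocab_size, num_pos_tags, num_labels,
                 d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Separate embedding table for the injected syntactic feature (POS tags).
        self.pos_emb = nn.Embedding(num_pos_tags, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, num_labels)

    def forward(self, token_ids, pos_ids):
        # Inject syntax by summing the POS-tag embedding with each token embedding.
        x = self.tok_emb(token_ids) + self.pos_emb(pos_ids)
        return self.classifier(self.encoder(x))


# Toy usage: a batch of 2 sentences with 5 tokens each and hypothetical id spaces.
model = SyntaxInjectedTagger(vocab_size=1000, num_pos_tags=18, num_labels=9)
tokens = torch.randint(0, 1000, (2, 5))
pos_tags = torch.randint(0, 18, (2, 5))
logits = model(tokens, pos_tags)  # shape: (2, 5, 9)
print(logits.shape)

In the setting of the paper, the token representations would come from a pre-trained Transformer rather than an embedding table trained from scratch; the sketch only shows where the syntactic signal enters the model.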
Sector ING-INF/05 - Information Processing Systems
Sector IINF-05/A - Information Processing Systems
2023 International Joint Conference on Neural Networks, IJCNN 2023
Australia
2023
Proceedings of the International Joint Conference on Neural Networks
Institute of Electrical and Electronics Engineers Inc.
9781665488679
Energy consumption; Computational modeling; Training data; Computer architecture; Syntactics; Predictive models; Transformers
   HumanE AI Network (HumanE-AI-Net), European Commission, Horizon 2020 Framework Programme, grant 952026
   Foundations of Trustworthy AI - Integrating Reasoning, Learning and Optimization (TAILOR), European Commission, Horizon 2020 Framework Programme, grant 952215
   Learning with Multiple Representations (LEMUR), European Commission, Horizon Europe Framework Programme, grant 101073307
Files in this item:
File: IJCNN - Linguistic feature injection.pdf
Access: Closed access (request a copy)
Type: Published version
License: Not public
Size: 865.4 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11384/150586
Citations
  • PMC: ND
  • Scopus: 1
  • Web of Science: 0
  • OpenAlex: ND