Investigating topic-agnostic features for authorship tasks in Spanish political speeches

Corbara, Silvia; Chulvi, Berta; Rosso, Paolo; Moreo, Alejandro

doi:10.1007/978-3-031-08473-7_36

Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically employ stylometry (the analysis of unconscious traits that authors exhibit while writing), and are expected not to make inferences grounded on the topics the authors usually write about (as reflected in their past production). In this paper, we present a series of experiments evaluating the use of feature sets based on rhythmic and psycholinguistic patterns for Authorship Verification and Attribution in Spanish political language, combined with different text distortion approaches used to actively mask the underlying topic. We feed these feature sets to an SVM learner, and show that they lead to results comparable to those obtained by the BETO transformer when the latter is trained on the original text, i.e., when it can potentially learn from topical information.
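The abstract mentions text distortion as a way to mask topic before extracting features. As a rough illustration only, the sketch below masks every word outside a small list of function words with asterisks, preserving word lengths and punctuation. The word list and the `distort` function are assumptions made for this example; this record does not describe the distortion variants, feature sets, or SVM configuration actually used in the paper.

```python
import re

# Hypothetical, very small list of Spanish function words.
# It is an assumption for illustration, not the list used in the paper.
FUNCTION_WORDS = {
    "el", "la", "los", "las", "de", "que", "y", "en", "a", "no",
    "un", "una", "por", "con", "para", "es", "se",
}


def distort(text, keep=FUNCTION_WORDS):
    """Replace every character of each word not in `keep` with '*'.

    Word lengths, spacing, and punctuation are preserved, so only
    topic-bearing vocabulary is hidden.
    """
    def mask(match):
        word = match.group(0)
        return word if word.lower() in keep else "*" * len(word)

    # \w+ matches accented letters too, since Python 3 patterns are Unicode-aware.
    return re.sub(r"\w+", mask, text)


if __name__ == "__main__":
    print(distort("El gobierno no aprobará la reforma de las pensiones."))
```

The distorted text could then be passed to any feature extractor (for example, character n-grams) before training a classifier; that step is left out here because the record gives no details about it.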

Publication year: 2022
Scientific Disciplinary Sector (valid until 24/06/2024): Sector INF/01 - Informatica (Computer Science)
Conference title: 27th International Conference on Applications of Natural Language to Information Systems
Conference location: Valencia
Conference dates: 15-17 June 2022
Volume title: Natural Language Processing and Information Systems
Publisher: Springer Science and Business Media Deutschland GmbH
ISBN: 978-3-031-08472-0
DOI: https://dx.doi.org/10.1007/978-3-031-08473-7_36
Keywords: Authorship Identification; Text distortion; Political Speech
Projects funding the research:

Project title: DeepPattern
Funder: Generalitat Valenciana
Contract no.: PROMETEO/2019/121

Project title: A European Excellence Centre for Media, Society and Democracy
Acronym: AI4Media
Funder: European Commission
Funding programme: Horizon 2020 Framework Programme
Contract no.: 951911
Appears in types: 4.1 Contribution in conference proceedings

Files in this record:

File	Access	Type	License	Size	Format	Description
NLDB_2022_revised.pdf	Closed access	Published version	All rights reserved	301.78 kB	Adobe PDF
Index_Front.pdf	Closed access	Published version	All rights reserved	775.71 kB	Adobe PDF	Front matter