Investigating topic-agnostic features for authorship tasks in Spanish political speeches

Corbara, Silvia;
2022

Abstract

Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically employ stylometry (the analysis of unconscious traits that authors exhibit while writing), and are expected not to make inferences grounded on the topics the authors usually write about (as reflected in their past production). In this paper, we present a series of experiments evaluating feature sets based on rhythmic and psycholinguistic patterns for Authorship Verification and Attribution in Spanish political language, using different text distortion approaches to actively mask the underlying topic. We feed these feature sets to an SVM learner and show that they lead to results comparable to those obtained by the BETO transformer when the latter is trained on the original text, i.e., when it can potentially learn from topical information.
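
The abstract does not spell out the distortion schemes used, so the following is only a minimal sketch of one common way to mask topic in a text: every word outside a small list of function words is replaced by asterisks of the same length, so that word lengths and function-word usage survive while content words do not. The function name `mask_topic_words` and the tiny `FUNCTION_WORDS` list are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import re

# Hypothetical, deliberately tiny list of Spanish function words.
# A real experiment would rely on a much larger, curated list.
FUNCTION_WORDS = {
    "el", "la", "los", "las", "de", "que", "y", "en", "a",
    "no", "por", "con", "un", "una", "es", "se", "para",
}


def mask_topic_words(text, keep=FUNCTION_WORDS):
    """Replace every word not in `keep` with asterisks of the same length."""

    def mask(match):
        word = match.group(0)
        # Function words are kept as written; all other words are masked.
        return word if word.lower() in keep else "*" * len(word)

    # [^\W\d_]+ matches runs of letters (including accented ones),
    # leaving digits, punctuation and whitespace untouched.
    return re.sub(r"[^\W\d_]+", mask, text)


print(mask_topic_words("El gobierno no aprueba la reforma de la ley."))
# prints: El ******** no ******* la ******* de la ***.
```

Under these assumptions, the masked text could then be turned into features (for example, character n-grams) and passed to a classifier such as an SVM, which would no longer see the content words that reveal the topic.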
Year: 2022
Scientific sector: INF/01 - Computer Science
Conference: 27th International Conference on Applications of Natural Language to Information Systems
Location: Valencia
Dates: 15-17 June 2022
Published in: Natural Language Processing and Information Systems
ISBN: 978-3-031-08472-0
Keywords: Authorship Identification; Text distortion; Political Speech
   DeepPattern
   Generalitat Valenciana
   PROMETEO/2019/121

   A European Excellence Centre for Media, Society and Democracy
   AI4Media
   European Commission
   Horizon 2020 Framework Programme
   951911
Files in this item:

NLDB_2022_revised.pdf
Access: closed
Type: Published version
License: Non-public
Size: 301.78 kB
Format: Adobe PDF

Index_Front.pdf
Access: closed
Description: front matter
Type: Published version
License: Non-public
Size: 775.71 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11384/143624