Rhythmic and psycholinguistic features for authorship tasks in the Spanish parliament: evaluation and analysis

Corbara, Silvia;
2022

Abstract

Among the many tasks of the authorship field, Authorship Identification aims at uncovering the author of a document, while Author Profiling focuses on the analysis of personal characteristics of the author(s), such as gender and age. Methods devised for such tasks typically focus on the style of the writing, and are expected not to make inferences grounded on the topics that certain authors tend to write about. In this paper, we present a series of experiments evaluating the use of topic-agnostic feature sets for Authorship Identification and Author Profiling tasks in Spanish political language. In particular, we propose to employ features based on rhythmic and psycholinguistic patterns, obtained via different approaches to text masking that we use to actively mask the underlying topic. We feed these feature sets to an SVM learner, and show that they lead to results comparable to those obtained by a BETO transformer when the latter is trained on the original text, i.e., when it can potentially learn from topical information. Moreover, we further investigate the results for the different authors, showing that variations in performance are partially explainable in terms of the authors’ political affiliation and communication style.
2022
Sector INF/01 - Computer Science
13th International Conference of the CLEF Association
Bologna
5–8 September 2022
Experimental IR Meets Multilinguality, Multimodality, and Interaction
Springer Science and Business Media Deutschland GmbH
978-3-031-13642-9
Authorship Analysis; Text masking; Political speech
   DeepPattern
   Generalitat Valenciana
   PROMETEO/2019/121

   A European Excellence Centre for Media, Society and Democracy
   AI4Media
   European Commission
   Horizon 2020 Framework Programme
   951911
Files in this record:

CLEF_2022_cameraready.pdf
Access: Closed access
Type: Published version
License: Non-public
Size: 552.49 kB
Format: Adobe PDF

front_Index_merged.pdf
Access: Closed access
Description: preliminary pages
Type: Published version
License: Non-public
Size: 781.57 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11384/143623