Investigating topic-agnostic features for authorship tasks in Spanish political speeches
Corbara, Silvia;
2022
Abstract
Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically employ stylometry (the analysis of unconscious traits that authors exhibit while writing), and are expected not to base their inferences on the topics the authors usually write about (as reflected in their past production). In this paper, we present a series of experiments evaluating the use of feature sets based on rhythmic and psycholinguistic patterns for Authorship Verification and Attribution in Spanish political language, combined with different text distortion approaches that actively mask the underlying topic. We feed these feature sets to an SVM learner, and show that they lead to results comparable to those obtained by the BETO transformer when the latter is trained on the original text, i.e., when it can potentially learn from topical information.

File | Description | Type | Access | License | Size | Format
---|---|---|---|---|---|---
NLDB_2022_revised.pdf | | Published version | Closed access | Not public | 301.78 kB | Adobe PDF
Index_Front.pdf | Front matter | Published version | Closed access | Not public | 775.71 kB | Adobe PDF