Inference through innovation processes tested in the authorship attribution task

Tani Raffaelli, Giulio; Lalli, Margherita; Tria, Francesca.

doi:10.1038/s42005-024-01714-6

Urn models for innovation capture fundamental empirical laws shared by several real-world processes. The so-called urn model with triggering includes, as particular cases, the urn representation of the two-parameter Poisson-Dirichlet process and the Dirichlet process, seminal in Bayesian non-parametric inference. In this work, we leverage this connection to introduce a general approach for quantifying closeness between symbolic sequences and test it within the framework of the authorship attribution problem. The method demonstrates high accuracy when compared to other related methods in different scenarios, featuring a substantial gain in computational efficiency and theoretical transparency. Beyond the practical convenience, this work demonstrates how the recently established connection between urn models and non-parametric Bayesian inference can pave the way for designing more efficient inference methods. In particular, the hybrid approach that we propose allows us to relax the exchangeability hypothesis, which can be particularly relevant for systems exhibiting complex correlation patterns and non-stationary dynamics.
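As a rough illustration of the urn representation mentioned above, the following sketch scores a symbolic sequence under the standard two-parameter Poisson-Dirichlet predictive rule: after n draws containing K distinct symbols, a new symbol appears with probability (theta + alpha*K)/(theta + n), and a symbol already seen n_i times reappears with probability (n_i - alpha)/(theta + n). Setting alpha = 0 gives the Dirichlet process case. This is not the authors' implementation: the function name `log_prob_sequence` is hypothetical, the base measure (which specific new symbol appears) is ignored, and only the exchangeable special case is covered, not the hybrid approach the abstract describes.

```python
import math
from collections import Counter


def log_prob_sequence(tokens, alpha, theta):
    """Log-probability of a token sequence under the two-parameter
    Poisson-Dirichlet urn scheme (hypothetical helper; base measure ignored).

    alpha: discount parameter, 0 <= alpha < 1
    theta: concentration parameter, theta > -alpha
    """
    if not (0 <= alpha < 1) or theta <= -alpha:
        raise ValueError("require 0 <= alpha < 1 and theta > -alpha")

    counts = Counter()  # occurrences of each distinct symbol seen so far
    n = 0               # total number of symbols seen so far
    logp = 0.0
    for tok in tokens:
        if n == 0:
            p = 1.0  # the first symbol is necessarily new
        elif tok in counts:
            # an already seen symbol is drawn again
            p = (counts[tok] - alpha) / (theta + n)
        else:
            # innovation: a symbol never seen before
            p = (theta + alpha * len(counts)) / (theta + n)
        logp += math.log(p)
        counts[tok] += 1
        n += 1
    return logp


if __name__ == "__main__":
    # Two orderings of the same symbols, scored with illustrative parameters.
    print(log_prob_sequence(list("abab"), alpha=0.5, theta=1.0))
    print(log_prob_sequence(list("aabb"), alpha=0.5, theta=1.0))
```

Both calls return the same value because this process is exchangeable: the probability of a sequence depends only on how many times each symbol occurs, not on the order. That is the hypothesis the abstract says the hybrid approach relaxes.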



Publication year: 2024
Scientific Disciplinary Sector (valid from 9 May 2024): Sector PHYS-03/A - Experimental physics of matter and applications; Sector MATH-03/B - Probability and mathematical statistics
Journal title: COMMUNICATIONS PHYSICS
DOI: https://dx.doi.org/10.1038/s42005-024-01714-6
Appears in types: 1.1 Journal article

Files in this item:

File	Size	Format
s42005-024-01714-6.pdf (open access; type: Published version; license: Creative Commons)	766.3 kB	Adobe PDF