Using data science to uncover cognitive constraints in human behavior beyond social interactions.

Ollivier, Kilian Frederic

doi:10.25429/ollivier-kilian-frederic_phd2024-01-26

Well-established cognitive models from anthropology have shown that, because cognitive constraints limit our "bandwidth" for social interactions, humans organize their social relations according to a regular structure. In this thesis, we postulate that similar regularities can be found in other cognitive processes, such as those involving language production. The thesis consists of three main parts. In the first part, we apply to the domain of language a methodology similar to the one used to uncover social cognitive constraints. More specifically, we are interested in understanding how individuals unconsciously structure their vocabulary. To investigate this, we analyze a dataset containing tweets from a heterogeneous group of Twitter users (regular users and professional writers). We find that a concentric layered structure (which we call the ego network of words, in analogy to the ego network of social relationships) captures well how individuals organize the words they use. In the second part, we carry out a semantic analysis of the model. Each ring of each ego network is described by a semantic profile, which captures the topics associated with the words in the ring. We find that the innermost ring, which contains the most frequently used words, can be seen as the semantic fingerprint of the whole model. In the third part, drawing inspiration from social ego networks, where the active part includes the relationships that individuals regularly nurture, we establish the notion of an active ego network of words. We demonstrate that, without the active network concept, an ego network is sensitive to the amount of data considered, and the layered structure disappears in larger datasets (we used an extended version of the Twitter/X dataset and MediaSum, a preexisting dataset containing a large number of interview transcripts).
To address this, we define a methodology for extracting and validating the active part of the ego network of words. The resulting ego network structures align substantially with the layered ego networks of words obtained in previous chapters, where only the active network was implicitly covered, confirming the model's robustness across different dataset sizes. Moreover, the validation on the transcripts dataset (MediaSum) highlights the generalizability of the model across diverse domains and indicates that these cognitive constraints are ingrained in language use, including spoken forms of communication.

Using data science to uncover cognitive constraints in human behavior beyond social interactions / Ollivier, Kilian Frederic; external supervisor: Dell'Orletta, Felice; Scuola Normale Superiore, cycle 34, 26-Jan-2024.

2024

Defense date: 26-Jan-2024

Scientific-disciplinary sectors (SSD) (valid until 24/06/2024): Sector INF/01 - Informatica (Computer Science)

PhD program: Matematica e Informatica (Mathematics and Computer Science)

Cycle: 34

DOI: https://dx.doi.org/10.25429/ollivier-kilian-frederic_phd2024-01-26

Keywords: Ego Network; NLP; Language Production; Cognitive Constraints

External supervisor(s): Dell'Orletta, Felice; Boldrini, Chiara; Passarella, Andrea; Conti, Marco

Publisher: Scuola Normale Superiore

Appears in types: 9.1 PhD thesis

Files in this item:

File	Size	Format
Tesi.pdf (open access; description: PhD thesis; type: Published version; license: not specified)	8.9 MB	Adobe PDF