
Evaluating the privacy exposure of interpretable global and local explainers

Naretto, Francesca; Giannotti, Fosca
2025

Abstract

Over the last few years, the abundance of data has significantly boosted the performance of Machine Learning models, integrating them into many aspects of daily life. However, the rise of powerful Artificial Intelligence tools has introduced ethical and legal complexities. This paper proposes a computational framework to analyze the ethical and legal dimensions of Machine Learning models, focusing specifically on privacy concerns and interpretability. Recently, the research community has proposed privacy attacks able to reveal whether a record was part of the black-box training set, or to infer variable values, simply by accessing and querying a Machine Learning model. These attacks highlight privacy vulnerabilities and show that the GDPR might be violated by making data or Machine Learning models accessible. At the same time, the complexity of these models, often labelled “black-boxes”, has made the development of explanation methods indispensable to enhance trust and to facilitate their acceptance and adoption in high-stakes scenarios. Our study highlights the trade-off between interpretability and privacy protection. By introducing REVEAL, this paper proposes a framework to evaluate the privacy exposure of black-box models and of their surrogate-based explainers, whether local or global. Our methodology is adaptable and applicable across diverse black-box models and various privacy attack scenarios. Through an in-depth analysis, we show that the interpretability layer introduced by explanation models may jeopardize the privacy of individuals in the training data of the black-box, particularly under powerful privacy attacks that require minimal knowledge yet cause significant privacy breaches.
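The kind of membership inference attack the abstract refers to can be illustrated with a minimal confidence-thresholding sketch. This is not the REVEAL framework itself (the abstract gives no implementation details); the dataset, model, and threshold below are illustrative assumptions chosen to show the general idea: an overfit black-box tends to be more confident on records it was trained on, so an attacker who can only query the model may still guess membership.

```python
# Hypothetical sketch of a confidence-based membership inference attack
# against a black-box classifier. All concrete choices (random forest,
# synthetic data, threshold 0.9) are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Build a "black-box" model on private training data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_out, y_train, y_out = train_test_split(
    X, y, test_size=0.5, random_state=0
)
black_box = RandomForestClassifier(n_estimators=100, random_state=0)
black_box.fit(X_train, y_train)

def infer_membership(model, records, threshold=0.9):
    """Guess 'member' when the model is highly confident on a record:
    overfit models tend to assign higher confidence to training points."""
    confidence = model.predict_proba(records).max(axis=1)
    return confidence >= threshold

# The attacker only queries the model; it never sees the training set.
guess_in = infer_membership(black_box, X_train)   # true members
guess_out = infer_membership(black_box, X_out)    # true non-members

# Attack "advantage": members are flagged more often than non-members.
advantage = guess_in.mean() - guess_out.mean()
print(f"attack advantage: {advantage:.2f}")
```

A positive advantage means the model leaks membership information through its confidence scores alone; the same query-only threat model extends to surrogate-based explainers, which is the exposure the paper evaluates.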
Settore INF/01 - Informatica
Settore INFO-01/A - Informatica
Explainable AI, Data Privacy, Artificial Intelligence
   “FAIR - Future Artificial Intelligence Research” - Spoke 1 “Human-centered AI”
   M4C2
   European Commission
   PNRR - M4C2 - Investment 1.3, Extended Partnership PE00000013
   PE00000013

   SoBigData.it - Strengthening the Italian RI for Social Mining and Big Data Analytics
   European Commission
   PNRR

   Fairness and Intersectional Non-Discrimination in Human Recommendation
   FINDHR
   European Commission
   Horizon Europe Framework Programme
   101070212

   It takes two to tango: a synergistic approach to human-machine decision making
   TANGO
   European Commission
   Grant Agreement n. 101120763
Files in this product:

tdp.a534a24 (1).pdf
Access: open access
Type: Published version
License: Creative Commons
Size: 785.42 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11384/160983
Citations
  • PMC: ND
  • Scopus: 2
  • Web of Science: 2
  • OpenAlex: ND