
An Optimization Perspective on Deep Neural Networks / Balboni, Dario; external supervisor: BACCIU, Davide; Scuola Normale Superiore, cycle 35, 30-Jan-2025.

An Optimization Perspective on Deep Neural Networks

BALBONI, Dario
2025

Abstract

Despite the impressive performance achieved by Deep Neural Networks in recent years and their widespread adoption by major companies worldwide, many aspects of their behavior remain poorly understood. A significant body of theory exists for very wide and infinite-width models, particularly in connection with Neural Tangent Kernels, and numerous empirical studies aim to guide practitioners working with practically relevant models. However, there is a concerning lack of actionable theoretical results for the types of models commonly deployed in practice. This thesis positions itself in the gap between pure theory and practice, aiming to derive practical guidelines for practitioners from a principled approach to neural networks grounded in optimization theory. This approach leverages the recently rediscovered Polyak-Łojasiewicz (PL) condition, a generalization of strong convexity suited to describing overparameterized models, and recognizes that certain classical results from convex optimization theory remain applicable in this new setting. The end result of this thesis is an adaptive learning rate algorithm that requires minimal hyperparameter tuning and performs on par with grid-searched SGD while significantly reducing computational cost.
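
For context, the standard textbook form of the condition named in the abstract (not necessarily the exact variant used in the thesis) is the following: a differentiable loss $f$ with minimum value $f^*$ satisfies the Polyak-Łojasiewicz (PL) condition with constant $\mu > 0$ if

    \frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr) \quad \text{for all } x.

Unlike strong convexity, this allows non-convex losses with many global minima, which is why it is suited to overparameterized models; combined with $L$-smoothness it still yields the classical linear convergence rate of gradient descent with step size $1/L$,

    f(x_k) - f^* \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_0) - f^*\bigr).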
30-Jan-2025
Sector INF/01 - Computer Science
Sector MAT/08 - Numerical Analysis
Mathematics and Computer Science
35
Deep Neural Networks; Polyak-Lojasiewicz Condition; Optimization Theory; Neural Tangent Kernels; Adaptive Optimization Methods; Mean Field Theory; Efficient Training Algorithms
BACCIU, Davide
Scuola Normale Superiore
Files in this item:

File: Tesi.pdf
Description: PhD thesis
Access: open access
Type: Published version
License: Not specified
Size: 1.8 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11384/157591