
ADLER: An Efficient Hessian-based Strategy for Adaptive Learning Rate

Balboni, Dario; Bacciu, Davide
2024

Abstract

We derive a sound positive semi-definite approximation of the Hessian of deep models for which Hessian-vector products are easily computable. This enables us to provide an adaptive SGD learning rate strategy based on the minimization of the local quadratic approximation, which requires only about twice the computation of a single SGD run, yet performs comparably to a grid search over SGD learning rates on different model architectures (CNNs with and without residual connections) on classification tasks. We also compare the novel approximation with the Gauss-Newton approximation.
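The learning-rate rule sketched in the abstract follows from minimizing a local quadratic model of the loss along the negative gradient: for L(w - ηg) ≈ L(w) - η·gᵀg + ½η²·gᵀHg, the minimizer is η* = (gᵀg)/(gᵀHg), which needs only a single Hessian-vector product. The toy sketch below illustrates this on an exactly quadratic loss (it is not the paper's ADLER implementation; the quadratic problem and names are illustrative assumptions):

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w - b^T w, with A symmetric
# positive definite playing the role of the (approximate) Hessian.
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + np.eye(5)      # symmetric positive definite "Hessian"
b = rng.normal(size=5)

def loss(w):
    return 0.5 * w @ A @ w - b @ w

def grad(w):
    return A @ w - b

def hvp(v):
    # Hessian-vector product; here it is exact. In a deep model it
    # would be computed by double backpropagation or an approximation.
    return A @ v

w = rng.normal(size=5)
g = grad(w)
# Learning rate minimizing the local quadratic model along -g:
eta = (g @ g) / (g @ hvp(g))
w_new = w - eta * g
assert loss(w_new) < loss(w)
```

Because the loss here is exactly quadratic, this η is the true line minimum along -g (the new gradient is orthogonal to g); for a deep model the quadratic model is only local, so the step is adaptive rather than exact.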
Disciplinary sector: INFO-01/A - Computer Science
32nd European Symposium on Artificial Neural Networks
Bruges, Belgium
October 2024
ESANN 2024 Proceedings
ISBN: 9782875870896
File attached to this record: 2305.16396v1.pdf
Access: open access
Type: Accepted version (post-print)
License: Read-only
Size: 545.27 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11384/148811