Towards Real-World Data Streams for Deep Continual Learning

Cossu, Andrea

doi:10.25429/cossu-andrea_phd2023-07-17

Continual Learning deals with Artificial Intelligence agents striving to learn from a never-ending stream of data. Recently, Deep Continual Learning has focused on the design of new strategies to endow Artificial Neural Networks with the ability to learn continuously without forgetting previous knowledge. In fact, the learning process of any Artificial Neural Network model is well known to lack sufficient stability to preserve existing knowledge when learning new information. This phenomenon, called catastrophic forgetting or simply forgetting, is considered one of the main obstacles to the design of effective Continual Learning agents. However, existing strategies designed to mitigate forgetting have been evaluated on a restricted set of Continual Learning scenarios. The most used one is, by far, the Class-Incremental scenario applied to object detection tasks. Even though it drove interest in Continual Learning, the Class-Incremental scenario strongly constrains the properties of the data stream, thus limiting its ability to model real-world environments. The core of this thesis concerns the introduction of three Continual Learning data streams, whose design is centered around specific properties of real-world environments. First, we propose the Class-Incremental with Repetition scenario, which builds a data stream including both the introduction of new concepts and the repetition of previous ones. Repetition is naturally present in many environments and constitutes an important source of information. Second, we formalize the Continual Pre-Training scenario, which leverages a data stream of unstructured knowledge to keep a pre-trained model updated over time. One important objective of this scenario is to study how to continuously build general, robust representations that do not strongly depend on the specific task to be solved. This is a fundamental property of real-world agents, which build cross-task knowledge and then adapt it to specific needs.
Third, we study Continual Learning scenarios where data streams are composed of temporally correlated data. Temporal correlation is ubiquitous and lies at the foundation of most environments we, as humans, experience during our lives. We use Recurrent Neural Networks as our main model, due to their intrinsic ability to model temporal correlations. We discovered that, when applied to recurrent models, Continual Learning strategies behave in an unexpected manner. This highlights the limits of the current experimental validation, which is mostly focused on Computer Vision tasks. Ultimately, the introduction of new data streams helped deepen our understanding of how Artificial Neural Networks learn continuously. We found that forgetting strongly depends on the properties of the data stream, and we observed large changes from one data stream to another. Moreover, when forgetting is mild, we were able to mitigate it effectively with simple strategies, or even without any specific strategy. Loosening the focus on forgetting allows us to turn our attention to other interesting problems outlined in this thesis, such as (i) the separation between continual representation learning and quick adaptation to novel tasks, (ii) robustness to unbalanced data streams and (iii) the ability to continuously learn temporal correlations. These objectives currently defy existing strategies and will likely represent the next challenge for Continual Learning research.
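As an illustration of the first data stream described in the abstract, the following is a minimal sketch of how a stream mixing new classes with repeated old ones could be generated. It is not the generator used in the thesis: the function name `cir_stream` and its parameters (`new_per_exp`, `repeat_prob`) are hypothetical, and the repetition rule (each already-seen class reappears independently with a fixed probability) is an assumption made only for this example.

```python
import random


def cir_stream(num_classes, num_experiences, new_per_exp, repeat_prob, seed=0):
    """Build a toy stream of experiences, each a sorted list of class ids.

    Hypothetical sketch: every experience introduces up to `new_per_exp`
    unseen classes and re-includes each previously seen class
    independently with probability `repeat_prob`.
    """
    rng = random.Random(seed)
    unseen = list(range(num_classes))
    rng.shuffle(unseen)
    seen = []
    stream = []
    for _ in range(num_experiences):
        # Introduce new concepts until the pool of unseen classes is empty.
        new = [unseen.pop() for _ in range(min(new_per_exp, len(unseen)))]
        # Repeat some of the concepts met in earlier experiences.
        repeated = [c for c in seen if rng.random() < repeat_prob]
        stream.append(sorted(new + repeated))
        seen.extend(new)
    return stream


if __name__ == "__main__":
    for i, classes in enumerate(cir_stream(10, 5, 2, 0.5)):
        print(f"experience {i}: classes {classes}")
```

Under these assumptions, setting `repeat_prob` to 0 gives a stream in which each class appears in exactly one experience, as in the plain Class-Incremental scenario, while larger values make previously seen classes reappear more often.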

Towards Real-World Data Streams for Deep Continual Learning / Cossu, Andrea; external supervisor: Bacciu, Davide; Scuola Normale Superiore, cycle 35, 17-Jul-2023.


Defense date: 17-Jul-2023
Scientific-disciplinary sectors (SSD) (valid until 24/06/2024): Sector INF/01 - Informatica (Computer Science)
PhD programme: Data science
Cycle: 35
DOI: https://dx.doi.org/10.25429/cossu-andrea_phd2023-07-17
External supervisor(s): Bacciu, Davide; Lomonaco, Vincenzo; Monreale, Anna
Publisher: Scuola Normale Superiore
Appears in categories: 9.1 PhD Thesis