Automated Algorithm Configuration of Sparse Neural Networks

Enabled by the large amounts of data that are continuously being recorded, deep learning algorithms have become the state-of-the-art models for various tasks (classification, regression, clustering, feature selection, and dimensionality reduction) across a wide range of applications. The most advanced deep learning models require so much computation and memory that they have to be trained on large compute clusters in the cloud before being deployed on more generic hardware for inference. This drastically limits their flexibility.

Neural Networks: Never dense, always sparse

There are two typical solutions to this problem: model compression and model pruning. Both are limited by the initial need to train a very large dense neural network in the cloud. Recently, it has been shown [5] that sparse neural networks trained from scratch (i.e., sparse training) can match (or even outperform) dense neural networks while using far fewer computational resources, and consequently have a small memory footprint while retaining high representational power. For instance, in [4] it has been shown theoretically that sparse training can reduce the computational requirements of neural networks by a few orders of magnitude. The basic idea of sparse-to-sparse models is to impose sparse connectivity on the network before training starts and to adapt it during training, which makes the adaptive sparse connectivity concept well suited to both training and inference of neural networks. Despite the effort put into developing state-of-the-art efficient sparse training models such as SET [5] and RigL [1], finding an optimal hyperparameter configuration remains a challenge.
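To make the prune-and-regrow idea behind SET [5] concrete, the sketch below shows a single topology-update step on one layer's weight matrix. It is a simplified illustration (NumPy, a dense matrix with an explicit binary mask); the function name, the `zeta` rewiring fraction, and the re-initialization of grown weights are illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def set_prune_and_regrow(weights, mask, zeta=0.3, rng=None):
    """One SET-style topology update on a single layer: remove the fraction
    `zeta` of the smallest-magnitude active connections, then grow the same
    number of new connections at random inactive positions."""
    rng = np.random.default_rng() if rng is None else rng
    active = np.flatnonzero(mask)
    n_rewire = int(zeta * active.size)

    # Prune: deactivate the n_rewire active connections with smallest |weight|.
    magnitudes = np.abs(weights.flat[active])
    drop_idx = active[np.argsort(magnitudes)[:n_rewire]]
    mask.flat[drop_idx] = 0
    weights.flat[drop_idx] = 0.0

    # Regrow: activate n_rewire inactive positions chosen at random
    # (in this simplified sketch a just-pruned position may be re-selected).
    inactive = np.flatnonzero(mask == 0)
    grow_idx = rng.choice(inactive, size=n_rewire, replace=False)
    mask.flat[grow_idx] = 1
    weights.flat[grow_idx] = rng.normal(0.0, 0.01, size=n_rewire)

    return weights * mask, mask
```

In the full algorithm such an update is applied to every sparse layer at the end of each training epoch, and the rewired network simply continues training; RigL [1] differs mainly in regrowing connections by gradient magnitude instead of at random.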

Introduction to Automated Algorithm Configuration

Many algorithms, including SET, have hyperparameters that affect the algorithm’s performance. Choosing the right values for these hyperparameters can greatly improve performance on the problems being solved. Traditionally, finding good values for these hyperparameters is done with expert knowledge and trial-and-error experimentation. This is often a tedious and time-consuming endeavor, whereas computers can carry out this repetitive task of testing values, assessing performance, and proposing potentially better ones just as well, and often better. Instead of spending time on finding good hyperparameter values (i.e., a configuration), experts can focus on designing new components and ideas for solving problems, and make these available to tools that automatically find well-performing algorithms. This paradigm is referred to as programming by optimization [2]. Finding a good configuration of (hyper)parameters for an algorithm applied to a specific set of problems is called automated algorithm configuration (AAC) and, in the context of machine/deep learning, is also referred to as hyperparameter optimization (HPO). The space of possible configurations is often very large (or even infinite), so AAC methods cannot simply try all distinct configurations to find the best-performing one. Instead, sophisticated optimization techniques, such as sequential model-based optimization [3] and evolutionary strategies, are used to find good configurations.
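As a minimal illustration of the AAC setup, the sketch below defines a small configuration space for a hypothetical sparse training run and searches it with plain random sampling. A real configurator such as SMAC [3] or an evolutionary strategy would replace the naive sampler with a model-based or population-based search; the parameter names, ranges, and the `objective` callback are assumptions made for illustration only.

```python
import math
import random

# Hypothetical configuration space for a sparse training run; the parameter
# names and ranges are illustrative, not taken from SET or RigL directly.
CONFIG_SPACE = {
    "learning_rate": ("log_float", 1e-4, 1e-1),
    "sparsity":      ("float",     0.80, 0.99),
    "zeta":          ("float",     0.05, 0.50),   # fraction of weights rewired per update
    "batch_size":    ("choice",    [32, 64, 128]),
}

def sample_configuration(space, rng):
    """Draw one configuration uniformly at random from the space."""
    config = {}
    for name, (kind, *domain) in space.items():
        if kind == "log_float":
            lo, hi = domain
            config[name] = 10 ** rng.uniform(math.log10(lo), math.log10(hi))
        elif kind == "float":
            lo, hi = domain
            config[name] = rng.uniform(lo, hi)
        elif kind == "choice":
            config[name] = rng.choice(domain[0])
    return config

def random_search(space, objective, budget=50, seed=0):
    """Naive stand-in for a configurator: sample `budget` configurations and
    keep the one with the highest objective value (e.g. validation accuracy)."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(budget):
        config = sample_configuration(space, rng)
        score = objective(config)   # trains and evaluates a sparse network
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

In an actual AAC experiment, `objective` would train a sparse network with the sampled configuration on the training data and return its validation performance, and the configurator would decide which configuration to try next based on the results observed so far.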

This project aims to investigate the automatic configuration of sparse training algorithms. If you want to understand what a sparse training algorithm is and how it works, and you are curious about how to construct a highly configurable meta-framework that combines at least two sparse training models, then this project is for you. During this project, we aim to investigate 1) the configurability of each algorithm, 2) the advantages of combining those algorithms into a meta-framework, and 3) which algorithm variants contribute most to its performance. To answer these questions, an extensive experimental pipeline must be designed that includes gathering suitable data sets for training and testing, preparing the individual and combined algorithms, and setting up the AAC experiments.
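One possible way to think about such a meta-framework is as a single configuration space with a top-level categorical parameter that selects the sparse training algorithm, plus conditional hyperparameters that are only active for the selected algorithm. The sketch below illustrates this structure; all names, ranges, and the `set.`/`rigl.` prefixes are assumptions, not a fixed design.

```python
# Illustrative combined configuration space for a SET/RigL meta-framework.
META_CONFIG_SPACE = {
    # shared hyperparameters
    "learning_rate": {"type": "log_float", "range": (1e-4, 1e-1)},
    "sparsity":      {"type": "float",     "range": (0.80, 0.99)},
    # top-level choice between the two sparse training algorithms
    "algorithm":     {"type": "choice",    "values": ["SET", "RigL"]},
    # SET-specific: fraction of weights pruned and regrown (at random) per epoch
    "set.zeta": {"type": "float", "range": (0.05, 0.50),
                 "condition": {"algorithm": "SET"}},
    # RigL-specific: mask-update interval and drop fraction
    # (RigL regrows connections by gradient magnitude rather than at random)
    "rigl.update_interval": {"type": "choice", "values": [100, 500, 1000],
                             "condition": {"algorithm": "RigL"}},
    "rigl.drop_fraction":   {"type": "float", "range": (0.1, 0.5),
                             "condition": {"algorithm": "RigL"}},
}
```

A configurator that supports conditional parameters would only sample `set.zeta` when `algorithm` is `SET`, so a single AAC run both selects an algorithm and tunes it at the same time.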

This project will be co-supervised by researchers of the Data Management and Biometrics Group of the University of Twente, NL.

We are an enthusiastic team of researchers looking for a student who:

  • is highly motivated and has a strong background in deep learning, statistics, and algorithmics,
  • will actively participate in the sparse neural networks special interest group meetings at the University of Twente, and
  • besides conducting this work as their final project, has the desire to disseminate the results at a scientific conference or in a journal together with the supervisors.

References

[1] Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, and Erich Elsen. Rigging the lottery: Making all tickets winners. In International Conference on Machine Learning, pages 2943–2952. PMLR, 2020.

[2] Holger H. Hoos. Programming by optimization. Commun. ACM, 55(2):70–80, 2012.

[3] Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Proceedings of the 5th International Conference on Learning and Intelligent Optimization (LION 5), pages 507–523, 2011.

[4] Decebal C. Mocanu, Elena Mocanu, Tiago Pinto, Selima Curci, Phuong H. Nguyen, Madeleine Gibescu, Damien Ernst, and Zita A. Vale. Sparse training theory for scalable and efficient agents. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2021. arXiv:2103.01636.

[5] Decebal C. Mocanu, Elena Mocanu, Peter Stone, Phuong H. Nguyen, Madeleine Gibescu, and Antonio Liotta. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nature Communications, 9(1):2383, December 2018.