Investigating Common Optimization Strategies in Deep Learning

Although the impact of machine learning in general, and of deep learning in particular, has grown continuously in recent years, we still know surprisingly little about the actual mechanics of these methods. Deep learning networks especially are often treated as black-box models due to their high complexity. As a result, practitioners tend to simply reuse previously trained networks without questioning any of their building blocks or mechanics.

The aim of this thesis is an investigation of one of the least questioned ingredients of a deep learning network: training its weights with gradient-based optimization algorithms such as stochastic gradient descent (with gradients computed via backpropagation) and its adaptive variants Adam, Adagrad, etc. In fact, in many optimization domains gradient-based optimizers are frequently outperformed by evolutionary algorithms (EAs), or perform well only on problems with a rather simple topology. A further alternative for finding the optimal weights of (and hence tuning) a given network would be automated algorithm configurators such as irace or SMAC.
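To make this idea concrete, the following minimal sketch (the tiny NumPy network, the synthetic data, and all names are illustrative assumptions, not part of the thesis itself) treats the training loss of a small network as a plain black-box objective over a flat weight vector, which makes it accessible to gradient-free optimizers such as a simple (1+1) evolution strategy:

```python
# Minimal sketch: network weight training phrased as black-box optimization.
# Everything here (architecture, data, step size) is a simplifying assumption.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # synthetic inputs
y = np.sin(X.sum(axis=1, keepdims=True))    # synthetic targets

D_IN, D_HID, D_OUT = 5, 8, 1
N_WEIGHTS = D_IN * D_HID + D_HID * D_OUT    # 48 weights; biases omitted for brevity

def loss(w):
    """Mean squared error of a one-hidden-layer MLP, given a flat weight vector."""
    W1 = w[:D_IN * D_HID].reshape(D_IN, D_HID)
    W2 = w[D_IN * D_HID:].reshape(D_HID, D_OUT)
    pred = np.tanh(X @ W1) @ W2
    return float(np.mean((pred - y) ** 2))

# (1+1)-ES loop: the loss is used purely as a black box, no gradients required.
w_best = rng.normal(scale=0.1, size=N_WEIGHTS)
f_best = loss(w_best)
sigma = 0.1                                  # fixed mutation step size, kept simple
for _ in range(2000):
    w_cand = w_best + sigma * rng.normal(size=N_WEIGHTS)
    f_cand = loss(w_cand)
    if f_cand <= f_best:                     # greedy acceptance of non-worse offspring
        w_best, f_best = w_cand, f_cand

print(f"final training loss: {f_best:.4f}")
```

Any of the optimizers mentioned above (gradient-based methods, EAs, or algorithm configurators) could in principle be plugged in where the (1+1)-ES loop sits; the only interface they need is the loss function itself.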

Therefore, the goals of this thesis are twofold:

  1. analyze the complexity of deep learning networks (or, more precisely, of the landscape spanned by their tunable weight parameters) and compare it to the fitness landscapes of common single-objective continuous benchmark problems (such as BBOB; a simple probing sketch follows this list), and
  2. empirically compare (a) gradient-based optimizers commonly used for training deep learning networks with (b) state-of-the-art EAs and (c) automated algorithm configurators by means of an extensive benchmark study.
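
As a rough illustration of the first goal, the sketch below (assuming the `loss` objective and `N_WEIGHTS` from the previous snippet are in scope, and using a hand-coded Rastrigin function as a stand-in for a BBOB-style benchmark) probes both landscapes along a random line segment. This is only a crude first look at ruggedness, not a substitute for proper exploratory landscape analysis:

```python
# Assumes `np`, `loss`, and N_WEIGHTS from the previous sketch are already defined.

def rastrigin(x):
    """Classic multimodal test function, used here as a rough BBOB-like reference."""
    return 10 * x.size + float(np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

def line_profile(f, a, b, steps=50):
    """Evaluate f along the straight line from point a to point b."""
    return [f((1 - t) * a + t * b) for t in np.linspace(0.0, 1.0, steps)]

probe_rng = np.random.default_rng(1)
a_net, b_net = probe_rng.normal(size=N_WEIGHTS), probe_rng.normal(size=N_WEIGHTS)
a_bench, b_bench = probe_rng.uniform(-5, 5, size=10), probe_rng.uniform(-5, 5, size=10)

profile_net = line_profile(loss, a_net, b_net)             # network weight landscape
profile_bench = line_profile(rastrigin, a_bench, b_bench)  # benchmark landscape
```

Profiles like these could then be plotted or summarized by landscape features to judge how closely the weight landscape of a network resembles the benchmark problems used in goal 2.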