Analyzing Alternative Approaches for the State-of-the-Art Dimensionality Reduction Technique t-SNE
Nowadays, people are more and more often asked to deal with large and hence often (very) high-dimensional data sets (keyword: big data). However, from a practical point of view, these problems are hard to grasp as this requires a low-dimensional representation (and/or imagination) of the high-dimensional data structures.
Over the years, several methods have been proposed, which are able to reduce a problem's dimensionality, such as multidimensional scaling (MDS), principal component analysis (PCA), or t-distributed stochastic neighborhood embedding (t-SNE). In particular the latter (van der Maaten & Hinton, 2008) has recently drawn lots of interest in a variety of modern application areas, especially from fields that are predominantly solved by means of deep learning, such as image or text recognition.
The main goal of this thesis is a detailed and critical investigation of t-SNE and its building blocks. Within its core, t-SNE performs a gradient-based optimization of the Kullback-Leibler divergence. And while gradient-based optimizers are usually very fast, they are accompanied by the drawback of being easily trapped in local optima. Thus, more versatile approaches, such as evolutionary optimization algorithms, could lead to better results. In addition, the impact of alternative distributions (instead of multivariate normal and Student's t distribution), entropy measures (instead of Kullback-Leibler divergence) and initialization strategies (instead of sampling based on a multivariate normal distributon) shall be investigated within this thesis.
While the plausibility of the proposed alternative approaches shall be discussed from a theoretical point of view, their actual effects have to be analysed empirically, as well as visually across a variety of data sets (w.r.t. problem type and heterogeneity).
Reference: