Design and implementation of an extensible data profiling tool

A typical first step in data integration projects is data profiling, i.e., the extraction of data characteristics and integrity constraints. This thesis aims to classify existing data profiling techniques and tools and to develop a framework, together with a proof-of-concept implementation, that allows different analysis techniques to be combined (e.g., via suitable input and output formats as well as transformations). In particular, the approach should support the construction of a modular algorithm for the detection of functional dependencies that takes into account further data characteristics extracted by other profiling procedures, such as metadata concerning keys, index structures, cardinalities of attributes, and the presence of null values.
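
To illustrate how a functional dependency detector might reuse the output of other profiling procedures, the following minimal Python sketch chains two steps: a column profiler that extracts cardinalities and null presence, and a detector for single-attribute functional dependencies that uses these statistics to skip or shortcut checks. All names (`profile_columns`, `holds`, `detect_fds`), the dictionary-based exchange format, and the decision to skip attributes containing nulls are assumptions made for illustration only; they are not the design of the framework described above.

```python
from itertools import permutations


def profile_columns(rows, attributes):
    """Profiling step 1 (hypothetical): cardinality and null presence per attribute."""
    stats = {}
    for attr in attributes:
        values = [row[attr] for row in rows]
        stats[attr] = {
            "cardinality": len({v for v in values if v is not None}),
            "has_nulls": any(v is None for v in values),
        }
    return stats


def holds(rows, lhs, rhs):
    """Check lhs -> rhs directly: each lhs value must map to exactly one rhs value."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def detect_fds(rows, attributes, stats):
    """Profiling step 2 (hypothetical): single-attribute FDs, pruned via the statistics."""
    n = len(rows)
    fds = []
    for lhs, rhs in permutations(attributes, 2):
        if stats[lhs]["has_nulls"] or stats[rhs]["has_nulls"]:
            continue  # assumption: null semantics are left open, so such pairs are skipped
        if stats[lhs]["cardinality"] == n:
            fds.append((lhs, rhs))  # lhs is unique (a key), so it determines every attribute
        elif stats[lhs]["cardinality"] < stats[rhs]["cardinality"]:
            continue  # fewer distinct lhs values than rhs values: the FD cannot hold
        elif holds(rows, lhs, rhs):
            fds.append((lhs, rhs))
    return fds


# Made-up example relation; None marks a null value.
rows = [
    {"id": 1, "zip": "10115", "city": "Berlin", "note": None},
    {"id": 2, "zip": "10115", "city": "Berlin", "note": "x"},
    {"id": 3, "zip": "80331", "city": "Munich", "note": "y"},
    {"id": 4, "zip": "20095", "city": "Hamburg", "note": None},
]
attributes = ["id", "zip", "city", "note"]

stats = profile_columns(rows, attributes)
fds = detect_fds(rows, attributes, stats)
print(fds)
```

In this sketch the statistics act as the interface between the two steps: the detector never recomputes cardinalities or null flags, so either step could be replaced by another procedure that produces or consumes the same structure. Only the pairs that survive the metadata-based pruning require a scan of the data.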