Towards a framework for automated quantitative data quality measurement

Data Profiling is the task of automatically examining an unknown data set in order to create metadata about it, such as value ranges and distributions of columns, or functional dependencies across columns. The results of Data Profiling are useful for planning further data-related tasks, such as data management and data cleansing. With the advent of what is commonly called “Big Data” and its dimensions of volume, velocity and variety, a number of new challenges and requirements for Data Profiling tools have emerged. Some of the most interesting topics are online profiling, profiling on queries and views, incremental profiling, continuous profiling, multi-measure profiling, profiling of heterogeneous and unstructured data, and profiling of data streams (cf. Naumann, “Data Profiling Revisited”, 2013). Work on this topic should either give a broad overview of the biggest issues and the respective research in this field, or focus on a particular problem and implement a prototypical solution.
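
To make the kind of metadata mentioned above concrete, the following is a minimal sketch in Python of two basic profiling steps: per-column statistics (value range, distinct count, most frequent value) and a naive check for functional dependencies between single columns. The function names (profile_columns, holds_fd, single_column_fds) and the sample table are hypothetical and chosen purely for illustration. They are not taken from Naumann's paper or from any existing profiling tool, and the sketch assumes a small table that fits in memory as a list of dictionaries.

```python
from collections import Counter
from itertools import permutations

# Hypothetical sample table, assumed for illustration only.
SAMPLE_ROWS = [
    {"zip": "10115", "city": "Berlin", "age": 34},
    {"zip": "10115", "city": "Berlin", "age": 51},
    {"zip": "10117", "city": "Berlin", "age": 28},
    {"zip": "80331", "city": "Munich", "age": 51},
]


def profile_columns(rows):
    """Return simple per-column metadata for a list of row dictionaries."""
    columns = list(rows[0].keys()) if rows else []
    profile = {}
    for col in columns:
        # Nulls are ignored for range and frequency statistics.
        values = [row[col] for row in rows if row[col] is not None]
        counts = Counter(values)
        profile[col] = {
            "non_null": len(values),
            "distinct": len(counts),
            "min": min(values) if values else None,
            "max": max(values) if values else None,
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return profile


def holds_fd(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = row[lhs]
        # A violation is one lhs value mapped to two different rhs values.
        if key in seen and seen[key] != row[rhs]:
            return False
        seen[key] = row[rhs]
    return True


def single_column_fds(rows):
    """List all dependencies a -> b between pairs of single columns."""
    columns = list(rows[0].keys()) if rows else []
    return [(a, b) for a, b in permutations(columns, 2) if holds_fd(rows, a, b)]


if __name__ == "__main__":
    print(profile_columns(SAMPLE_ROWS))
    print(single_column_fds(SAMPLE_ROWS))
```

On the sample table, the only single-column dependency that holds is zip -> city. The pairwise check rescans the table for every column pair, which is acceptable for a toy example but is exactly the kind of cost that becomes a problem at Big Data scale and that motivates topics such as incremental and continuous profiling.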