Analysis of practical applications of Business Intelligence architectures for the Big Data era

The era of Big Data is dominated by tools that do not originate from the traditional SQL-based database and Data Warehouse (DWH) area. DWHs are used for high-value, structured, and integrated data in OLAP scenarios and are a well-known means of conducting Business Intelligence (BI). Big Data tools have a wider range of use: they are not restricted to structured data and can often handle larger quantities of more heterogeneous data, e.g., XML files, JSON, or pictures and video. Besides NoSQL systems, Apache Hadoop has gained widespread attention in both academia and practice. It includes processing with the MapReduce paradigm, which distributes a task across many simple workers and thereby enables MapReduce applications to scale horizontally on commodity machines. Because it relies on imperative programming, MapReduce is more flexible than the declarative SQL of traditional systems, but this flexibility also increases complexity. The latest Apache Hadoop 2.0 stack contains a variety of novel tools, e.g., a distributed file system (HDFS) and cluster management tools. This opens up several usage models for combining Hadoop with a DWH (e.g., Hadoop as an ETL tool, or Hadoop as a DWH complement) to create a Big Data-ready DWH that enriches a company's BI approach. Such systems, which consist of multiple physical data storage entities, can also be called a Logical Data Warehouse (LDWH) if the data is no longer stored in one single central DWH. Notably, it might also be possible to replace traditional DWH technology with new Big Data technology and still provide the required BI functionality with a generic Big Data-ready BI architecture.
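To illustrate the contrast between imperative MapReduce processing and declarative SQL described above, the following minimal sketch simulates the map, shuffle, and reduce phases in a single Python process. It does not use the Hadoop API; the record layout (`region`, `amount`) and all function names are hypothetical and chosen only for illustration. In a DWH, the same aggregation could be stated declaratively as `SELECT region, SUM(amount) FROM sales GROUP BY region`, assuming a table `sales` with these columns.

```python
import json
from collections import defaultdict

# Hypothetical input: semi-structured JSON lines, one record per line.
# Extra fields and malformed lines are tolerated, which a fixed relational
# schema would not allow without prior cleansing.
RAW_RECORDS = [
    '{"region": "EU", "amount": 120.0}',
    '{"region": "US", "amount": 80.5}',
    '{"region": "EU", "amount": 30.0, "note": "extra field"}',
    'not valid json',
]


def map_record(line):
    """Map phase: emit (region, amount) pairs, skipping unusable lines."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return
    if isinstance(record, dict) and "region" in record and "amount" in record:
        yield record["region"], float(record["amount"])


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_region(key, values):
    """Reduce phase: aggregate the values of one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce sequentially (no real distribution)."""
    mapped = (pair for line in lines for pair in map_record(line))
    grouped = shuffle(mapped)
    return dict(reduce_region(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(RAW_RECORDS))
```

In a real cluster, the map and reduce functions would run on many workers in parallel and the framework would perform the shuffle. The sketch only shows why the imperative style is more flexible (arbitrary parsing and error handling per record) but also more verbose than the single SQL statement.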

The goal of this thesis is to analyze which Big Data-ready BI architecture usage models are employed in practice, including those based on DWH technology and/or Big Data technologies in general and the Hadoop stack in particular. To this end, relevant use cases from practice that describe such architectures are gathered, compared, and structured, and are then used to derive classes of usage models into which the use cases are categorized. Lastly, the resulting usage model classes should be compared to the existing DWH reference architecture (cf., e.g., [Vos08]), and steps to academically enhance such a reference architecture should be sketched.