Machine Learning for Database Query Analysis and Composition

Formulating efficient SQL queries often takes repeated, time-consuming cycles of execution, analysis, and tuning. Providing insights about queries before they are executed (e.g., expected run time, estimated answer size) can help users make better use of their working time, especially in complex data processing and integration tasks and in data engineering pipelines.

This thesis aims to extend the ideas of Zolaktaf et al. [ZMP21] by using machine learning techniques that exploit large query workloads to model SQL queries and their properties. The ultimate goal of this thesis is to develop an SQL query interface for relational DBMSs (e.g., MySQL, PostgreSQL) in which predictive query insights are shown to the user to facilitate query analysis and formulation.

Clearly, programming knowledge is required to accomplish this goal.

[ZMP21] Zainab Zolaktaf, Mostafa Milani, and Rachel Pottinger. 2020. Facilitating SQL Query Composition and Analysis. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery, New York, NY, USA, 209–224. DOI: https://doi.org/10.1145/3318464.3380602