Semi-Supervised Machine Learning for Query By Example on Relational Databases

Formulating SQL queries is challenging for the growing number of non-experts in databases (e.g., biologists, journalists, business administrators) who need to access and explore data. Query By Example (QBE) offers an alternative: users retrieve information from large databases by supplying data examples that characterize their intent, without having to write complex SQL queries.

Traditional QBE methods such as SQLSynthesizer and TALOS frame QBE as a classification problem: a machine learning model is trained to classify each data object in the database as positive or negative, depending on whether or not it matches the data examples. However, both methods require the data examples to fully characterize the user's intent; that is, they require a fully labeled training dataset. In practical QBE applications, users often provide only a small subset of the positive class, leaving the rest of the database unlabeled.
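To make the classification framing concrete, here is a minimal sketch, assuming a single table with numeric attributes and a fully labeled set of tuples. It learns a one-level decision tree (a decision stump) that separates positive from negative tuples and renders it as a SQL `WHERE` clause. This is a hypothetical simplification for illustration only, not the actual algorithm of SQLSynthesizer or TALOS; the names `learn_stump`, `to_sql`, and the `employees` table are invented for the example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]  # hypothetical: one tuple as attribute -> numeric value


def learn_stump(rows: List[Row], labels: List[bool]) -> Tuple[str, str, float]:
    """Return the (attribute, operator, threshold) predicate with the fewest
    misclassified tuples. Requires both positive AND negative labels, i.e. a
    fully labeled training set. Ties keep the first predicate found."""
    best = None  # (errors, attribute, operator, threshold)
    for attr in rows[0]:
        for threshold in sorted({r[attr] for r in rows}):
            for op in ("<=", ">"):
                errors = 0
                for r, y in zip(rows, labels):
                    pred = r[attr] <= threshold if op == "<=" else r[attr] > threshold
                    errors += pred != y
                if best is None or errors < best[0]:
                    best = (errors, attr, op, threshold)
    _, attr, op, threshold = best
    return attr, op, threshold


def to_sql(table: str, attr: str, op: str, threshold: float) -> str:
    # Plain string formatting for illustration; not safe for untrusted input.
    return f"SELECT * FROM {table} WHERE {attr} {op} {threshold}"


# Every tuple is labeled: True if it matches the user's intent, else False.
rows = [
    {"age": 25, "salary": 30000},
    {"age": 40, "salary": 60000},
    {"age": 35, "salary": 80000},
    {"age": 50, "salary": 20000},
]
labels = [False, True, True, False]
print(to_sql("employees", *learn_stump(rows, labels)))
```

The sketch only works because every tuple carries a label; with positive examples alone, the `errors` count would reward a predicate that simply accepts every tuple.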

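This thesis aims to explore efficient positive and unlabeled learning (PUL) techniques for QBE over large databases. PUL trains a classifier from a small set of labeled positive examples together with a pool of unlabeled data, which closely matches the QBE setting; recent PUL techniques are therefore promising alternatives to existing methods.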
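As one illustration of how PUL fits QBE, the sketch below follows the well-known approach of Elkan and Noto (2008), chosen here only as an example and not necessarily the technique this thesis will adopt. It assumes the user's example tuples are selected completely at random from all matching tuples, and that each tuple is a vector of numeric features. A classifier is first trained to separate labeled examples from unlabeled tuples; its scores are then rescaled by the average score of held-out examples to estimate the probability that a tuple matches the user's intent. The functions `train_logreg` and `pu_fit` and the toy data are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]  # hypothetical: numeric features of one tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(xs: List[Vector], ys: List[int],
                 lr: float = 0.1, epochs: int = 500) -> Tuple[Vector, float]:
    """Plain batch gradient descent on the mean logistic loss."""
    d, n = len(xs[0]), len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def pu_fit(labeled: List[Vector], unlabeled: List[Vector],
           holdout: List[Vector]) -> Callable[[Vector], float]:
    """Elkan-Noto style PU learning; returns an estimate of P(y=1 | x)."""
    # Step 1: classifier g(x) ~ P(s=1 | x): user examples are labeled 1,
    # every other tuple is provisionally treated as 0.
    xs = labeled + unlabeled
    ys = [1] * len(labeled) + [0] * len(unlabeled)
    w, b = train_logreg(xs, ys)

    def g(x: Vector) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

    # Step 2: estimate c = P(s=1 | y=1) as the mean score of held-out examples.
    c = sum(g(x) for x in holdout) / len(holdout)

    # Step 3: rescale, P(y=1 | x) = g(x) / c, clipped at 1.
    def predict(x: Vector) -> float:
        return min(1.0, g(x) / c)

    return predict


# One numeric attribute per tuple (e.g. salary in units of 10k). The user gave
# four example tuples; one is held out to estimate c.
examples = [[6.0], [8.0], [7.0]]
holdout = [[7.5]]
others = [[2.0], [3.0], [6.5], [1.0], [7.2]]
predict = pu_fit(examples, others, holdout)
# Unlabeled tuples scoring above a threshold (e.g. 0.5) would form the answer.
print([round(predict(x), 2) for x in others])
```

The rescaling step is what distinguishes this from the fully supervised sketch: unlabeled tuples are not assumed to be negative, only less likely to have been chosen as examples.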

If you are interested in this topic, additional information is available at: Not Only SQL.