Melanie Wietkamp

Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets

Special Lunchtime Seminar
Monday, 4 September 2017, 12:00 to 13:00, Leo 1.2, Leonardo-Campus 1

Fernando Chirigati
Department of Computer Science and Engineering
Tandon School of Engineering
New York University

Title: Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets


The increasing ability to collect data from urban environments, coupled with a push towards openness by governments, has resulted in the availability of numerous spatio-temporal data sets covering diverse aspects of a city. Discovering relationships between these data sets can produce new insights by enabling domain experts not only to test but also to generate hypotheses. However, discovering these relationships is difficult. First, a relationship between two data sets may occur only at certain locations and/or time periods. Second, the sheer number and size of the data sets, coupled with the diverse spatial and temporal scales at which the data is available, present computational challenges on all fronts, from indexing and querying to analysis. Finally, it is non-trivial to differentiate between meaningful and spurious relationships. In this talk, I will discuss our ongoing research on uncovering interesting patterns and interactions in urban data while addressing these challenges. I will first present Data Polygamy, a scalable and efficient topology-based framework that allows users to query for relationships between spatio-temporal data sets. I will then demo a visual interface that we have created for Data Polygamy, which demonstrates how visualization can help in the discovery of relationships that are potentially interesting for the user. Finally, I will give an overview of our current work on further pruning relationship query results that are not statistically significant and are therefore potentially spurious.


Fernando Chirigati is a Research Assistant and Doctoral Candidate at the Department of Computer Science and Engineering at NYU Tandon School of Engineering. His research interests are mainly in the area of scientific data management, including provenance management and analytics, large-scale data analysis, computational reproducibility, and data visualization. He has received several awards, including the SIGMOD 2017 Most Reproducible Paper Award, the Pearl Brownstein Doctoral Research Award, and the Deborah Rosenthal MD Award. He is also the Reproducibility Editor of Elsevier's Information Systems Journal, and one of the chief architects of ReproZip, a tool that facilitates reproducibility of existing computational experiments.