Rethinking Software Testing for Data Science

Talk


Ensuring that data pipelines are reproducible at all times is essential if we are to trust our results. Drawing inspiration from software engineering, this talk describes a practical framework that makes continuous testing feasible. It tackles the four challenges that data-intensive software poses: writing effective test cases, structuring the project, keeping tests fast, and handling upstream data changes.

Speaker

Eduardo Blancas

Hi, this is Eduardo. I am broadly interested in developing tools that help us deliver reliable data products. Towards that end, I developed Ploomber, an open-source Python library. I hold an M.S. in Data Science from Columbia University, where I conducted research in computational neuroscience. I started my Data Science career in 2015, when I joined the Center for Data Science and Public Policy at the University of Chicago.