Beautiful (ML) Data: Patterns & Best Practice for effective Data solutions with PyTorch

Tutorial


Data loading is essential to every Deep Learning model, and PyTorch has changed the way data is managed during training and evaluation: Pythonic Dataset and DataLoader abstractions instead of plain lists of NumPy arrays. But there is more: batching, sampling, partitioning, and augmentation. This tutorial explores the internals of torch.utils.data to uncover patterns and best practices for efficient, idiomatic ("PyTorchy") data processing.
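As a rough illustration of the pieces named in the abstract, the sketch below wraps NumPy arrays in a map-style Dataset, partitions it, and batches it with a DataLoader. It is not taken from the tutorial material: the ArrayDataset class, the add_noise augmentation, the synthetic data, and the split sizes are all hypothetical choices made for this example.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader, random_split


class ArrayDataset(Dataset):
    """Hypothetical map-style dataset wrapping a pair of NumPy arrays."""

    def __init__(self, features, labels, transform=None):
        self.features = torch.from_numpy(features).float()
        self.labels = torch.from_numpy(labels).long()
        self.transform = transform  # optional per-sample augmentation

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        x = self.features[idx]
        if self.transform is not None:
            x = self.transform(x)
        return x, self.labels[idx]


def add_noise(x):
    # Toy augmentation (an assumption for this sketch): small Gaussian noise.
    return x + 0.01 * torch.randn_like(x)


# Synthetic data standing in for a real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8)).astype(np.float32)
y = rng.integers(0, 2, size=100)

dataset = ArrayDataset(X, y, transform=add_noise)

# Partitioning: an 80/20 split with a seeded generator for reproducibility.
train_set, val_set = random_split(
    dataset, [80, 20], generator=torch.Generator().manual_seed(0)
)

# Batching and sampling: shuffle the training data, keep validation ordered.
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=16, shuffle=False)

xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)
```

Note that in this sketch both splits share the same underlying dataset, so the augmentation is applied to validation samples as well; in practice one would likely attach the transform only to the training partition.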

Speaker

Valerio Maggio

Valerio received his PhD in Computational Science from the University of Naples “Federico II”, working on the definition of unsupervised machine learning methods to support Program Comprehension and Software Maintenance. Since then, his research and professional interests have focused on bringing Software Engineering principles into Data Science practice. Valerio is currently a Senior Research Associate at the University of Bristol, working on methods and software tools for reproducible machine learning in healthcare. Valerio is also a member of the Azure CRSE (Cloud Research Software Engineer) team as part of the Microsoft initiative for Higher Education and Research, and an active member of the Python community (PyCon Italy, PyData, EuroSciPy).