Scaling Up Your Data Work With Dask

Tutorial


Dask is a Python library for scaling and parallelizing Python code on a single machine or across a cluster. It provides familiar, high-level interfaces that extend the existing PyData ecosystem (e.g., NumPy, pandas) to larger-than-memory or distributed environments. This hands-on tutorial will cover the ins and outs of Dask for new users and is intended for working and aspiring data professionals.

Speakers

James Bourbeau

I’m an open-source community member, core maintainer of Dask and Zarr, and a Software Engineer at Coiled, where I focus on scaling Python with Dask. Full details about my open-source work are available on GitHub. Additionally, I co-organize the monthly Madpy Python meetup in Madison, WI, and hold a Ph.D. from the University of Wisconsin-Madison, where I studied experimental astrophysics as a member of the IceCube collaboration.

Hugo Bowne-Anderson

Hugo Bowne-Anderson is Head of Data Science Evangelism and Marketing at Coiled, a company that makes it simple for organizations to scale their data science seamlessly. He has extensive experience as a data scientist, educator, evangelist, content marketer, and data strategy consultant at DataCamp, the online education platform for all things data. He has also taught basic to advanced data science topics at institutions such as Yale University and Cold Spring Harbor Laboratory, at conferences such as SciPy, PyCon, and ODSC, and with organizations such as Data Carpentry. He has developed over 30 courses on the DataCamp platform, reaching over 500,000 learners worldwide. He also created the weekly data industry podcast DataFramed, which he hosted and produced for two years. He is committed to spreading data skills, access to data science tooling, and open-source software, both for individuals and the enterprise.