Resource Hub

In-Class Demos

Lab .zips

Access the ZIP files here

PDFs of Books

This list is auto-generated from the references cited in each week's slides. Click a reference's title to download the ebook version, if one is available.

Gopalan, Rukmani. 2022. The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture. O’Reilly Media, Inc.
Janssens, Jeroen, and Thijs Nieuwdorp. 2025. Python Polars: The Definitive Guide: Transforming, Analyzing, and Visualizing Data with a Fast and Expressive DataFrame API. O’Reilly Media, Inc.
Kleppmann, Martin. 2017. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O’Reilly Media, Inc.
Leskovec, Jure, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets. Cambridge University Press.
Loukides, Mike. 2010. What Is Data Science? O’Reilly Media.
Mell, Peter, and Timothy Grance. 2011. “The NIST Definition of Cloud Computing.” Special Publication 800-145. National Institute of Standards and Technology.
Needham, Mark, Michael Hunger, and Michael Simons. 2024. DuckDB in Action. Manning Publications.
Raasveldt, Mark, and Hannes Mühleisen. 2019. “DuckDB: An Embeddable Analytical Database.” In Proceedings of the 2019 International Conference on Management of Data, 1981–84. SIGMOD ’19. New York, NY, USA: Association for Computing Machinery.
Reis, Joe, and Matt Housley. 2022. Fundamentals of Data Engineering: Plan and Build Robust Data Systems. O’Reilly Media, Inc.
Topol, Matthew, and Wes McKinney. 2024. In-Memory Analytics with Apache Arrow. Packt Publishing Ltd.
White, Tom E. 2015. Hadoop: The Definitive Guide. O’Reilly Media, Inc.
Wolohan, John. 2020. Mastering Large Datasets with Python: Parallelize and Distribute Your Python Code. Manning Publications.