Week 2: Cloud Computing

DSAN 6000: Big Data and Cloud Computing
Fall 2025

Amit Arora and Jeff Jacobs

jj1088@georgetown.edu

Tuesday, September 2, 2025

Jeff’s Favorite “Big Data” Definition

 

Big data is when the size of the data itself becomes part of the problem (Loukides 2010)

  • In other words: Your problem becomes a “big data problem” when you hit one or both of the walls!

You vs. Your Opps

So you find yourself smashed against a wall…

(Some of yall would fold in this scenario)

The “Pre-Cloud” Approach: Make The One Computer Faster and Faster

It Worked For… Many Decades!

Is Moore’s Law Dead?

  • …Kind of?
  • Focus nowadays: specialized hardware / compute architectures hyper-optimized for particular tasks
    • [If you took DSAN 5500] Think of BLAS
  • Graphic Processing Units (GPUs)
  • Field Programmable Gate Arrays (FPGAs)
  • Data Processing Units (DPUs)
  • Google’s Tensor Processing Units (TPUs)

So, even upgrading our hardware didn’t help 😭 now what do we do?

  • We distribute!
  • More CPUs, more memory, more storage!

My Favorite Example Ever

  • Spongebob is cooking Krabby Patties too slowly to satisfy ravenous customers…
  • Should he upgrade his one spatula? How bout just many simple spatulas at once!

How Do We Achieve This? …That’s Exactly What The Cloud is For!

Benefits

  • Provides access to low-cost computing
  • Costs are decreasing every year
  • Elastic
  • PaaS works!
  • Many other benefits…

Q: What is The Cloud?

  • A: Using someone else’s computers

NIST Definition

Based on Mell and Grance (2011) (The “official” NIST definition of “Cloud Computing”)

Service Models

Hotel Analogy

Infrastructure as a Service (IaaS)

  • Virtualized computing resources delivered over the internet
  • Provider manages: Physical hardware, virtualization, networking, storage
  • You manage: Operating systems, applications, runtime, data, middleware

Examples and Use Cases

  • Amazon EC2, Microsoft Azure VMs, Google Compute Engine
  • Perfect for: Development environments, web hosting, backup & recovery
  • Benefits: Rapid scaling, pay-as-you-go, global availability
# Launch a virtual machine in seconds
aws ec2 run-instances --image-id ami-12345 --instance-type t3.xlarge

Platform as a Service (PaaS)

What is PaaS?

  • Complete development and deployment environment in the cloud
  • Provider manages: Infrastructure, operating systems, runtime environments
  • You manage: Applications and data

Examples and Use Cases

  • AWS Elastic Beanstalk, Google App Engine, Microsoft Azure App Service
  • Perfect for: Web applications, API development, microservices
  • Benefits: Faster development, automatic scaling, integrated DevOps
# Deploy your app with minimal configuration
git push heroku main  # App automatically deployed and scaled

Software as a Service (SaaS)

  • Complete applications delivered over the internet
  • Provider manages: Everything (infrastructure, platform, software)
  • You manage: Your data and user access

Examples and Use Cases

  • Gmail, Salesforce, Microsoft 365, Slack, Zoom
  • Perfect for: Business applications, collaboration tools, CRM
  • Benefits: No installation, automatic updates, accessible anywhere

The SaaS Revolution

  • 80% of companies use SaaS applications
  • $195 billion market size (2023)

Cloud Infrastructure and the AI Revolution

  • Massive Computational Requirements
    • Training large language models requires thousands of GPUs
    • Cloud provides elastic access to specialized hardware
  • Data Storage and Processing
    • AI models need petabytes of training data
    • Cloud offers unlimited, globally distributed storage
  • Cost Efficiency
    • Pay-per-use model for expensive AI hardware
    • No upfront investment in GPU clusters

GPU Infrastructure: The AI Powerhouse

From Gaming to AI

  • Originally: Graphics processing for video games
  • Now: Parallel processing powerhouse for AI/ML workloads
  • Architecture: Thousands of cores vs. CPUs 8-64 cores

Cloud GPU Offerings

# AWS GPU instances for different AI workloads
p4d.24xlarge   # 8x NVIDIA A100 (40GB) - $32/hour
p3.16xlarge    # 8x NVIDIA V100 (16GB) - $24/hour
g4dn.xlarge    # 1x NVIDIA T4 (16GB)   - $0.50/hour

The Scale Challenge

  • GPT-4 training: Estimated 25,000 A100 GPUs for 6 months
  • Meta’s AI infrastructure: 350,000+ NVIDIA H100 chips
  • Cost: Single AI training run can cost $100M+

Distributed Computing and Large Clusters

Modern AI Requires Massive Clusters

  • Model parallelism: Split models across multiple GPUs
  • Data parallelism: Process different data batches simultaneously
  • Pipeline parallelism: Different layers on different GPUs

Technologies Enabling Scale

  • Kubernetes: Container orchestration for microservices
  • Apache Spark: Distributed data processing framework
  • Ray: Distributed AI/ML framework for Python
# Distributed training with Ray
import ray
from ray import tune
# Scale across 1000+ machines automatically
tune.run(train_model, resources_per_trial={"gpu": 8})

Data Storage Evolution

The Data Tsunami

  • 2025 projection: 181 zettabytes of data globally
  • AI training datasets: CommonCrawl (800TB), ImageNet (150GB)
  • Real-time requirements: 1-10ms latency for inference (some cases sub-millisecond)

Storage Technologies

  • Object Storage: Amazon S3, Google Cloud Storage (petabyte scale)
  • High-performance: NVMe SSDs, persistent memory
  • Distributed filesystems: HDFS, GlusterFS, Ceph

The Economics of Data

  • Storage costs: Dropped 99.9% since 1980
  • Transfer costs: Still significant for large datasets
  • Edge computing: Bringing compute closer to data sources

Evolution of the Cloud

Yesterday Today Tomorrow
Limited number of tools and vendors Many tools and vendors to work with Integrated tools and vendors
One platform - few devices Multiple platforms - many devices Connected platforms and devices
Data is scarce but manageable Overabundance of data Data is used for important business decisions
IT has major influence and control IT has limited influence and control IT is strategic to the business
People only work when they are at work People work wherever they want People have access to what they need wherever they are

What Does the Cloud Look Like? Microsoft Azure Data Center

Loudon County, VA: “CLoudon”

Further Reading (Rabbitholes!)

Further Reading: SaaS and Cloud Computing

SaaS and Cloud Computing Statistics:

Data Storage and AI Infrastructure

Storage Economics and Technology

GPU Infrastructure and AI Training Costs

Lab: EC2 🤝 S3

Click to Open Lab 2

References

Loukides, Mike. 2010. “What Is Data Science?” O’Reilly Media. https://www.oreilly.com/radar/what-is-data-science/.
Mell, Peter, and Timothy Grance. 2011. “The NIST Definition of Cloud Computing.” National Institute of Standards and Technology, Special Publication 800 (2011): 145. https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-145.pdf.