We’ve assumed one-to-one correspondence between (immediately-accessible) data and visual encoding(s)
When working with big data, however:
- Full dataset may not fit in the memory available to the user’s browser!
- Even if it does, processing (e.g., placing $N$ points on map) may be prohibitively slow
$⟹$ Some portion of the data / some computations need to be handled server-side!

Client-Side vs. Server-Side Computing

Reliable estimates of computing power (in FLOPS = Floating-Point Operations Per Second) are hard to come by in a world of distributed cloud computing!
Back-of-envelope calculation:
- A typical cloud server (AWS, GCP) has roughly 10-100x the computing power of our laptops
- Servers are almost entirely devoted to data processing; laptops also have to handle the OS GUI, streaming video, conserving battery, etc.

Client-Side vs. Server-Side Memory

In Chrome, check JS heap size (in GB) by running:

window.performance.memory.jsHeapSizeLimit / (10**9)

Data	Size
My Chrome JS heap limit	4.2947 GB
2020 US Census data	4.3487 GB
Google Maps (2012)	20,000,000 GB (20 PB)
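
The comparison above can be scripted. This is a minimal sketch, assuming the dataset size in bytes is known ahead of time. `fitsInHeap`, `currentHeapLimit`, and the `headroom` parameter are hypothetical names, not part of any library. `performance.memory` is a non-standard, Chrome-only API, so the helper returns `null` wherever it is missing.

```javascript
// Hypothetical helper: does a dataset of `datasetBytes` fit under the heap limit,
// leaving a fraction `headroom` of the heap free for the page itself?
function fitsInHeap(datasetBytes, heapLimitBytes, headroom = 0) {
  return datasetBytes <= heapLimitBytes * (1 - headroom);
}

// performance.memory is non-standard (Chrome only); return null where it is missing.
function currentHeapLimit() {
  const mem = typeof performance !== "undefined" ? performance.memory : undefined;
  return mem ? mem.jsHeapSizeLimit : null;
}

const GB = 10 ** 9;
const heapLimit = 4.2947 * GB; // JS heap limit quoted on this slide
const census = 4.3487 * GB;    // 2020 US Census data size quoted on this slide

console.log(fitsInHeap(census, heapLimit)); // false: the census data exceeds the heap limit
```

In practice the usable budget is smaller than the limit, since the page's own code and DOM also live on the heap; the `headroom` argument is one simple way to account for that.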

New Opportunities

Allow users to explore time series for arbitrarily-long windows of time!

Figure 1: From Shan He, “Creating Beautiful and Meaningful Visualizations with Big Data”
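
One way a server could support arbitrarily long windows is to downsample before sending, so the payload stays the same size no matter how long the window is. The sketch below is an assumption about how this might be done, not the method used in the talk. `downsample` and its parameters are hypothetical, and points are assumed to have the shape `{ t, v }`.

```javascript
// Hypothetical server-side helper: reduce the points in [start, end) to at most
// `maxPoints` bucket averages, so the payload size does not grow with window length.
function downsample(points, start, end, maxPoints) {
  const width = (end - start) / maxPoints;
  const buckets = Array.from({ length: maxPoints }, () => ({ sum: 0, n: 0 }));
  for (const { t, v } of points) {
    if (t < start || t >= end) continue; // outside the requested window
    const i = Math.min(maxPoints - 1, Math.floor((t - start) / width));
    buckets[i].sum += v;
    buckets[i].n += 1;
  }
  // Emit one averaged point per non-empty bucket, stamped with the bucket's start time.
  return buckets
    .map((b, i) => ({ t: start + i * width, v: b.n ? b.sum / b.n : null }))
    .filter((b) => b.v !== null);
}

// Made-up series: one point per time unit, value equal to its timestamp.
const series = Array.from({ length: 1000 }, (_, t) => ({ t, v: t }));
const out = downsample(series, 0, 1000, 10);
console.log(out.length); // 10
console.log(out[0]);     // { t: 0, v: 49.5 }
```

Averaging smooths out spikes; a min/max pair per bucket would preserve extremes at twice the payload.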

Helpful Even When Data Does Fit In Memory!

Offloading work to the server frees the user’s CPU for things like lighting computation

Figure 2: “Astronomically correct lighting allows users to see how different buildings shade each other during different times of day and year.”

Is This Lighting Thing A Gimmick?

…or a MILLION DOLLAR IDEA!!! 🤑🤑🤑

Figure 3: Radiance: Determine your building’s solar potential

Achieving the Best of Both Worlds

The General Idea

Ad hoc approach, figuring out what to do server-side vs. client-side “on the fly” ❌
Instead, we can use systems which integrate them, drawing on respective strengths!
One such system: a Data Visualization Management System (DVMS)

ZQL = SQL for Visualization

Input: Description of desired visualization
Output: SQL query

`x`	`y`	`constraints`	`viz`
`carrier`	`passengers`	`destination=="New York"`	`bar(y=sum(passengers))`

Produces

SELECT carrier, SUM(passengers)
FROM flight_delay
WHERE destination = 'New York'
GROUP BY carrier;

Maybe non-obvious, a priori, how this helps…
Advantages become clear when we start to optimize!
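
To make the input-to-output mapping concrete, here is a toy translator for the single row shown above. It is not the real ZQL compiler: it handles only one equality constraint and a `viz` of the form `type(y=agg(column))`. The function name `rowToSql` and the table name `flight_delay` are assumptions. Note that it emits `WHERE` before `GROUP BY`, as SQL requires.

```javascript
// Toy translator for a single ZQL-style row. This is NOT the real ZQL compiler:
// it only handles one equality constraint and a viz of the form type(y=agg(column)).
function rowToSql(row, table) {
  const viz = /^\w+\(y=(\w+)\((\w+)\)\)$/.exec(row.viz);
  const cond = /^(\w+)=="([^"]*)"$/.exec(row.constraints);
  if (!viz || !cond) throw new Error("unsupported row shape");
  const [, agg, column] = viz;
  const [, field, value] = cond;
  const literal = value.replace(/'/g, "''"); // escape single quotes for SQL
  return (
    `SELECT ${row.x}, ${agg.toUpperCase()}(${column}) ` +
    `FROM ${table} ` +
    `WHERE ${field} = '${literal}' ` +
    `GROUP BY ${row.x};`
  );
}

const row = {
  x: "carrier",
  y: "passengers",
  constraints: 'destination=="New York"',
  viz: "bar(y=sum(passengers))",
};
console.log(rowToSql(row, "flight_delay"));
// SELECT carrier, SUM(passengers) FROM flight_delay WHERE destination = 'New York' GROUP BY carrier;
```

String interpolation keeps the sketch short; anything user-facing should pass values as query parameters instead.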

Precomputation

SQL in general needs to handle arbitrary queries…
But for visualization, certain queries will never be made, while others (counting, summing) will be made frequently
Hence we can precompute, on the server side, many (most?) of the statistics for the layers / levels of aggregation that users are likely to look at
This frees up processing power on the client side, which can be applied instead towards speed, aesthetics, responsive interactivity, etc.
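
A minimal sketch of the idea, assuming the only statistics needed are counts and sums per group. `precompute` and `lookup` are hypothetical names, and the rows are made up for illustration. The server builds the table of aggregates once, and each later request becomes a lookup rather than a scan over the raw rows.

```javascript
// Server side (run once): precompute COUNT and SUM of `valueKey` per group
// for every dimension the UI can group by.
function precompute(rows, dimensions, valueKey) {
  const cube = {};
  for (const dim of dimensions) {
    const groups = new Map();
    for (const r of rows) {
      const g = groups.get(r[dim]) ?? { count: 0, sum: 0 };
      g.count += 1;
      g.sum += r[valueKey];
      groups.set(r[dim], g);
    }
    cube[dim] = groups;
  }
  return cube;
}

// Request time: answering a query is a lookup, not a scan over the raw rows.
function lookup(cube, dim, group) {
  return cube[dim].get(group) ?? { count: 0, sum: 0 };
}

// Made-up rows, for illustration only.
const flights = [
  { carrier: "AA", destination: "New York", passengers: 120 },
  { carrier: "AA", destination: "Boston", passengers: 80 },
  { carrier: "UA", destination: "New York", passengers: 150 },
];
const cube = precompute(flights, ["carrier", "destination"], "passengers");
console.log(lookup(cube, "carrier", "AA"));           // { count: 2, sum: 200 }
console.log(lookup(cube, "destination", "New York")); // { count: 2, sum: 270 }
```

The trade-off is storage and staleness: the aggregates take space and must be rebuilt or updated when the underlying data changes.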

The Power of Precomputation I

[Video]

The Power of Precomputation II

[Video]

Precomputation: Designing for an Audience
