Lecture 10: Visualizing Big Data

DSAN 5200-03: Advanced Data Visualization

Abhijit Dasgupta

Jeff Jacobs

Anderson Monken

Marck Vaisman

Tuesday, January 30, 2024

What Makes Big Data Visualization Different?

(…Let’s brainstorm!)

So far, we’ve assumed a one-to-one correspondence between (immediately accessible) data and its visual encoding(s)
When working with big data, however:
- The full dataset may not fit in the memory available to the user’s browser!
- Even if it does, processing (e.g., placing \(N\) points on map) may be prohibitively slow
\(\implies\) Some portion of data / some computations need to be handled server side!
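
As a rough illustration of handling a computation server side, here is a minimal sketch in which the server collapses raw points into counts per grid cell, so the client only has to draw one mark per cell. The function name `bin_points`, the cell size, and the randomly generated points are all assumptions made up for this example, not part of any particular library or dataset.

```python
import random
from collections import Counter

def bin_points(points, cell_size):
    """Server side: collapse raw (x, y) points into counts per grid cell."""
    counts = Counter()
    for x, y in points:
        # Integer grid coordinates of the cell containing this point
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

random.seed(0)
# Hypothetical dataset: 100,000 random points in a 100 x 100 region
points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(100_000)]

bins = bin_points(points, cell_size=10)
print(len(points), "raw points reduced to", len(bins), "cells for the client")
```

The payload sent to the browser now depends on the number of grid cells rather than on \(N\), at the cost of losing individual points until the user zooms in and requests a finer grid.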

Reliable estimates of computing power (in FLOPS = Floating-Point Operations Per Second) are hard to come by in a world of distributed cloud computing!
Back-of-envelope calculation:
- A given server (AWS, GCP) has 10-100x more computing power than our laptops
- Servers almost entirely devoted to data processing; laptops have to handle OS GUI, streaming video, conserving battery, etc.

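In Chrome, check the JS heap size limit (in GB) by running the following in the DevTools console (`performance.memory` is a non-standard, Chrome-only API):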

window.performance.memory.jsHeapSizeLimit / (10**9)

Figure 2: “Astronomically correct lighting allows users to see how different buildings shade each other during different times of day and year.”

Ad hoc approach: figuring out what to do server-side vs. client-side “on the fly” ❌
Instead, we can use systems that integrate the two, drawing on their respective strengths!
Data Visualization Management System (DVMS)

`x`	`y`	`constraints`	`viz`
`carrier`	`passengers`	`destination=="New York"`	`bar(y=sum(passengers))`

Produces

SELECT carrier, SUM(passengers)
FROM flight_delay
WHERE destination = 'New York'
GROUP BY carrier;
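
To make the spec-to-query step concrete, here is a minimal sketch of how a system might translate the bar-chart spec above into SQL and run it. This is not how any actual DVMS is implemented: the helper `spec_to_sql`, the table name `flight_delay`, its column layout, and the toy rows are all hypothetical, and SQLite (via Python's built-in `sqlite3` module) stands in for the server-side database.

```python
import sqlite3

# Hypothetical toy rows: (carrier, destination, passengers)
rows = [
    ("AA", "New York", 120),
    ("AA", "Boston", 80),
    ("DL", "New York", 200),
    ("DL", "New York", 50),
    ("UA", "Chicago", 90),
]

# Only known column names may be spliced into the query text
ALLOWED_COLUMNS = {"carrier", "destination", "passengers"}

def spec_to_sql(x, y, filter_column, table="flight_delay"):
    """Build a parameterized SUM-by-group query from a bar-chart spec."""
    for name in (x, y, filter_column):
        if name not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {name}")
    return (
        f"SELECT {x}, SUM({y}) FROM {table} "
        f"WHERE {filter_column} = ? GROUP BY {x} ORDER BY {x}"
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_delay (carrier TEXT, destination TEXT, passengers INTEGER)"
)
conn.executemany("INSERT INTO flight_delay VALUES (?, ?, ?)", rows)

# x = carrier, y = passengers, constraint on destination
sql = spec_to_sql("carrier", "passengers", "destination")
bars = conn.execute(sql, ("New York",)).fetchall()
print(bars)  # one (carrier, total passengers) pair per bar
```

The filter value is passed as a query parameter rather than pasted into the SQL string, and the filter is applied before grouping, matching the `WHERE` then `GROUP BY` order that SQL requires.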

SQL in general needs to handle arbitrary queries…
But for visualization, certain queries will never be made, while others (counting, summing) will be made frequently
Hence we can precompute, on the server side, many (most?) of the statistics for the layers / levels of aggregation that users will plausibly want to look at
This frees up processing power on the client side, which can be applied instead towards speed, aesthetics, responsive interactivity, etc.
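
A minimal sketch of the precomputation idea, assuming a tiny hypothetical dataset with two dimensions (`carrier`, `destination`) and one measure (`passengers`): the server sums the measure once for every combination of dimensions, so later chart requests become dictionary lookups instead of scans over the raw rows. The names `precompute_sums` and `cube` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (carrier, destination, passengers)
records = [
    ("AA", "New York", 120),
    ("AA", "Boston", 80),
    ("DL", "New York", 200),
    ("DL", "New York", 50),
    ("UA", "Chicago", 90),
]

DIMENSIONS = ("carrier", "destination")

def precompute_sums(records):
    """Server side: sum passengers for every subset of the dimensions."""
    cube = defaultdict(int)
    for carrier, destination, passengers in records:
        values = {"carrier": carrier, "destination": destination}
        # k = 0 is the grand total, k = 2 is the finest level of detail
        for k in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, k):
                key = tuple((d, values[d]) for d in dims)
                cube[key] += passengers
    return dict(cube)

cube = precompute_sums(records)

# Each request is now a lookup, not a pass over the raw data
total = cube[()]
by_dl = cube[(("carrier", "DL"),)]
to_ny = cube[(("destination", "New York"),)]
print(total, by_dl, to_ny)
```

The number of precomputed entries grows quickly with the number of dimensions, which is why it helps that visualization workloads only need a limited family of aggregates.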