What Makes Big Data Visualization Different?
(…Let’s brainstorm!)
Memory Issues Computational Issues
- We’ve assumed one-to-one correspondence between (immediately-accessible) data and visual encoding(s)
- When working with big data, however:
- Full dataset may not fit in the user’s browser memory!
- Even if it does, processing (e.g., placing points on map) may be prohibitively slow
Some portion of data / some computations need to be handled server side!
Client-Side vs. Server-Side Computing
- Reliable estimates of computing power (in FLOPS = Floating-Point Operations Per Second) are hard to come by in a world of distributed cloud computing!
- Back-of-envelope calculation:
- A given server (AWS, GCP) has 10-100x more computing power than our laptops
- Servers almost entirely devoted to data processing; laptops have to handle OS GUI, streaming video, conserving battery, etc.
Client-Side vs. Server-Side Memory
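In Chrome, check the JS heap size limit (in GB) by running the following in the DevTools console (performance.memory is a non-standard, Chrome-only API):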
window.performance.memory.jsHeapSizeLimit / (10**9)
| Data | Size |
|---|---|
| My Chrome JS heap limit | 4.2947 GB |
| 2020 US Census data | 4.3487 GB |
| Google Maps (2012) | 20,000,000 GB (20 PB) |
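The comparison in the table can be spelled out with a little arithmetic. The following is a minimal sketch, not a measurement tool: `fitsInHeap` and `heapsNeeded` are hypothetical helper names, sizes are in decimal GB as in the table, and the heap limit is passed in as a plain number so the snippet also runs outside Chrome.

```javascript
// Sizes in bytes, using decimal gigabytes (1 GB = 10**9 bytes) as in the table.
const GB = 10 ** 9;

// Hypothetical helper: does a dataset of `datasetBytes` fit under the heap limit?
// Optimistic: ignores memory used by the page itself and by parsed JS objects.
function fitsInHeap(datasetBytes, heapLimitBytes) {
  return datasetBytes <= heapLimitBytes;
}

// Hypothetical helper: how many heap-sized pieces would the dataset span?
function heapsNeeded(datasetBytes, heapLimitBytes) {
  return Math.ceil(datasetBytes / heapLimitBytes);
}

const heapLimit = 4.2947 * GB;        // "My Chrome JS Heap" from the table
const census = 4.3487 * GB;           // "2020 US Census data"
const googleMaps = 20_000_000 * GB;   // "Google Maps (2012)"

console.log(fitsInHeap(census, heapLimit));       // false: slightly too large
console.log(heapsNeeded(googleMaps, heapLimit));  // several million heaps
```

Even the "small" Census dataset misses the limit by about 0.05 GB, and that is before counting the overhead of turning raw bytes into JS objects.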
New Opportunities
- Allow users to explore time series for arbitrarily-long windows of time!
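One way a server could support arbitrarily long windows is to shrink the requested window to a fixed number of points before sending it to the client. The slide does not say which method is used, so treat this as an illustrative sketch under that assumption; `downsample` and the `{ t, v }` record shape are hypothetical.

```javascript
// Hypothetical server-side helper: reduce the points of a time series that fall
// inside [start, end) to at most `buckets` averaged points.
// `series` is an array of { t: number, v: number }.
function downsample(series, start, end, buckets) {
  const width = (end - start) / buckets;
  const sums = new Array(buckets).fill(0);
  const counts = new Array(buckets).fill(0);

  for (const { t, v } of series) {
    if (t < start || t >= end) continue; // outside the requested window
    const i = Math.min(buckets - 1, Math.floor((t - start) / width));
    sums[i] += v;
    counts[i] += 1;
  }

  const out = [];
  for (let i = 0; i < buckets; i++) {
    if (counts[i] === 0) continue; // skip empty buckets
    // Report the bucket midpoint and the mean of the values that fell in it.
    out.push({ t: start + (i + 0.5) * width, v: sums[i] / counts[i] });
  }
  return out;
}

// Example: 1,000,000 points reduced to at most 500 for the client to draw.
const series = Array.from({ length: 1_000_000 }, (_, t) => ({ t, v: Math.sin(t / 1000) }));
const reduced = downsample(series, 0, 1_000_000, 500);
console.log(reduced.length); // at most 500, however long the window is
```

The payload size depends on `buckets`, not on the length of the window, which is what lets the client stay responsive.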
Helpful Even When Data Does Fit In Memory!
- Can free user’s CPU for things like lighting computation
Is This Lighting Thing A Gimmick?
- …or a MILLION DOLLAR IDEA!!! 🤑🤑🤑
Achieving the Best of Both Worlds
The General Idea
- Ad hoc approach, figuring out what to do server-side vs. client-side “on the fly” ❌
- Instead, we can use systems which integrate them, drawing on respective strengths!
- Data Visualization Management System (DVMS)
ZQL = SQL for Visualization
- Input: Description of desired visualization
- Output: SQL query
| x | y | constraints | viz |
|---|---|---|---|
| carrier | passengers | destination=="New York" | bar(y=sum(passengers)) |
Produces
SELECT carrier, SUM(passengers)
FROM flight_delay
WHERE destination = 'New York'
GROUP BY carrier;
- Maybe non-obvious, a priori, how this helps…
- Advantages become clear when we start to optimize!
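To make the input/output relationship concrete, here is a deliberately simplified translator for rows shaped like the one in the table. It is a sketch, not the real ZQL grammar or implementation: `rowToSql`, the object field names, and the table name `flight_delay` are assumptions made for illustration.

```javascript
// Hypothetical, simplified translator: one ZQL-like row -> one SQL string.
// It only handles the shape of the example row (one aggregate, equality filters).
function rowToSql(row, table) {
  // Extract the aggregate from a viz spec such as "bar(y=sum(passengers))".
  const match = /^\w+\(y=(\w+)\((\w+)\)\)$/.exec(row.viz);
  if (!match) throw new Error(`Unsupported viz spec: ${row.viz}`);
  const [, aggregate, column] = match;
  if (column !== row.y) throw new Error("viz must aggregate the y column");

  // Turn `col=="value"` into the SQL predicate `col = 'value'`,
  // doubling any single quotes inside the value.
  const where = row.constraints
    ? row.constraints.replace(
        /(\w+)\s*==\s*"([^"]*)"/g,
        (_, col, val) => `${col} = '${val.replace(/'/g, "''")}'`
      )
    : null;

  return [
    `SELECT ${row.x}, ${aggregate.toUpperCase()}(${row.y})`,
    `FROM ${table}`,
    where ? `WHERE ${where}` : null, // WHERE must come before GROUP BY
    `GROUP BY ${row.x};`,
  ].filter(Boolean).join("\n");
}

const sql = rowToSql(
  {
    x: "carrier",
    y: "passengers",
    constraints: 'destination=="New York"',
    viz: "bar(y=sum(passengers))",
  },
  "flight_delay" // assumed table name
);
console.log(sql);
```

Building SQL by string concatenation is fine for a sketch, but a real system would use parameterized queries and validate column names against the schema.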
Precomputation
- SQL in general needs to handle arbitrary queries…
- But for visualization, certain queries will never be made, while others (counting, summing) will be made frequently
- Hence we can precompute, on the server side, many (most?) of the statistics for the layers / levels of aggregation that users will feasibly want to look at
- This frees up processing power on the client side, which can be applied instead towards speed, aesthetics, responsive interactivity, etc.
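The idea can be sketched in a few lines: compute the sums once per level of aggregation, then answer each request with a lookup. This is a toy illustration, not how any particular DVMS stores its aggregates; the records, `precomputeSums`, and `lookup` are all hypothetical.

```javascript
// Hypothetical flight records; the field names mirror the ZQL example.
const flights = [
  { carrier: "AA", destination: "New York", passengers: 120 },
  { carrier: "AA", destination: "Boston", passengers: 80 },
  { carrier: "DL", destination: "New York", passengers: 150 },
  { carrier: "DL", destination: "New York", passengers: 50 },
];

// Server side, done once: sum `valueKey` for every combination of `groupKeys`.
function precomputeSums(rows, groupKeys, valueKey) {
  const sums = new Map();
  for (const row of rows) {
    const key = JSON.stringify(groupKeys.map((k) => row[k]));
    sums.set(key, (sums.get(key) ?? 0) + row[valueKey]);
  }
  return sums;
}

// One table per level of aggregation the visualization can display.
const byCarrier = precomputeSums(flights, ["carrier"], "passengers");
const byCarrierAndDest = precomputeSums(flights, ["carrier", "destination"], "passengers");

// At request time: a lookup instead of a scan over the raw rows.
function lookup(sums, values) {
  return sums.get(JSON.stringify(values)) ?? 0;
}

console.log(lookup(byCarrier, ["DL"]));                    // 200
console.log(lookup(byCarrierAndDest, ["AA", "New York"])); // 120
```

The trade-off is storage and staleness: each extra level of aggregation costs memory on the server, and the tables must be refreshed when the underlying data changes.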
The Power of Precomputation I
The Power of Precomputation II