Lecture 10: Visualizing Big Data

DSAN 5200-03: Advanced Data Visualization

Authors: Abhijit Dasgupta, Jeff Jacobs, Anderson Monken, Marck Vaisman

Published: Tuesday, January 30, 2024


What Makes Big Data Visualization Different?

(…Let’s brainstorm!)

Memory Issues / Computational Issues

  • So far, we’ve assumed a one-to-one correspondence between (immediately accessible) data and visual encoding(s)
  • When working with big data, however:
    • The full dataset may not fit in the memory available to the user’s browser!
    • Even if it does, processing (e.g., placing N points on a map) may be prohibitively slow
  • Some portion of the data / some of the computations need to be handled server-side!
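
To make the "placing N points on a map" problem concrete, here is a minimal sketch of one common server-side strategy: aggregate the raw points into grid cells, so the client only receives one count per cell. The function name `bin_points`, the cell size, and the random dataset are all illustrative assumptions, not part of any particular system.

```python
import random
from collections import Counter

def bin_points(points, cell_size):
    """Aggregate (x, y) points into square grid cells.

    Returns {(col, row): count}, which is what the client would draw
    (e.g., as a heatmap) instead of the individual points.
    """
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return dict(counts)

# Hypothetical "big" dataset held on the server: 200,000 random points
# in the square [0, 100) x [0, 100).
rng = random.Random(0)
points = [(rng.random() * 100, rng.random() * 100) for _ in range(200_000)]

# With 10 x 10 cells, the client receives at most 100 numbers,
# however many raw points the server holds.
cells = bin_points(points, cell_size=10)
print(len(points), "points reduced to", len(cells), "cells")
```

The payload sent to the client now depends on the grid resolution rather than on N, which is the basic reason for handling this step server-side.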

Client-Side vs. Server-Side Computing

  • Reliable estimates of computing power (in FLOPS = Floating-Point Operations Per Second) hard to come by in a world of distributed cloud computing!
  • Back-of-envelope calculation:
    • A given server (AWS, GCP) has 10-100x more computing power than our laptops
    • Servers almost entirely devoted to data processing; laptops have to handle OS GUI, streaming video, conserving battery, etc.

Client-Side vs. Server-Side Memory

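  • In Chrome, check the JS heap size limit (in GB) by running the following in the DevTools console (performance.memory is a non-standard, Chrome-specific API):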

    window.performance.memory.jsHeapSizeLimit / (10**9)

Source              | Size (GB)
My Chrome JS Heap   | 4.2947
2020 US Census data | 4.3487
Google Maps (2012)  | 20,000,000.0000
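
The comparison in the table can be worked through as simple arithmetic. The sketch below uses only the three figures listed above; the helper names `fits_in_heap` and `heaps_needed` are hypothetical, and the calculation ignores any memory overhead beyond the raw data size.

```python
# Sizes in decimal gigabytes, matching the division by 10**9 above.
heap_limit_gb = 4.2947          # "My Chrome JS Heap"
census_gb = 4.3487              # "2020 US Census data"
google_maps_gb = 20_000_000.0   # "Google Maps (2012)", i.e., 20 PB

def fits_in_heap(dataset_gb, heap_gb=heap_limit_gb):
    """True if the raw dataset is no larger than the heap limit."""
    return dataset_gb <= heap_gb

def heaps_needed(dataset_gb, heap_gb=heap_limit_gb):
    """How many heaps of this size the raw dataset would fill."""
    return dataset_gb / heap_gb

print("Census fits in heap?", fits_in_heap(census_gb))
print("Census overshoot (MB):", round((census_gb - heap_limit_gb) * 1000))
print("Heaps needed for Google Maps:", round(heaps_needed(google_maps_gb)))
```

The Census data misses the limit by only about 54 MB, while the Google Maps figure is millions of times larger than the heap, so no amount of client-side tuning would close that gap.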

New Opportunities

  • Allow users to explore time series for arbitrarily-long windows of time!
Figure 1: From Shan He, “Creating Beautiful and Meaningful Visualizations with Big Data”

Helpful Even When Data Does Fit In Memory!

  • Can free user’s CPU for things like lighting computation
Figure 2: “Astronomically correct lighting allows users to see how different buildings shade each other during different times of day and year.”

Is This Lighting Thing A Gimmick?

  • …or a MILLION DOLLAR IDEA!!! 🤑🤑🤑
Figure 3: Radiance: Determine your building’s solar potential

Achieving the Best of Both Worlds

The General Idea

  • Ad hoc approach, figuring out what to do server-side vs. client-side “on the fly” ❌
  • Instead, we can use systems which integrate them, drawing on respective strengths!
  • Data Visualization Management System (DVMS)

ZQL = SQL for Visualization

  • Input: Description of desired visualization
  • Output: SQL query

x       | y          | constraints             | viz
carrier | passengers | destination=="New York" | bar(y=sum(passengers))

Produces

    SELECT carrier, SUM(passengers)
    FROM flight_delay
    WHERE destination = 'New York'
    GROUP BY carrier;

  • Maybe non-obvious, a priori, how this helps…
  • Advantages become clear when we start to optimize!
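
As a rough illustration of the "visualization description in, SQL query out" idea, the sketch below turns a small spec (x, y, aggregate, constraints) into a parameterized query and runs it with Python's built-in sqlite3 module. This is not the actual ZQL implementation: the function `spec_to_sql`, the table name `flight_delay`, and the sample rows are all assumptions made for the example.

```python
import sqlite3

def spec_to_sql(table, x, y, agg, constraints):
    """Build a GROUP BY query from a simple visualization spec.

    constraints is a list of (column, value) equality filters.
    Table and column names are interpolated directly, so this sketch
    assumes they come from a trusted spec; values go in as parameters.
    """
    sql = f"SELECT {x}, {agg.upper()}({y}) FROM {table}"
    if constraints:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in constraints)
    sql += f" GROUP BY {x} ORDER BY {x}"
    params = [value for _, value in constraints]
    return sql, params

# Hypothetical data standing in for the flights table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_delay (carrier TEXT, destination TEXT, passengers INTEGER)"
)
conn.executemany(
    "INSERT INTO flight_delay VALUES (?, ?, ?)",
    [
        ("AA", "New York", 120),
        ("AA", "New York", 80),
        ("UA", "New York", 150),
        ("UA", "Chicago", 90),
    ],
)

# The spec from the table above: bar chart of sum(passengers) by carrier,
# restricted to flights whose destination is New York.
sql, params = spec_to_sql(
    "flight_delay", "carrier", "passengers", "sum", [("destination", "New York")]
)
bars = conn.execute(sql, params).fetchall()
print(sql)
print(bars)  # one (carrier, total passengers) pair per bar
```

Because the spec is structured data rather than free-form SQL, a system can inspect it, which is what makes the optimizations discussed next possible.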

Precomputation

  • SQL in general needs to handle arbitrary queries…
  • But for visualization, certain queries will never be made, while others (counting, summing) will be made frequently
  • Hence we can precompute, on the server side, many (most?) of the statistics for the layers / levels of aggregation that users will feasibly want to look at
  • This frees up processing power on the client side, which can instead be applied towards speed, aesthetics, responsive interactivity, etc.
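
One way to picture precomputation is a small "data cube": the server computes the sums for every combination of grouping columns once, and each later request becomes a dictionary lookup instead of a scan over the raw rows. The sketch below is only an illustration under that assumption; the names `precompute_sums` and `lookup` and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def precompute_sums(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of the grouping dimensions.

    Returns {dims_tuple: {key_tuple: total}}, e.g.
    cube[("carrier",)][("AA",)] is the total for carrier AA.
    """
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(totals)
    return cube

def lookup(cube, dims, key):
    """Answer an aggregate request without touching the raw rows."""
    return cube[dims].get(key, 0)

# Hypothetical raw data held on the server.
rows = [
    {"carrier": "AA", "destination": "New York", "passengers": 120},
    {"carrier": "AA", "destination": "New York", "passengers": 80},
    {"carrier": "UA", "destination": "New York", "passengers": 150},
    {"carrier": "UA", "destination": "Chicago", "passengers": 90},
]

# Done once, ahead of time, on the server.
cube = precompute_sums(rows, ("carrier", "destination"), "passengers")

# Done per user interaction: constant-time lookups.
print(lookup(cube, ("carrier",), ("AA",)))
print(lookup(cube, ("carrier", "destination"), ("UA", "Chicago")))
print(lookup(cube, (), ()))  # grand total
```

The number of precomputed tables grows exponentially with the number of dimensions, so in practice one would precompute only the levels of aggregation users are expected to request.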

The Power of Precomputation I


The Power of Precomputation II


Precomputation: Designing for an Audience