Week 11: Parallel Pipelines and Map-Reduce

DSAN 5500: Data Structures, Objects, and Algorithms in Python

Class Sessions
Author
Affiliation

Jeff Jacobs

Published

Monday, April 8, 2024

Open slides in new window →

Beyond Embarrassingly Parallel Code…

Last Week: Embarrassingly Parallel

  • Technical definition: tasks within pipeline can easily be parallelized bc no dependence and no need for communication (see next slide). Better video explanation:

What Happens When Code is Not Embarrassingly Parallel?

  • Think of the difference between linear and quadratic equations in algebra:
  • \(3x - 1 = 0\) is “embarrassingly” solvable, on its own: you can solve it directly, by adding 3 to both sides \(\implies x = \frac{1}{3}\). Same for \(2x + 3 = 0 \implies x = -\frac{3}{2}\)
  • Now consider \(6x^2 + 7x - 3 = 0\): Harder to solve “directly”, so your instinct might be to turn to the laborious quadratic equation:

\[ \begin{align*} x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-7 \pm \sqrt{49 - 4(6)(-3)}}{2(6)} = \frac{-7 \pm 11}{12} = \left\{\frac{1}{3},-\frac{3}{2}\right\} \end{align*} \]

  • And yet, \(6x^2 + 7x - 3 = (3x - 1)(2x + 3)\), meaning that we could have split the problem into two “embarrassingly” solvable pieces, then multiplied to get result!

The Analogy to Map-Reduce

\(\leadsto\) If code is not embarrassingly parallel (instinctually requiring laborious serial execution), \(\underbrace{6x^2 + 7x - 3 = 0}_{\text{Solve using Quadratic Eqn}}\)
But can be split into… \((3x - 1)(2x + 3) = 0\)
Embarrassingly parallel pieces which combine to same result, \(\underbrace{3x - 1 = 0}_{\text{Solve directly}}, \underbrace{2x + 3 = 0}_{\text{Solve directly}}\)
We can use map-reduce to achieve ultra speedup (running “pieces” on GPU!) \(\underbrace{(3x-1)(2x+3) = 0}_{\text{Solutions satisfy this product}}\)

The Direct Analogy: Map-Reduce!

  • Problem you’ve seen in DSAN 5000/5100: Computing SSR (Sum of Squared Residuals)
  • \(y = (1,3,2), \widehat{y} = (2, 5, 0) \implies \text{SSR} = (1-2)^2 + (3-5)^2 + (2-0)^2 = 9\)
  • Computing pieces separately:
map(do_something_with_piece, list_of_pieces)
Code
my_map = map(lambda input: input**2, [(1-2), (3-5), (2-0)])
map_result = list(my_map)
map_result
[1, 4, 4]
  • Combining solved pieces
reduce(how_to_combine_pair_of_pieces, pieces_to_combine)
Code
from functools import reduce
my_reduce = reduce(lambda piece1, piece2: piece1 + piece2, map_result)
my_reduce
9

Functional Programming (FP)

Functions vs. Meta-Functions

  • You may have noticed: map() and reduce() are “meta-functions”: functions that take in other functions as inputs
def add_5(num):
  return num + 5
add_5(10)
15
def apply_twice(fn, arg):
  return fn(fn(arg))
apply_twice(add_5, 10)
20
  • In Python, functions can be used as vars (Hence lambda):
add_5 = lambda num: num + 5
apply_twice(add_5, 10)
20
  • This relates to a whole paradigm, “functional programming”: mostly outside scope of course, but lots of important+useful takeaways/rules-of-thumb! →

Train Your Brain for Functional Approach \(\implies\) Master Debugging!

When a program doesn’t work, each function is an interface point where you can check that the data are correct. You can look at the intermediate inputs and outputs to quickly isolate the function that’s responsible for a bug.
(from Python’s “Functional Programming HowTo”)

Code \(\rightarrow\) Pipelines \(\rightarrow\) Debuggable Pipelines

  • Scenario: Run code, check the output, and… it’s wrong 😵 what do you do?
  • Usual approach: Read lines one-by-one, figuring out what they do, seeing if something pops out that seems wrong; adding comments like # Convert to lowercase
  • Easy case: found typo in punctuation removal code. Fix the error, add comment like # Remove punctuation

    Rule 1 of FP: transform these comments into function names

  • Hard case: Something in load_text() modifies a variable that later on breaks remove_punct() (Called a side-effect)

    Rule 2 of FP: NO SIDE-EFFECTS!

G input in.txt load_text load_text (Verb) input->load_text lowercase lowercase (Verb) load_text->lowercase 🧐 ✅ remove_punct remove_punct (Verb) lowercase->remove_punct 🧐 ✅ remove_stopwords remove_stopwords (Verb) remove_punct->remove_stopwords 🧐 ❌❗️ output out.txt remove_stopwords->output

(Does this way of diagramming a program look familiar?)

  • With side effects: ❌ \(\implies\) issue is somewhere earlier in the chain 😩🏃‍♂️
  • No side effects: ❌ \(\implies\) issue must be in remove_punct()!!! 😎 ⏱️ = 💰

If This Is So Useful… Why Doesn’t Everyone Do It?

  • ~Trapped in imperative (sequential) coding mindset: Path dependency / QWERTY
  • But the reason we need to start thinking like this is: it’s 1000x harder to debug parallel code! So we need to be less ad hoc in how we write+debug, from here on out! 🙇‍♂️🙏

From Leskovec, Rajaraman, and Ullman (2014)

The title relates to a classic Economics joke (the best kind of joke): “An economist and a CEO are walking down the street, when the CEO points at the ground and tells the economist, ‘look! A $20 bill on the ground!’ The economist keeps on walking, scoffing at the CEO: ‘don’t be silly, if there was a $20 bill on the ground, somebody would have picked it up already’.”

But… Why is All This Weird Mapping and Reducing Necessary?

  • Without knowing a bit more of the internals of computing efficiency, it may seem like a huge cost in terms of overly-complicated overhead, not to mention learning curve…

The “Killer Application”: Matrix Multiplication

  • (I learned from Jeff Ullman, who did the obnoxious Stanford thing of mentioning in passing how “two previous students in the class did this for a cool final project on web crawling and, well, it escalated quickly”, aka became Google)

From Leskovec, Rajaraman, and Ullman (2014), which is (legally) free online!

The Killer Way-To-Learn: Text Counts!

  • (2014): Text counts (2.2) \(\rightarrow\) Matrix multiplication (2.3) \(\rightarrow \cdots \rightarrow\) PageRank (5.1)
  • (And yall thought it was just busywork for HW3 😏)
  • The goal: User searches “Denzel Curry”… How relevant is a given webpage?
  • Scenario 1: The entire internet fits on CPU \(\implies\) We can just make a big big hash table:

G internet Scan in O(n): Today Denzel Washington ate a big bowl of Yum's curry. Denzel allegedly rubbed his tum and said "yum yum yum" when he tasted today's curry. "Yum! It is me Denzel, curry is my fav!", he exclaimed. According to his friend Steph, curry is indeed Denzel's fav. We are live with Del Curry in Washington for a Denzel curry update. ccounts Overall Counts ('according',1) ('allegedly',1) ('ate',1) ('big',1) ('bowl',1) ('curry',6) ('del',1) ('denzel',5) ('exclaimed',1) ('fav',2) ('friend',1) ('indeed',1) ('live',1) ('rubbed',1) ('said',1) ('steph',1) ('tasted',1) ('today',2) ('tum',1) ('update',1) ('washington',2) ('yum',4) internet->ccounts Hash Table

If Everything Doesn’t Fit on CPU…

From Cornell Virtual Workshop, “Understanding GPU Architecture”

Break the Problem into Chunks for the Green Bois!

G chunked Chunked Document Today Denzel Washington ate a big bowl of Yum's curry. Denzel allegedly rubbed his tum and said "yum yum yum" when he tasted today's curry. "Yum! It is me Denzel, curry is my fav!", he exclaimed. According to his friend Steph, curry is indeed Denzel's fav. We are live with Del Curry in Washington for a Denzel curry update. chcounts Chunked Counts ('today',1) ('denzel',1) ... ('tum',1) ('said',1) ('yum',1) ('yum',1) ('yum',1) ... ('fav',1) ('exclaimed',1) ... ('del',1) ('curry',1) ('washington',1) ('denzel',1) ('curry',1) ('update',1) chunked:p1->chcounts:p1 O(n/4) chunked:p2->chcounts:p2 O(n/4) chunked:p3->chcounts:p3 O(n/4) chunked:p4->chcounts:p4 O(n/4) scounts Hashed Counts ('allegedly',1) ... ('curry',1) ('denzel',2) ... ('yum',1) ('curry',2) ('denzel',1) ... ('yum',4) ('according',1) ('curry',1) ('del',1) ('denzel',1) ... ('curry',2) ('denzel',1) ('update',1) ('washington',1) chcounts:p1->scounts:p1 O(n/4) chcounts:p2->scounts:p2 O(n/4) chcounts:p3->scounts:p3 O(n/4) chcounts:p4->scounts:p4 O(n/4) ccounts Overall Counts ('according',1) ('allegedly',1) ('ate',1) ('big',1) ('bowl',1) ('curry',6) ('del',1) ('denzel',5) ('exclaimed',1) ('fav',2) ('friend',1) ('indeed',1) ('live',1) ('rubbed',1) ('said',1) ('steph',1) ('tasted',1) ('today',2) ('tum',1) ('update',1) ('washington',2) ('yum',4) scounts:p1->ccounts:p1 scounts:p2->ccounts:p1 scounts:p3->ccounts:p1 scounts:p4->ccounts:p1 scounts:p2->ccounts Merge in O(n)

  • \(\implies\) Total = \(O(3n) = O(n)\)
  • But also optimized in terms of constants, because of sequential memory reads

HW4 Possibility 1

  • Map-Reduced Matrix-Vector Multiplication

HW4 Possibility 2 (More Likely?)

  • Scrape quotes in parallel!
  • Advantage: Builds on HW3
  • Disadvantage: Embarrassingly parallel, so… no Map-Reduce coding practice
  • (Probable tiebreaker: If you’re interested in Map-Reduce, do Matrix Multiplication as Final Project!)

References

Leskovec, Jure, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets. Cambridge University Press.