Week 6: Depth-First and Breadth-First Search

DSAN 5500: Data Structures, Objects, and Algorithms in Python

Class Sessions

Author

Affiliation

Jeff Jacobs

jj1088@georgetown.edu

Published

Thursday, February 13, 2025

Open slides in new window →

Schedule

Today’s Planned Schedule:

	Start	End	Topic
Lecture	6:30pm	6:40pm	W01 Reminders, HW2 \(\leadsto\) HW3 →
Lecture	6:40pm	7:05pm	DFS in Isolation →
	7:05pm	7:30pm	BFS in Isolation →
	7:30pm	8:00pm	(Those Who Know…) It’s the Same Algorithm 🤯 →
Break!	8:00pm	8:10pm
	8:10pm	9:00pm	The Scourge of Cycles (Graph Traversal) →

W01 Learning Objectives Reminders!

Hammers vs. Structures Built w/Hammers

How I see the course title:
- Data Structures!
- Objects!, and
- Algorithms! (in Python)
How people I’m trying to meet halfway see the title(?): (Data Structures, Objects, and Algorithms) in Python!

Many, Many Education Systems (including DCPS)

“This is the correct way to do _________”
- Regurgitate it: ✅💯✅ (Didn’t get hit with stick 😎)
- Fail to regurgitate it: ❌👎❌ (Got hit with stick 😔)

Studies of Learning + Motivation

Approach 1

Focus attention on (sufficiently interesting) structure
\(\implies\) students’ comfort with tools “arises out of” wanting to build it
\(\implies\) students discover ways to use tools that “click” w/them specifically (serotonin spike: tool \(\leftrightarrow\) thing they built)

Approach 2

Tell students correct and incorrect ways to use tools
\(\implies\) students learn “correct” and “incorrect” as rules
\(\implies\) (a) lower retention, (b) less adaptability to new tools

Applies Equally Well Whether…

You’re in Cuba…

…OR u just want that data sci BAG! 💰

Midterm Metadata

Coding Portion: Modifications of LinkedList (Circular / jump-to-end / doubly-linked); non-scary OOP skeleton 🙈
Multiple Choice Portion: Lots more to cover…
Hash Tables: \(O(1 + \epsilon \log_2(n))\), but think about it as:
\(1 + (\text{Collision rate}) \cdot (\text{Collision handler efficiency})\)
Linked List \(\rightarrow\) Binary Search Tree
Depth-First vs. Breadth-First: Picture of a tree \(\rightarrow\) (a) what is BFS result, (b) what is (in/pre/post)-order DFS result?
Lastly: Cormen, Leiserson, Rivest, Stein (CLRS), pgs. 17-106

Back to BSTs

BST Building Blocks on HW3

For HW3, we provide you with an InventoryItem class
Two instance variables: item_name and price
Equivalence relations:
- __eq__(other), __ne__(other)
Ordering relations:
- __lt__(other), __le__(other), __gt__(other), __ge__(other)
Bonus: __repr__() and __str__()

LLs \(\rightarrow\) BSTs: The Hard Part

When we were working with LinkedLists, we could access all items by just “looping through”, from one element to the next, printing as we go along.
But… for a BST, our structure can now branch as we traverse it… How do we “loop through” a BST?
Two fundamentally different ways to traverse every node
“Opposites” of each other, so that one is often extremely efficient and the other extremely inefficient for a given task
Your job as a data scientist is to think carefully about which one is more efficient for a given goal!

Two Ways to Traverse: IRL Version

Imagine we’re trying to learn about a topic \(\tau\) using Wikipedia, so we find its article \(\tau_0\)
There are two “extremes” in terms of strategies we could follow for learning, given the contents of the article as well as the links it contains to other articles

Depth-First Search (DFS)

Open \(\tau_0\) and start reading; When we encounter a link we always click it, immediately start reading the new article.
If we hit an article with no links (or a dead end/broken link), we finish reading it and click the back button, picking up where we left off in the previous article. When we reach the end of \(\tau_0\), we’re done!

Breadth-First Search (BFS)

Bookmark \(\tau_0\) in a folder called “Level 0 Articles”; open and start reading it
When we encounter a link, we put it in a “Level 1 Articles” folder, but continue reading \(\tau_0\) until we reach the end.
We then open all “Level 1 Articles” in new tabs, placing links we encounter in these articles into a “Level 2 Articles” folder, that we only start reading once all “Level 1 Articles” are read
We continue like this, reading “Level 3 Articles” once we’re done with “Level 2 Articles”, “Level 4 Articles” once we’re done with “Level 3 Articles”, and so on. (Can you see a sense in which this is the “opposite” of DFS?)

…Let’s try them out! I clicked “Random Article” and got Eustache de Saint Pierre

Two Ways to Traverse: Picture Version

Code

from hw2 import LinkedList, InventoryItem, BinarySearchTree
bst = BinarySearchTree()
item1 = InventoryItem('Mango', 50)
bst.add(item1)
item2 = InventoryItem('Pickle', 60)
bst.add(item2)
item3 = InventoryItem('Artichoke', 55)
bst.add(item3)
item5 = InventoryItem('Banana', 123)
bst.add(item5)
item6 = InventoryItem('Aardvark', 11)
bst.add(item6)
HTML(visualize(bst))

Warning: node None_0, port f1 unrecognized
Warning: node None_1, port f1 unrecognized
Warning: node None_2, port f1 unrecognized
Warning: node None_3, port f1 unrecognized
Warning: node None_4, port f1 unrecognized
Warning: node None_5, port f1 unrecognized

Code

from hw2 import IterAlgorithm, NodeProcessor
print("DFS:")
dfs_processor = NodeProcessor(IterAlgorithm.DEPTH_FIRST)
#print(type(dfs_processor.node_container))
dfs_processor.iterate_over(bst)

print("\nBFS:")
bfs_processor = NodeProcessor(IterAlgorithm.BREADTH_FIRST)
#print(type(bfs_processor.node_container))
bfs_processor.iterate_over(bst)

DFS:
InventoryItem[item_name=Mango,price=50]
InventoryItem[item_name=Pickle,price=60]
InventoryItem[item_name=Artichoke,price=55]
InventoryItem[item_name=Banana,price=123]
InventoryItem[item_name=Aardvark,price=11]

BFS:
InventoryItem[item_name=Mango,price=50]
InventoryItem[item_name=Artichoke,price=55]
InventoryItem[item_name=Pickle,price=60]
InventoryItem[item_name=Aardvark,price=11]
InventoryItem[item_name=Banana,price=123]

Two Ways to Traverse: In-Words Version

Depth-First Search (DFS): With this approach, we iterate through the BST by always taking the left child as the “next” child until we hit a leaf node (which means, we cannot follow this left-child pointer any longer, since a leaf node does not have a left child or a right child!), and only at that point do we back up and take the right children we skipped.
Breadth-First Search (BFS): This is the “opposite” of DFS in the sense that we traverse the tree level-by-level, never moving to the next level of the tree until we’re sure that we have visited every node on the current level.

Two Ways to Traverse: Animated Version

**Depth-First Search** (from Wikimedia Commons)

**Breadth-First Search** (from Wikimedia Commons)

Two Ways to Traverse: Underlying Data Structures

Now that you have some intuition, you may be thinking that they might require very different code to implement 🤔
This is where mathematically-formal link between the two becomes ultra helpful!
It turns out (and a full-on algorithmic theory course makes you prove) that
1. Depth-First Search can be accomplished by processing nodes in an order determined by adding each to a stack, while
2. Breadth-First Search can be accomplished by processing nodes in an order determined by adding each to a queue!
\(\implies\) Literally identical code, “pulling out” the word stack and replacing it with the word queue within your code (or vice-versa).
Put your Software Engineer Hat on: this calls for an abstraction layer!
- (OOP hint: Notice how we’re not teaching you how Queue and Stack work, and not asking you to implement it 🤨)

Two Ways to Traverse: HW3 Version

You’ll make a class called NodeProcessor, with a single iterate_over(tree) function
This function—without any changes in the code or even any if statements!—will be capable of both DFS and BFS
- That’s a hint: lose points if there’s an if statement! 👀 (bc, the whole point of OOP! Encapsulation, abstraction)
Constructor takes in a ThingContainer (could be a stack or a queue, you won’t know which), with two functions:
- put_new_thing_in(new_thing)
- take_existing_thing_out()

`ThingContainer`: The OOP Structure You See

Secret Full OOP Structure You Don’t See

One Animal in the BFS Species

For this and next slide, constant-time preliminary step: “Add root to container” (a ThingContainer object)

Procedure	Algorithm
Breadth-First Search	`while` [`container` not empty]: 1. Take thing out 2. Print thing content 3. Add left child to container 4. Add right child to container

Three Animals in the DFS Species

DFS Procedure	Algorithm
Pre-Order Traversal	1. Print node 2. DFS left subtree 3. DFS right subtree
In-Order Traversal 🧐‼️	1. DFS left subtree 2. Print node 3. DFS right subtree
Post-Order Traversal	1. DFS left subtree 2. DFS right subtree 3. Print node

The Three Animals Traverse our Inventory Tree

Code

visualize(bst)

Warning: node None_0, port f1 unrecognized
Warning: node None_1, port f1 unrecognized
Warning: node None_2, port f1 unrecognized
Warning: node None_3, port f1 unrecognized
Warning: node None_4, port f1 unrecognized
Warning: node None_5, port f1 unrecognized

Final Notes for Homework 3

More Memory, More Performance

The last part challenges you to ask: why stop at a hash based on just the first letter of the key?
We could just as easily use the first two letters:
h('AA') = 0, h('AB') = 1, …, h('AZ') = 25,
h('BA') = 26, h('BB') = 27, …, h('BZ') = 51,
h('CA') = 52, …, h('ZZ') = 675.
You will see how this gets us even closer to the elusive \(O(1)\)! And we could get even closer with three letters, four letters, … 🤔🤔🤔

`AlphaHasher` vs. `CustomHasher`

# @title define-alpha-hasher
from abc import ABC, abstractmethod
import string

class CustomHasher(ABC):
  @abstractmethod
  def __init__(self):
    pass

  @abstractmethod
  def get_alphabet_size(self):
    pass

  @abstractmethod
  def hash(self, str_to_hash: str) -> int:
    pass

  @abstractmethod
  def compute_position_in_alphabet(self, uppercase_key: str) -> int:
    pass

  @abstractmethod
  def compute_key_for_index(self, index: int) -> str:
    pass

class AlphaHasher(CustomHasher):
  def __init__(self):
    self.alphabet_size = 26

  def get_alphabet_size(self):
    return self.alphabet_size

  def hash(self, str_to_hash: str) -> int:
    if len(str_to_hash) == 0:
      first_letter = 'A'
    else:
      first_letter = str_to_hash.upper()[0]
    # And return its index in the alphabet:
    # 'A' has index 0, 'B' has index 1, etc.
    return self.compute_position_in_alphabet(first_letter)

  def compute_position_in_alphabet(self, uppercase_key: str) -> int:
    return string.ascii_uppercase.index(uppercase_key)

  def compute_key_for_index(self, index: int) -> str:
    return string.ascii_uppercase[index]

`LinkedList` via `PolymorphicNode`

EmptyNode \(\rightarrow\) ContentNode

# @title define-linked-list
from abc import ABC, abstractmethod

class LinkedList:
  def __init__(self):
    self.root = EmptyNode()

  def append(self, item):
    self.root = self.root.append(item)

  def find_item_steps(self, item):
    return self.root.find_item_steps(item)

  def to_string(self, recurse: bool):
    return f'LinkedList[{self.root.to_string(recurse)}]'

  def __repr__(self):
    return self.to_string(recurse=True)

  def __str__(self):
    return self.to_string(recurse=False)

class PolymorphicNode(ABC):
  @abstractmethod
  def __init__(self):
    pass

  @abstractmethod
  def append(self, item):
    pass

  @abstractmethod
  def find_item_steps(self, item):
    pass

  @abstractmethod
  def to_string(self, recurse: bool):
    pass

  def __repr__(self):
    return self.to_string(recurse=True)

  def __str__(self):
    return self.to_string(recurse=False)

class EmptyNode(PolymorphicNode):
  def __init__(self):
    super().__init__()

  def append(self, item):
    """
    This is the only weird part of EmptyNode: because we want to utilize
    *polymorphism*, when append() is called on an EmptyNode it
    is "transformed into" a FilledNode! That is why, in the
    LinkedList's append() function, we have self.root = self.root.append(),
    and why the FilledNode's append() function works the same way
    """
    #print("EmptyLinkedListNode.append()")
    new_form = ContentNode(item)
    return new_form

  def find_item_steps(self, item):
    return np.inf

  def __len__(self):
    return 0

  def to_string(self, recurse: bool):
    return ''

class ContentNode(PolymorphicNode):
  def __init__(self, content_arg):
    super().__init__()
    self.content = content_arg
    self.next = EmptyNode()

  def append(self, item):
    self.next = self.next.append(item)
    # Return just *self*, since we *haven't* transformed the type of
    # FilledLinkedListNode by appending another element to it
    return self

  def find_item_steps(self, item):
    if self.content == item or self.content[0] == item:
      return 1
    return 1 + self.next.find_item_steps(item)

  def get_content(self):
    return self.content

  def __len__(self):
    return 1 + len(self.next)

  def to_string(self, recurse: bool):
    content_str = f'ContentNode[{str(self.get_content())}] '
    if not recurse:
      return content_str
    next_str = str(self.next)
    return f'{content_str}{self.next.to_string(recurse)}'

Tuple vs. `InventoryItem` (Part 3.1)

# @title define-inventory-item
class InventoryItem:
  def __init__(self, item_name_arg, price_arg):
    self.item_name = item_name_arg
    self.price = price_arg

  def __lt__(self, other): # -> [NotImplemented | bool]:
    if isinstance(other, InventoryItem):
      return self.item_name < other.item_name
    if isinstance(other, str):
      return self.item_name < other
    return NotImplemented

  def __le__(self, other): # -> [NotImplemented | bool]
    if isinstance(other, InventoryItem):
      return self.item_name <= other.item_name
    if isinstance(other, str):
      return self.item_name <= other
    return NotImplemented

  def __gt__(self, other): # -> [NotImplemented | bool]
    if isinstance(other, InventoryItem):
      return self.item_name > other.item_name
    if isinstance(other, str):
      return self.item_name > other
    return NotImplemented

  def __ge__(self, other): # -> [NotImplemented | bool]
    if isinstance(other, InventoryItem):
      return self.item_name >= other.item_name
    if isinstance(other, str):
      return self.item_name >= other
    return NotImplemented

  def __eq__(self, other): # -> [NotImplemented | bool]
    if isinstance(other, InventoryItem):
      return self.item_name == other.item_name
    if isinstance(other, str):
      return self.item_name == other
    return NotImplemented

  def __ne__(self, other): # -> [NotImplemented | bool]
    if isinstance(other, InventoryItem):
      return self.item_name != other.item_name
    if isinstance(other, str):
      return self.item_name != other
    return NotImplemented

  def __repr__(self) -> str:
    return self.__str__()

  def __str__(self) -> str:
    return f'InventoryItem[item_name={self.item_name},price={self.price}]'

`ThingContainer` (Part 3.2)

# @title define-thing-container
class ThingContainer:
  def __init__(self):
    self.internal_list = []

  @abstractmethod
  def put_new_thing_in(self, item):
    pass

  def is_empty(self) -> bool:
    return self.__len__() == 0

  def __len__(self):
    return len(self.internal_list)

  @abstractmethod
  def take_existing_thing_out(self):
    pass

class Stack(ThingContainer):
  def __init__(self):
    super().__init__()

  def __push(self, item):
    self.internal_list.append(item)

  def __pop(self):
    return self.internal_list.pop()

  def put_new_thing_in(self, item):
    return self.__push(item)

  def take_existing_thing_out(self):
    return self.__pop()

class Queue(ThingContainer):
  def __init__(self):
    super().__init__()

  def put_new_thing_in(self, item):
    return self.__enqueue(item)

  def __enqueue(self, item):
    self.internal_list.insert(0, item)

  def __dequeue(self):
    return self.internal_list.pop()

  def take_existing_thing_out(self):
    return self.__dequeue()

`AlphaHasher2` (Part 5)

# @title define-alpha-hasher-2
class AlphaHasher2(CustomHasher):
  def __init__(self):
    self.alphabet_size = 26 * 26

  def get_alphabet_size(self):
    return self.alphabet_size

  def hash(self, str_to_hash: str) -> int:
    if len(str_to_hash) == 0:
      first_two_letters = 'AA'
    elif len(str_to_hash) == 1:
      first_letter = str_to_hash.upper()[0]
      second_letter = 'A'
    else:
      first_letter = str_to_hash.upper()[0]
      second_letter = str_to_hash.upper()[1]
    #print(f'First two letters for {str_to_hash}: {first_letter}{second_letter}')
    # And return its index:
    # 'AA' has index 0, 'AB' has index 1, etc.
    first_letter_pos = AlphaHasher2.compute_position_in_alphabet(first_letter)
    second_letter_pos = AlphaHasher2.compute_position_in_alphabet(second_letter)
    # The position in the two-letter alphabet is just 26*first + second
    final_pos = 26 * first_letter_pos + second_letter_pos
    return final_pos

  def compute_position_in_alphabet(self, uppercase_key: str) -> int:
    if uppercase_key in string.ascii_uppercase:
      return string.ascii_uppercase.index(uppercase_key)
    return 0

  def compute_key_for_index(self, index: int) -> str:
    first_letter_part = int(index / 26)
    second_letter_part = index % 26
    # In case you need to debug!
    #print(f'alpha2_index: {index}, first_letter_part: {first_letter_part}, second_letter_part: {second_letter_part}')
    return string.ascii_uppercase[first_letter_part] + string.ascii_uppercase[second_letter_part]

Schedule

W01 Learning Objectives Reminders!

Hammers vs. Structures Built w/Hammers

Many, Many Education Systems (including DCPS)

Studies of Learning + Motivation

Applies Equally Well Whether…

Midterm Metadata

Back to BSTs

BST Building Blocks on HW3

LLs \(\rightarrow\) BSTs: The Hard Part

Two Ways to Traverse: IRL Version

Two Ways to Traverse: Picture Version

Two Ways to Traverse: In-Words Version

Two Ways to Traverse: Animated Version

Two Ways to Traverse: Underlying Data Structures

Two Ways to Traverse: HW3 Version

ThingContainer: The OOP Structure You See

Secret Full OOP Structure You Don’t See

One Animal in the BFS Species

Three Animals in the DFS Species

The Three Animals Traverse our Inventory Tree

Final Notes for Homework 3

More Memory, More Performance

AlphaHasher vs. CustomHasher

LinkedList via PolymorphicNode

Tuple vs. InventoryItem (Part 3.1)

ThingContainer (Part 3.2)

AlphaHasher2 (Part 5)

`ThingContainer`: The OOP Structure You See

`AlphaHasher` vs. `CustomHasher`

`LinkedList` via `PolymorphicNode`

Tuple vs. `InventoryItem` (Part 3.1)

`ThingContainer` (Part 3.2)

`AlphaHasher2` (Part 5)