Week 7: Code Examples and Midterm Review

DSAN 5500: Data Structures, Objects, and Algorithms in Python

Class Sessions
Author
Affiliation

Jeff Jacobs

Published

Thursday, February 19, 2026

Open slides in new window →

Schedule

Today’s Planned Schedule:

Start End Topic
Lecture 6:30pm 7:20pm BST Iteration
7:20pm 7:30pm Part դ: Mini-Topic →
7:30pm 7:50pm Part Բ: Full Topic →
Break! 7:50pm 8:00pm
8:00pm 8:40pm Parts Գ-Դ: Two Full Topics →
8:40pm 9:00pm Parts ե-զ: Two Mini-Topics →

Back to BSTs

LLs \(\rightarrow\) BSTs: The Hard Part

  • When we were working with LinkedLists, we could access all items by just “looping through” (follow next, print() at each step1)
  • But… for a BST, our structure can now branch as we traverse it… How do we “loop through” a BST?
  • Two fundamentally different ways to traverse every node
  • “Opposites” of each other, so that one is often extremely efficient and the other extremely inefficient for a given task
  • Your job as a data scientist is to think carefully about which one is more efficient for a given goal!

Two Ways to Traverse: IRL Version

  • You’re trying to learn about a topic \(\tau\) using Wikipedia, so you find its article \(\tau_0\)
  • Two strategic “extremes” given [content of article \(i\)], [links to other articles \(i \rightarrow j\)]
Note Depth-First Search (DFS)
  • Open \(\tau_0\) and start reading; When we encounter a link we always click, immediately start reading new article.
  • If we hit an article with no links (or a dead end/broken link), we finish reading it and click the back button, picking up where we left off in the previous article. When we reach the end of \(\tau_0\), we’re done!
Note Breadth-First Search (BFS)
  • Bookmark \(\tau_0\) in a folder called “Level 0 Articles”; open and start reading it
  • When we encounter a link, put it in “Level 1 Articles” folder but continue reading \(\tau_0\) to the end
  • We then open all “Level 1 Articles” in new tabs, placing links we encounter in these articles into a “Level 2 Articles” folder, that we only start reading once all “Level 1 Articles” are read
  • We continue like this, reading “Level 3 Articles” once we’re done with “Level 2 Articles”, “Level 4 Articles” once we’re done with “Level 3 Articles”, and so on. (Can you see a sense in which this is the “opposite” of DFS?)

Two Ways to Traverse: Picture Version

Code
from hw2 import LinkedList, InventoryItem, BinarySearchTree
bst = BinarySearchTree()
item1 = InventoryItem('Mango', 50)
bst.add(item1)
item2 = InventoryItem('Pickle', 60)
bst.add(item2)
item3 = InventoryItem('Artichoke', 55)
bst.add(item3)
item5 = InventoryItem('Banana', 123)
bst.add(item5)
item6 = InventoryItem('Aardvark', 11)
bst.add(item6)
HTML(visualize(bst))
Warning: node None_0, port f1 unrecognized
Warning: node None_1, port f1 unrecognized
Warning: node None_2, port f1 unrecognized
Warning: node None_3, port f1 unrecognized
Warning: node None_4, port f1 unrecognized
Warning: node None_5, port f1 unrecognized

Code
from hw2 import IterAlgorithm, NodeProcessor
print("DFS:")
dfs_processor = NodeProcessor(IterAlgorithm.DEPTH_FIRST)
#print(type(dfs_processor.node_container))
dfs_processor.iterate_over(bst)

print("\nBFS:")
bfs_processor = NodeProcessor(IterAlgorithm.BREADTH_FIRST)
#print(type(bfs_processor.node_container))
bfs_processor.iterate_over(bst)
DFS:
InventoryItem[item_name=Mango,price=50]
InventoryItem[item_name=Pickle,price=60]
InventoryItem[item_name=Artichoke,price=55]
InventoryItem[item_name=Banana,price=123]
InventoryItem[item_name=Aardvark,price=11]

BFS:
InventoryItem[item_name=Mango,price=50]
InventoryItem[item_name=Artichoke,price=55]
InventoryItem[item_name=Pickle,price=60]
InventoryItem[item_name=Aardvark,price=11]
InventoryItem[item_name=Banana,price=123]

Two Ways to Traverse: In-Words Version

  1. Depth-First Search (DFS): With this approach, we iterate through the BST by always taking the left child as the “next” child until we hit a leaf node (which means, we cannot follow this left-child pointer any longer, since a leaf node does not have a left child or a right child!), and only at that point do we back up and take the right children we skipped.
  2. Breadth-First Search (BFS): This is the “opposite” of DFS in the sense that we traverse the tree level-by-level, never moving to the next level of the tree until we’re sure that we have visited every node on the current level.

Two Ways to Traverse: Animated Version

Depth-First Search (from Wikimedia Commons)

Breadth-First Search (from Wikimedia Commons)

Two Ways to Traverse: Underlying Data Structures

  • Now that you have some intuition, you may be thinking that they might require very different code to implement 🤔

  • This is where mathematically-formal link between the two becomes ultra helpful!

  • It turns out (and a full-on algorithmic theory course makes you prove) that

    1. Depth-First Search can be accomplished by processing nodes in an order determined by adding each to a stack, while
    2. Breadth-First Search can be accomplished by processing nodes in an order determined by adding each to a queue!
  • \(\implies\) Literally identical code, “pulling out” the word stack and replacing it with the word queue within your code (or vice-versa).

  • Put your Software Engineer Hat on: this calls for an abstraction layer!

    • (OOP hint: Notice how we’re not teaching you how Queue and Stack work, and not asking you to implement it 🤨)

Two Ways to Traverse: HW3 Version

  • You’ll make a class called NodeProcessor, with a single iterate_over(tree) function
  • This function—without any changes in the code or even any if statements!—will be capable of both DFS and BFS
    • That’s a hint: lose points if there’s an if statement! 👀 (bc, the whole point of OOP! Encapsulation, abstraction)
  • Constructor takes in a ThingContainer (could be a stack or a queue, you won’t know which), with two functions:
    • put_new_thing_in(new_thing)
    • take_existing_thing_out()

One Animal in the BFS Species

  • For this and next slide, constant-time preliminary step: “Add root to container” (a ThingContainer object)
Procedure Algorithm
Breadth-First Search while [container not empty]:
    1. Take thing out
    2. Print thing content
    3. Add left child to container
    4. Add right child to container

Three Animals in the DFS Species

DFS Procedure Algorithm
Pre-Order Traversal 1. Print node
2. DFS left subtree
3. DFS right subtree
In-Order Traversal 🧐‼️ 1. DFS left subtree
2. Print node
3. DFS right subtree
Post-Order Traversal 1. DFS left subtree
2. DFS right subtree
3. Print node

The Three Animals Traverse our Inventory Tree

Code
visualize(bst)
Warning: node None_0, port f1 unrecognized
Warning: node None_1, port f1 unrecognized
Warning: node None_2, port f1 unrecognized
Warning: node None_3, port f1 unrecognized
Warning: node None_4, port f1 unrecognized
Warning: node None_5, port f1 unrecognized

The Great Wheel of Data-Structural and Algorithmic Fate

Mini-Topics

Stack-Heap Distinction

  • The Stack (\(\neq\) Stack data structure)
    • Fixed-length things go here, including Pointers to Heap
  • The Heap (\(\neq\) Heap data structure)
    • Variable-length things go here
  • Heap elves manage The Heap, by…
    • Constantly “claiming” additional memory from the OS (malloc()), in case objects need to grow
    • “Freeing” memory back for use by the OS (free()), when objects shrink / deleted / Python execution ends

Big-\(\overline{O}\) Notation

  • These are equivalence classes (not numbers or functions)
  • Compute the runtime \(T(n)\) of an algorithm
  • Worry about how it scales as \(n\) gets large: \(\lim_{n \rightarrow \infty}T(n)\)
  • Decide whether to use it or not based on which equivalence class it converges to:
    • \(O(1)\)
    • \(O(\log(n))\)
    • \(O(n)\)
    • \(O(n^2)\)

Data Validation

Linear vs. Logarithmic Design

Trees vs. Graphs (Cycles)

  • In terms of creating and managing data structures, we start with Trees and move to Graphs
  • But, in terms of defining these structures, it helps to start with Graphs
  • A graph is just a collection of linked nodes (any number of links): connected if some possible path between any two nodes
  • A tree is a connected graph without cycles
  • A binary tree is a tree where each node has only two outward links (called children)
  • A binary search tree is a binary tree where
    • Outward links are labeled left and right
    • [All contents after following left] \(<\) [current content]
    • [All contents after following right] \(>\) [current content]

Full Topics

Variations on LinkedList

  • FrontBackLinkedList
  • Deque (Pronounced like “Deck”)
  • DoublyLinkedList

FrontBackLinkedList

  • This isn’t exactly a separate data structure from a LinkedList, but instead an “expansion pack” for LinkedList which adds an insert_front() function

DoublyLinkedList

  • For when we need to be able to reverse directions at any point during iteration
  • LinkedListNode had only a next pointer
  • DoublyLinkedListNode has two pointers: prev and next

Deque

  • May seem similar to FrontBackLinkedList, but this time we do have a (slightly) different data structure!
  • For when we need to iterate through a list in-order or in reverse order, with exact same complexity!
    • (Who can think of a case where this would be immediately useful?)
  • Like in FrontBackLinkedList, we have a new insert_front() function
  • Like in DoublyLinkedList, we have both prev and next pointers in each DequeNode
  • But, we also add a new pointer in the Deque class itself (not the DequeNode class), tail, so that we now have head (formerly called root) and tail (pointing to the last element in the LL)

Collision-Resistant Hashing

  • In general: What properties do we think a “good” hashing algorithm should have?
  • More specific: Given \(N\) objects to store and \(M\) available memory slots, what are the limits to \(\overline{O}(1)\)?
    • Efficiency when \(N < M\)?
    • Efficiency when \(N = M\)?
    • Efficiency when \(N > M\)?
    • How do these change if we move from LinkedList-backed to BST-backed?

Object-Oriented Design

  • First: What are the two kinds of “things” that an object has?
  • What is the difference between a class and an object?
  • OML without Polymorphism
  • OML with Polymorphism
  • Abstract Base Classes (ABC in Python)

Sorting

  • How is Merge-Sort “better”? What role does Merge subroutine play?
  • What/where exactly are the invariants in these diagrams?
Insertion-Sort

Merge-Sort

Midterm Metadata

Midterm Structure

  • Coding Portion: Modifications of LinkedList (Circular / jump-to-end / doubly-linked); non-scary OOP skeleton 🙈
  • Multiple Choice Portion: Lots more to cover…
    • Hash Tables: \(O(1 + \epsilon \log_2(n))\), but think about it as:
    • \(1 + (\text{Collision rate}) \cdot (\text{Collision handler efficiency})\)
    • Linked List \(\rightarrow\) Binary Search Tree
    • Depth-First vs. Breadth-First: Picture of a tree \(\rightarrow\) (a) what is BFS result, (b) what is (in/pre/post)-order DFS result?
    • Lastly: Cormen, Leiserson, Rivest, Stein (CLRS), pgs. 17-106

The Two Boxes That Most Things In This Course Can Be Sorted Into

  • Box 1: Linear Things
  • Box 2: Logarithmic Things
  • Things that go into the boxes:
    • Algorithms
    • Data Structures
    • Software Development Patterns

The Boxes

Linear Things: \(O(N)\) Logarithmic Things: \(O(\lg{N})\)
Data Structures

LinkedList

BinarySearchTree

Sorting Algorithms Insertion-Sort Merge-Sort
Search Algorithms Linear-Search Binary-Search
General Pattern One-by-One Divide-and-Conquer
Steps to Look Up a Word \(N = 102118\) \(\lceil \log_2(N) \rceil = 17\)
  • Hash Table: A “trick” that gets us close to \(O(1)\), by pre-allocating lots of memory!

\[ O(N) \; \underbrace{\leadsto O(\log_2(N))}_{\mathclap{\substack{\text{More Efficient Algorithm} \\ \text{(Free!)}}}} \; \underbrace{\leadsto O(1 + \epsilon\log_2(N))}_{\substack{\text{More Memory} \\ \text{(\$\$\$)}}} \]

Homework 3

  • Aka the Midterm Prep assignment
  • Not due until Thursday, March 12, 11:59pm
  • But, finishing it before the Midterm will be super good prep!

BST Building Blocks

  • For HW3, we provide you with an InventoryItem class
  • Two instance variables: item_name and price
  • Equivalence relations:
    • __eq__(other), __ne__(other)
  • Ordering relations:
    • __lt__(other), __le__(other), __gt__(other), __ge__(other)
  • Bonus: __repr__() and __str__()

ThingContainer: The OOP Structure You See

Secret Full OOP Structure You Don’t See

HashTables: More Memory, More Performance

  • The last part challenges you to ask: why stop at a hash based on just the first letter of the key?
  • We could just as easily use the first two letters:
  • h('AA') = 0, h('AB') = 1, …, h('AZ') = 25,
  • h('BA') = 26, h('BB') = 27, …, h('BZ') = 51,
  • h('CA') = 52, …, h('ZZ') = 675.
  • You will see how this gets us even closer to the elusive \(O(1)\)! And we could get even closer with three letters, four letters, … 🤔🤔🤔

AlphaHasher vs. CustomHasher

# @title define-alpha-hasher
from abc import ABC, abstractmethod
import string

class CustomHasher(ABC):
  @abstractmethod
  def __init__(self):
    pass

  @abstractmethod
  def get_alphabet_size(self):
    pass

  @abstractmethod
  def hash(self, str_to_hash: str) -> int:
    pass

  @abstractmethod
  def compute_position_in_alphabet(self, uppercase_key: str) -> int:
    pass

  @abstractmethod
  def compute_key_for_index(self, index: int) -> str:
    pass

class AlphaHasher(CustomHasher):
  def __init__(self):
    self.alphabet_size = 26

  def get_alphabet_size(self):
    return self.alphabet_size

  def hash(self, str_to_hash: str) -> int:
    if len(str_to_hash) == 0:
      first_letter = 'A'
    else:
      first_letter = str_to_hash.upper()[0]
    # And return its index in the alphabet:
    # 'A' has index 0, 'B' has index 1, etc.
    return self.compute_position_in_alphabet(first_letter)

  def compute_position_in_alphabet(self, uppercase_key: str) -> int:
    return string.ascii_uppercase.index(uppercase_key)

  def compute_key_for_index(self, index: int) -> str:
    return string.ascii_uppercase[index]

LinkedList via PolymorphicNode

  • EmptyNode \(\rightarrow\) ContentNode
# @title define-linked-list
from abc import ABC, abstractmethod

class LinkedList:
  def __init__(self):
    self.root = EmptyNode()

  def append(self, item):
    self.root = self.root.append(item)

  def find_item_steps(self, item):
    return self.root.find_item_steps(item)

  def to_string(self, recurse: bool):
    return f'LinkedList[{self.root.to_string(recurse)}]'

  def __repr__(self):
    return self.to_string(recurse=True)

  def __str__(self):
    return self.to_string(recurse=False)

class PolymorphicNode(ABC):
  @abstractmethod
  def __init__(self):
    pass

  @abstractmethod
  def append(self, item):
    pass

  @abstractmethod
  def find_item_steps(self, item):
    pass

  @abstractmethod
  def to_string(self, recurse: bool):
    pass

  def __repr__(self):
    return self.to_string(recurse=True)

  def __str__(self):
    return self.to_string(recurse=False)

class EmptyNode(PolymorphicNode):
  def __init__(self):
    super().__init__()

  def append(self, item):
    """
    This is the only weird part of EmptyNode: because we want to utilize
    *polymorphism*, when append() is called on an EmptyNode it
    is "transformed into" a FilledNode! That is why, in the
    LinkedList's append() function, we have self.root = self.root.append(),
    and why the FilledNode's append() function works the same way
    """
    #print("EmptyLinkedListNode.append()")
    new_form = ContentNode(item)
    return new_form

  def find_item_steps(self, item):
    return np.inf

  def __len__(self):
    return 0

  def to_string(self, recurse: bool):
    return ''

class ContentNode(PolymorphicNode):
  def __init__(self, content_arg):
    super().__init__()
    self.content = content_arg
    self.next = EmptyNode()

  def append(self, item):
    self.next = self.next.append(item)
    # Return just *self*, since we *haven't* transformed the type of
    # FilledLinkedListNode by appending another element to it
    return self

  def find_item_steps(self, item):
    if self.content == item or self.content[0] == item:
      return 1
    return 1 + self.next.find_item_steps(item)

  def get_content(self):
    return self.content

  def __len__(self):
    return 1 + len(self.next)

  def to_string(self, recurse: bool):
    content_str = f'ContentNode[{str(self.get_content())}] '
    if not recurse:
      return content_str
    next_str = str(self.next)
    return f'{content_str}{self.next.to_string(recurse)}'

Tuple vs. InventoryItem (Part 3.1)

# @title define-inventory-item
class InventoryItem:
  def __init__(self, item_name_arg, price_arg):
    self.item_name = item_name_arg
    self.price = price_arg

  def __lt__(self, other): # -> [NotImplemented | bool]:
    if isinstance(other, InventoryItem):
      return self.item_name < other.item_name
    if isinstance(other, str):
      return self.item_name < other
    return NotImplemented

  def __le__(self, other): # -> [NotImplemented | bool]
    if isinstance(other, InventoryItem):
      return self.item_name <= other.item_name
    if isinstance(other, str):
      return self.item_name <= other
    return NotImplemented

  def __gt__(self, other): # -> [NotImplemented | bool]
    if isinstance(other, InventoryItem):
      return self.item_name > other.item_name
    if isinstance(other, str):
      return self.item_name > other
    return NotImplemented

  def __ge__(self, other): # -> [NotImplemented | bool]
    if isinstance(other, InventoryItem):
      return self.item_name >= other.item_name
    if isinstance(other, str):
      return self.item_name >= other
    return NotImplemented

  def __eq__(self, other): # -> [NotImplemented | bool]
    if isinstance(other, InventoryItem):
      return self.item_name == other.item_name
    if isinstance(other, str):
      return self.item_name == other
    return NotImplemented

  def __ne__(self, other): # -> [NotImplemented | bool]
    if isinstance(other, InventoryItem):
      return self.item_name != other.item_name
    if isinstance(other, str):
      return self.item_name != other
    return NotImplemented

  def __repr__(self) -> str:
    return self.__str__()

  def __str__(self) -> str:
    return f'InventoryItem[item_name={self.item_name},price={self.price}]'

ThingContainer (Part 3.2)

# @title define-thing-container
class ThingContainer:
  def __init__(self):
    self.internal_list = []

  @abstractmethod
  def put_new_thing_in(self, item):
    pass

  def is_empty(self) -> bool:
    return self.__len__() == 0

  def __len__(self):
    return len(self.internal_list)

  @abstractmethod
  def take_existing_thing_out(self):
    pass

class Stack(ThingContainer):
  def __init__(self):
    super().__init__()

  def __push(self, item):
    self.internal_list.append(item)

  def __pop(self):
    return self.internal_list.pop()

  def put_new_thing_in(self, item):
    return self.__push(item)

  def take_existing_thing_out(self):
    return self.__pop()

class Queue(ThingContainer):
  def __init__(self):
    super().__init__()

  def put_new_thing_in(self, item):
    return self.__enqueue(item)

  def __enqueue(self, item):
    self.internal_list.insert(0, item)

  def __dequeue(self):
    return self.internal_list.pop()

  def take_existing_thing_out(self):
    return self.__dequeue()

AlphaHasher2 (Part 5)

# @title define-alpha-hasher-2
class AlphaHasher2(CustomHasher):
  def __init__(self):
    self.alphabet_size = 26 * 26

  def get_alphabet_size(self):
    return self.alphabet_size

  def hash(self, str_to_hash: str) -> int:
    if len(str_to_hash) == 0:
      first_two_letters = 'AA'
    elif len(str_to_hash) == 1:
      first_letter = str_to_hash.upper()[0]
      second_letter = 'A'
    else:
      first_letter = str_to_hash.upper()[0]
      second_letter = str_to_hash.upper()[1]
    #print(f'First two letters for {str_to_hash}: {first_letter}{second_letter}')
    # And return its index:
    # 'AA' has index 0, 'AB' has index 1, etc.
    first_letter_pos = AlphaHasher2.compute_position_in_alphabet(first_letter)
    second_letter_pos = AlphaHasher2.compute_position_in_alphabet(second_letter)
    # The position in the two-letter alphabet is just 26*first + second
    final_pos = 26 * first_letter_pos + second_letter_pos
    return final_pos

  def compute_position_in_alphabet(self, uppercase_key: str) -> int:
    if uppercase_key in string.ascii_uppercase:
      return string.ascii_uppercase.index(uppercase_key)
    return 0

  def compute_key_for_index(self, index: int) -> str:
    first_letter_part = int(index / 26)
    second_letter_part = index % 26
    # In case you need to debug!
    #print(f'alpha2_index: {index}, first_letter_part: {first_letter_part}, second_letter_part: {second_letter_part}')
    return string.ascii_uppercase[first_letter_part] + string.ascii_uppercase[second_letter_part]

Footnotes

  1. This print() step is going to become important! We’ll say we’ve “processed” a node once we’ve print()ed its content↩︎