Week 5: Linked Lists and Binary Search Trees

DSAN 5500: Data Structures, Objects, and Algorithms in Python

Class Sessions

Author: Jeff Jacobs

Published: Thursday, February 5, 2026


Schedule

Today’s Planned Schedule:

           Start    End      Topic
Lecture    6:30pm   7:00pm   Inductive Thinking →
           7:00pm   7:30pm   LinkedList: Our Core Linear Data Structure →
           7:30pm   8:00pm   BinarySearchTree aka FancyLinkedList
Break!     8:00pm   8:10pm
           8:10pm   9:00pm   HashTable: The Final Missing Piece! →

Induction: The Core “Way of Thinking” for Algorithm Design…

  • Jeff has messed up the “intuitive” explanation for Gauss’ Sum on the board every semester since 2010
  • Most years this is a problem
  • But this year it’s an opportunity to teach induction!

The Intuition

(What I should have done on the board)

  • \(\Rightarrow \displaystyle\sum_{i=1}^{6} i = \frac{6(6+1)}{2}\) …but can we infer \(\displaystyle \sum_{i=1}^{n} i = \frac{n(n+1)}{2}\)? Not in general!
  • (When you do it’s called “Engineer’s induction”… see next slide)
  • (Double no in this case since we assumed \(n\) even!)
  • But, with only a slight increase in complexity, we can prove it by induction, which is used in… 95% of algorithm/data-structures proofs! (We’ll see why)

What’s Wrong With Engineers’ Induction?

  • What if I told you I have a simple polynomial that we can use to generate prime numbers forever? \(\boxed{f(n) = n^2 + n + 41}\)… Let’s try it!
\(n\)        \(f(n)\)   Prime?
0            41         ✅
1            43         ✅
2            47         ✅
3            53         ✅
\(\vdots\)   \(\vdots\) \(\vdots\)
38           1523       ✅ I can stop checking now, right?
39           1601       ✅ Surely I’ve checked enough terms?
40           1681       💀 \(1681 = 41 \times 41\)… Maybe that’s the only exception?
41           1763       💀 \(1763 = 41 \times 43\)… Maybe it’s composite for all \(n > 41\)?
42           1847       ✅ 💀💀💀 USELESS!
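The table above can be reproduced with a quick brute-force check (a plain-Python sketch, not from the slides; `is_prime` and `f` are names chosen here, with `is_prime` doing simple trial division):

```python
def is_prime(k):
    """Trial-division primality check: fine for small k."""
    if k < 2:
        return False
    return all(k % d != 0 for d in range(2, int(k**0.5) + 1))

def f(n):
    return n**2 + n + 41  # Euler's famous prime-generating polynomial

# f(n) really is prime for every n from 0 through 39...
assert all(is_prime(f(n)) for n in range(40))
# ...and then the pattern collapses at n = 40:
print(f(40), is_prime(f(40)))  # 1681 False (1681 = 41 * 41)
```

Forty confirming cases in a row, and the forty-first is a counterexample: exactly why checking terms is never a proof.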

(Non-Engineers’) Induction!

  • Goal: Prove \(p(n)\) (“predicate”, evaluates to T/F for given \(n \in \mathbb{Z}^{\geq 0}\))
  • The core is literally just: If I know
    • Base Case: [\(p\) is true for \(n = n_0\)]
    • Inductive Step: [\(p\) is true for \(n\)] \(\implies\) [\(p\) is true for \(n + 1\)]
  • Then I can conclude \(p(n)\) true for all \(n \geq n_0\)!

  • Gauss’ Summation Formula: \(p(n) = \left[ \sum_{i=0}^{n}i = \frac{n(n+1)}{2} \right]\)
  • Base Case: \(n_0 = 1\):

\[ \sum_{i=0}^{1} i = 0 + 1 \overset{?}{=} \frac{(1)(1+1)}{2} \iff 1 \overset{?}{=} \frac{2}{2} \iff 1 \overset{✅}{=} 1 \]

Inductive Step: Can I Infer \(n + 1\) From \(n\)?

  • Assume \(p(n)\), that is, \(\displaystyle\sum_{i=0}^{n}i = \frac{n(n+1)}{2}\).
  • Goal: Derive \(p(n + 1)\), that is, \(\displaystyle\sum_{i=0}^{n+1}i = \frac{(n+1)(n+2)}{2}\)
  • Start with LHS term, \(\sum_{i=0}^{n+1}i\):

\[ \begin{align*} \boxed{\sum_{i=0}^{n+1}i} &= \underbrace{0 + 1 + 2 + \cdots + n}_{\sum_{i=0}^{n}i} + (n+1) = \underbrace{\sum_{i=0}^{n}i}_{\mathclap{\text{LHS of }p(n)}} + (n+1) \\ &= \frac{n(n+1)}{2} + (n + 1) = \frac{n^2 + n}{2} + \frac{2(n+1)}{2} \\ &= \frac{n^2 + 3n + 2}{2} = \boxed{\frac{(n+1)(n+2)}{2}} \; ~ ✅ \end{align*} \]
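The closed form can also be spot-checked numerically (a sketch with a hypothetical `gauss` helper; note this is evidence, not a proof — piling up cases is exactly the engineer's-induction trap, and it's the inductive step above that makes the formula true for all \(n\)):

```python
def gauss(n):
    """Closed-form sum 0 + 1 + ... + n."""
    return n * (n + 1) // 2

# Spot-check against the direct sum for small n -- reassuring, but not a proof!
assert all(sum(range(n + 1)) == gauss(n) for n in range(1000))
print(gauss(6))  # 21
```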

(Algorithmic/Data-Structural Proofs Almost Always Inductive!)

  • “Direct” proof, in this case, is doable but requires stuff like…

  • Break into [Case 1: \(n\) even] and [Case 2: \(n\) odd]

  • In Case 2 (\(n\) odd):

    \[ \sum_{i=0}^{n}i = 1 + 2 + \cdots + \left( \left\lceil \frac{n}{2} \right\rceil - 1\right) + \left\lceil \frac{n}{2} \right\rceil + \left( \left\lceil \frac{n}{2} \right\rceil + 1\right) + \cdots + (n-1) + n \]

  • Need to show (…with induction?) \(\left\lceil \frac{n}{2} \right\rceil = \begin{cases}\frac{n}{2} &\text{if }n\text{ even} \\ \frac{n+1}{2} &\text{if }n\text{ odd}\end{cases}\)

  • And so on…

Core Data Structure: LinkedList

Looking Under the Hood of a Data Structure

  • Last week we saw the math for why we can “abstract away from” the details of how a particular language works
  • We want to understand these structures independently of the specifics of their implementation in Python (for now)
  • So, let’s construct our own simplified versions of the basic structures, and use these simplified versions to get a sense for their efficiency
    • (The “true” Python versions may be hyper-optimized but, as we’ll see, there are fundamental constraints on runtime, assuming \(P \neq NP\))

Tuples

Code
class MyTuple:
  def __init__(self, thing1, thing2):
    self.thing1 = thing1
    self.thing2 = thing2

  def __repr__(self):
    return f"({self.thing1}, {self.thing2})"

  def __str__(self):
    return self.__repr__()

t1 = MyTuple('a','b')
t2 = MyTuple(111, 222)
print(t1)
print(t2)
(a, b)
(111, 222)

Lists

The list itself just points to a root item:

Code
class MyList:
  def __init__(self):
    self.root = None

  def append(self, new_item):
    if self.root is None:
      self.root = MyListItem(new_item)
    else:
      self.root.append(new_item)

  def __repr__(self):
    # Guard against an empty list, whose root is still None
    return "" if self.root is None else self.root.__repr__()

An item has contents, pointer to next item:

Code
class MyListItem:
  def __init__(self, content):
    self.content = content
    self.next = None

  def append(self, new_item):
    if self.next is None:
      self.next = MyListItem(new_item)
    else:
      self.next.append(new_item)

  def __repr__(self):
    # Format content as a string, so non-string contents work too
    my_content = self.content
    return f"{my_content}" if self.next is None else f"{my_content}, {self.next.__repr__()}"
Code
users = MyList()
users.append('Jeff')
users.append('Alma')
users.append('Bo')
print(users)
Jeff, Alma, Bo

So, How Many “Steps” Are Required…

  • To retrieve the first element in a MyTuple?
  • To retrieve the last element in a MyTuple?
  • To retrieve the first element in a MyList?
  • To retrieve the last element in a MyList?

How Many Steps?

With a MyTuple:

Code
t1.thing1
'a'

\(\implies\) 1 step

Code
t1.thing2
'b'

\(\implies\) 1 step

With a MyList:

Code
print(users.root.content)
Jeff

\(\implies\) 1 step

Code
current_node = users.root
while current_node.next is not None:
  current_node = current_node.next
print(current_node.content)
Bo

\(\implies\) (3 steps)

…But why 3? How many steps if the list contained 5 elements? \(N\) elements?
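One way to answer: instrument the traversal and count. A minimal standalone sketch (the `Node`, `build`, and `steps_to_last` names are illustrative, not the MyList classes above):

```python
class Node:
    def __init__(self, content):
        self.content = content
        self.next = None

def build(items):
    """Chain items into a linked list; return the root node."""
    root = last = None
    for x in items:
        node = Node(x)
        if root is None:
            root = node
        else:
            last.next = node
        last = node
    return root

def steps_to_last(root):
    """Count the nodes visited walking from root to the final node."""
    steps, current = 1, root
    while current.next is not None:
        current = current.next
        steps += 1
    return steps

print(steps_to_last(build(['Jeff', 'Alma', 'Bo'])))  # 3
print(steps_to_last(build(range(5))))                # 5: with N elements, N steps
```

The walk has no shortcut to the end: every element is visited once, so reaching the last of \(N\) elements takes \(N\) steps.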

Visualizing LinkedLists

Code
from hw2 import LinkedList, InventoryItem
ll = LinkedList()
item1 = InventoryItem('Mango', 50)
ll.append(item1)
item2 = InventoryItem('Pickle', 60)
ll.append(item2)
item3 = InventoryItem('Artichoke', 55)
ll.append(item3)
item5 = InventoryItem('Banana', 123)
ll.append(item5)
item6 = InventoryItem('Aardvark', 11)
ll.append(item6)
HTML(visualize_ll(ll))

Onwards and Upwards: Fancier Algorithms

LinkedList: Foundation for Most(?) Data Structures!

class LinkedList(BaseModel):
  root: LinkedListNode | None
class LinkedListNode(BaseModel):
  content: object
  next: LinkedListNode | None

class BinaryTree(BaseModel):
  root: BinaryTreeNode | None
class BinaryTreeNode(BaseModel):
  content: object
  left: BinaryTreeNode | None
  right: BinaryTreeNode | None

class QuadTree(BaseModel):
  root: QuadTreeNode | None
class QuadTreeNode(BaseModel):
  content: object
  nw: QuadTreeNode | None
  ne: QuadTreeNode | None
  sw: QuadTreeNode | None
  se: QuadTreeNode | None
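To see the `BinaryTree` shape put to work, here is a minimal standalone BST sketch (plain classes and hypothetical names, not the pydantic models above or the hw2 implementation): every `add` preserves the invariant [everything in `left` sorts before `content`, everything in `right` sorts after], so an in-order walk comes out sorted.

```python
class BSTNode:
    def __init__(self, content):
        self.content = content
        self.left = None
        self.right = None

    def add(self, new_content):
        # Preserve the invariant: smaller contents go left, larger go right
        if new_content < self.content:
            if self.left is None:
                self.left = BSTNode(new_content)
            else:
                self.left.add(new_content)
        else:
            if self.right is None:
                self.right = BSTNode(new_content)
            else:
                self.right.add(new_content)

    def in_order(self):
        # In-order traversal yields contents in sorted order: the invariant at work
        left = self.left.in_order() if self.left else []
        right = self.right.in_order() if self.right else []
        return left + [self.content] + right

root = BSTNode('Mango')
for name in ['Pickle', 'Artichoke', 'Banana', 'Aardvark']:
    root.add(name)
print(root.in_order())  # ['Aardvark', 'Artichoke', 'Banana', 'Mango', 'Pickle']
```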

Visualizing BinarySearchTree

Code
from hw2 import BinarySearchTree
bst = BinarySearchTree()
item1 = InventoryItem('Mango', 50)
bst.add(item1)
item2 = InventoryItem('Pickle', 60)
bst.add(item2)
item3 = InventoryItem('Artichoke', 55)
bst.add(item3)
item5 = InventoryItem('Banana', 123)
bst.add(item5)
item6 = InventoryItem('Aardvark', 11)
bst.add(item6)
HTML(visualize(bst))

So Then… Why Is This a Whole Class?

  • The core structures are identical, but we can optimize different goals (efficient insertion, sorting, retrieval, deletion, …) by changing the invariants maintained by the algorithms internal to our structure
  • Crucial Insertion-Sort invariant: \(\textsf{Sorted}(1,i)\) true when we move to entry \(i + 1\) (key)
  • Crucial HW2(!) invariant: \(\textsf{Up-To-Date-Favorite}(1,i)\) true when entry \(i + 1\) (next result in dataset) arrives
  • \(\implies\) Efficiency of obtaining favorite style guaranteed to be constant-time, \(\overline{O}(1)\)!
  • Otherwise, would be \(\overline{O}(n) > \overline{O}(1)\) (linear approach) or at best \(\overline{O}(\log_2(n)) > \overline{O}(1)\) (divide-and-conquer)