Week 5: Linked Lists and Binary Search Trees

DSAN 5500: Data Structures, Objects, and Algorithms in Python

Class Sessions

Author

Affiliation

Jeff Jacobs

jj1088@georgetown.edu

Published

Thursday, February 5, 2026

Open slides in new window →

Schedule

Today’s Planned Schedule:

	Start	End	Topic
Lecture	6:30pm	7:00pm	Inductive Thinking →
	7:00pm	7:30pm	`LinkedList`: Our Core Linear Data Structure →
	7:30pm	8:00pm	`BinarySearchTree` aka `FancyLinkedList` →
Break!	8:00pm	8:10pm
	8:10pm	9:00pm	`HashTable`: The Final Missing Piece! →

Induction: The Core “Way of Thinking” for Algorithm Design…

Jeff has messed up the “intuitive” explanation for Gauss’ Sum on the board every semester since 2010
Most years this is a problem…
But this year it’s an opportunity to teach induction!

The Intuition

(What I should have done on the board)

\(\Rightarrow \displaystyle\sum_{i=1}^{6} = \frac{6(6+1)}{2}\) …but can we infer \(\displaystyle \sum_{i=1}^{n} = \frac{n(n+1)}{2}\)? Not in general!
(When you do it’s called “Engineer’s induction”… see next slide)
(Double no in this case since we assumed \(n\) even!)
But, with only slight increase in complicated-ness, can prove it by induction, which is used in… 95% of algorithm/data structures proofs! (We’ll see why)

What’s Wrong With Engineers’ Induction?

What if I told you, I have a simple polynomial that we can use to infinitely generate prime numbers! \(\boxed{f(n) = n^2 + n + 41}\)… Let’s try it!

\(n\)	\(f(n)\)	Prime?
0	41	✅
1	43	✅
2	47	✅
3	53	✅
\(\vdots\)	\(\vdots\)	✅

\(n\)	\(f(n)\)	Prime?
38	1523	✅
I can stop checking now right?
39	1601	✅
Surely I’ve checked enough terms?
40	1681	❌ \(1681 = 41 \times 41\) 💀
Maybe that’s the only exception?
41	1763	❌ \(1763 = 41 \times 43\) 💀
Maybe we can get composites* \(n > 41\)?*
42	1847	✅ 💀💀💀 USELESS!

(Non-Engineers’) Induction!

Goal: Prove \(p(n)\) (“predicate”, evaluates to T/F for given \(n \in \mathbb{Z}^{\geq 0}\))
The core is literally just: If I know
- Base Case: [\(p\) is true for \(n = n_0\)]
- Inductive Step: [\(p\) is true for \(n\)] \(\implies\) [\(p\) is true for \(n + 1\)]
Then I can conclude \(p(n)\) true for all \(n \geq 1\)!

Gauss’ Summation Formula: \(p(n) = \left[ \sum_{i=0}^{n}i = \frac{n(n+1)}{2} \right]\)
Base Case: \(n_0 = 1\):

\[ \sum_{i=0}^{1} = 0 + 1 \overset{?}{=} \frac{(1)(1+1)}{2} \iff 1 \overset{?}{=} \frac{2}{2} \iff 1 \overset{✅}{=} 1 \]

Inductive Step: Can I Infer \(n + 1\) From \(n\)?

Assume \(p(n)\), that is, \(\displaystyle\sum_{i=0}^{n}i = \frac{n(n+1)}{2}\).
Goal: Derive \(p(n + 1)\), that is, \(\displaystyle\sum_{i=0}^{n+1}i = \frac{(n+1)(n+2)}{2}\)…
Start with LHS term, \(\sum_{i=0}^{n+1}i\):

\[ \begin{align*} \boxed{\sum_{i=0}^{n+1}i} &= \underbrace{0 + 1 + 2 + \cdots + n}_{\sum_{i=0}^{n}i} + (n+1) = \underbrace{\sum_{i=0}^{n}i}_{\mathclap{\text{LHS of }p(n)}} + (n+1) \\ &= \frac{n(n+1)}{2} + (n + 1) = \frac{n^2 + n}{2} + \frac{2(n+1)}{2} \\ &= \frac{n^2 + 3n + 2}{2} = \boxed{\frac{(n+1)(n+2)}{2}} \; ~ ✅ \end{align*} \]

(Algorithmic/Data-Structural Proofs Almost Always Inductive!)

“Direct” proof, in this case, is doable but requires stuff like…
Break into [Case 1: \(n\) even] and [Case 2: \(n\) odd]
In case 2

\[ \sum_{i=0}^{n}i = 1 + 2 + \cdots + \left( \left\lceil \frac{n}{2} \right\rceil - 1\right) + \left\lceil \frac{n}{2} \right\rceil + \left( \left\lceil \frac{n}{2} \right\rceil + 1\right) + \cdots + (n-1) + n \]
Need to show (…with induction?) \(\left\lceil \frac{n}{2} \right\rceil = \begin{cases}\frac{n}{2} &\text{if }n\text{ even} \\ \frac{n+1}{2} &\text{if }n\text{ odd}\end{cases}\)
And so on…

Core Data Structure: `LinkedList`

Looking Under the Hood of a Data Structure

Last week we saw the math for why we can “abstract away from” the details of how a particular language works
We want to understand these structures independently of the specifics of their implementation in Python (for now)
So, let’s construct our own simplified versions of the basic structures, and use these simplified versions to get a sense for their efficiency
- (The “true” Python versions may be hyper-optimized but, as we’ll see, there are fundamental constraints on runtime, assuming \(P \neq NP\))

Tuples

Code

class MyTuple:
  def __init__(self, thing1, thing2):
    self.thing1 = thing1
    self.thing2 = thing2

  def __repr__(self):
    return f"({self.thing1}, {self.thing2})"

  def __str__(self):
    return self.__repr__()

t1 = MyTuple('a','b')
t2 = MyTuple(111, 222)
print(t1)
print(t2)

(a, b)
(111, 222)

Lists

The list itself just points to a root item:

Code

class MyList:
  def __init__(self):
    self.root = None

  def append(self, new_item):
    if self.root is None:
      self.root = MyListItem(new_item)
    else:
      self.root.append(new_item)

  def __repr__(self):
    return self.root.__repr__()

An item has contents, pointer to next item:

Code

class MyListItem:
  def __init__(self, content):
    self.content = content
    self.next = None

  def append(self, new_item):
    if self.next is None:
      self.next = MyListItem(new_item)
    else:
      self.next.append(new_item)

  def __repr__(self):
    my_content = self.content
    return my_content if self.next is None else f"{my_content}, {self.next.__repr__()}"

Code

users = MyList()
users.append('Jeff')
users.append('Alma')
users.append('Bo')
print(users)

Jeff, Alma, Bo

So, How Many “Steps” Are Required…

To retrieve the first element in a MyTuple?
To retrieve the last element in a MyTuple?
To retrieve the first element in a MyList?
To retrieve the last element in a MyList?

How Many Steps?

With a MyTuple:

Code

t1.thing1

'a'

\(\implies\) 1 step

Code

t1.thing2

'b'

\(\implies\) 1 step

With a MyList:

Code

print(users.root.content)

Jeff

\(\implies\) 1 step

Code

current_node = users.root
while current_node.next is not None:
  current_node = current_node.next
print(current_node.content)

Bo

\(\implies\) (3 steps)

…But why 3? How many steps if the list contained 5 elements? \(N\) elements?

Visualizing `LinkedList`s

Code

from hw2 import LinkedList, InventoryItem
ll = LinkedList()
item1 = InventoryItem('Mango', 50)
ll.append(item1)
item2 = InventoryItem('Pickle', 60)
ll.append(item2)
item3 = InventoryItem('Artichoke', 55)
ll.append(item3)
item5 = InventoryItem('Banana', 123)
ll.append(item5)
item6 = InventoryItem('Aardvark', 11)
ll.append(item6)
HTML(visualize_ll(ll))

Onwards and Upwards: Fancier Algorithms

LinkedList: Foundation for Most(?) Data Structures!

class LinkedList(BaseModel):
  root: LinkedListNode | None

class LinkedListNode(BaseModel):
  content: object
  next: LinkedListNode | None

class BinaryTree(BaseModel):
  root: BinaryTreeNode | None

class BinaryTreeNode(BaseModel):
  content: object
  left: BinaryTreeNode | None
  right: BinaryTreeNode | None

class QuadTree(BaseModel):
  root: QuadTreeNode | None

class QuadTreeNode:
  content: object
  nw: QuadTreeNode | None
  ne: QuadTreeNode | None
  sw: QuadTreeNode | None
  se: QuadTreeNode | None

Visualizing `BinarySearchTree`

Code

from hw2 import BinarySearchTree
bst = BinarySearchTree()
item1 = InventoryItem('Mango', 50)
bst.add(item1)
item2 = InventoryItem('Pickle', 60)
bst.add(item2)
item3 = InventoryItem('Artichoke', 55)
bst.add(item3)
item5 = InventoryItem('Banana', 123)
bst.add(item5)
item6 = InventoryItem('Aardvark', 11)
bst.add(item6)
HTML(visualize(bst))

Warning: node None_0, port f1 unrecognized
Warning: node None_1, port f1 unrecognized
Warning: node None_2, port f1 unrecognized
Warning: node None_3, port f1 unrecognized
Warning: node None_4, port f1 unrecognized
Warning: node None_5, port f1 unrecognized

So Then… Why Is This a Whole Class?

The core structures are identical, but we can optimize different goals (efficient insertion, sorting, retrieval, deletion, …) by changing the invariants maintained by the algorithms internal to our structure
Crucial Insertion-Sort invariant: \(\textsf{Sorted}(1,i)\) true when we move to entry \(i + 1\) (key)
Crucial HW2(!) invariant: \(\textsf{Up-To-Date-Favorite}(1,i-1)\) true when entry \(i + 1\) (next result in dataset) arrives
\(\implies\) Efficiency of obtaining favorite style guaranteed to be constant-time, \(\overline{O}(1)\)!
Otherwise, would be \(\overline{O}(n) > \overline{O}(1)\) (linear approach) or at best \(\overline{O}(\log_2(n)) > \overline{O}(1)\) (divide-and-conquer)

Schedule

Induction: The Core “Way of Thinking” for Algorithm Design…

The Intuition

What’s Wrong With Engineers’ Induction?

(Non-Engineers’) Induction!

Inductive Step: Can I Infer \(n + 1\) From \(n\)?

(Algorithmic/Data-Structural Proofs Almost Always Inductive!)

Core Data Structure: LinkedList

Looking Under the Hood of a Data Structure

Tuples

Lists

So, How Many “Steps” Are Required…

How Many Steps?

Visualizing LinkedLists

Onwards and Upwards: Fancier Algorithms

LinkedList: Foundation for Most(?) Data Structures!

Visualizing BinarySearchTree

So Then… Why Is This a Whole Class?

Core Data Structure: `LinkedList`

Visualizing `LinkedList`s

Visualizing `BinarySearchTree`