Code
= (38.9076, -77.0723)
gtown_loc gtown_loc
(38.9076, -77.0723)
DSAN 5500: Data Structures, Objects, and Algorithms in Python
Today’s Planned Schedule:
Start | End | Topic | |
---|---|---|---|
Lecture | 6:30pm | 6:40pm | TA Intros! |
6:40pm | 7:00pm | Python’s Built-In Data Structures → | |
7:00pm | 7:30pm | Big-O Notation and Complexity Classes | |
7:30pm | 8:00pm | Formal Complexity Analysis: Insertion-Sort → | |
Break! | 8:00pm | 8:10pm | |
8:10pm | 8:30pm | Formal Complexity Analysis: Merge-Sort → | |
8:30pm | 9:00pm | Object-Oriented Programming → |
jrd154@georgetown.edu
jl3081@georgetown.edu
bool
int
float
None
Fixed-size collection of \(N\) objects
Unless otherwise specified, we’re talking about \(2\)-tuples
Example: We can locate something on the Earth by specifying two floats
: latitude and longitude!
= (38.9076, -77.0723)
gtown_loc gtown_loc
(38.9076, -77.0723)
But what if we don’t know in advance how many items we want to store? Ex: how can we store users for a new app?
x = ['a','b','c']
x[0] = 'a'
x[1] = 'b'
x[2] = 'c'
list
List<String> myList = Arrays.asList("a", "b", "c");
for (int i = 0; i < x.size(); i++) {
System.out.println(myList.get(i));
}
a b c
= ['a','b','c']
my_list for list_element in my_list:
print(list_element)
a
b
c
Construct new list by applying operation to each element:
= [4,5,6,7]
my_nums = [num ** 2 for num in my_nums]
my_squares my_squares
[16, 25, 36, 49]
Can also filter the elements of the list with if
:
= [num ** 2 for num in my_nums if num % 2 == 1]
my_odd_squares my_odd_squares
[25, 49]
Efficient for finding unique elements:
= [
prefs_morn 'None','None','None','Vegan','Vegan','None',
'Gluten Free'
]print(f"Number of responses: {len(prefs_morn)}")
Number of responses: 7
= set(prefs_morn)
unique_prefs_morn print(f"Unique preferences: {unique_prefs_morn}")
print(f"Number of unique preferences: {len(unique_prefs_morn)}")
Unique preferences: {'Gluten Free', 'None', 'Vegan'}
Number of unique preferences: 3
Ordering / indexing of elements is gone!
2] unique_prefs_morn[
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[7], line 1 ----> 1 unique_prefs_morn[2] TypeError: 'set' object is not subscriptable
But, supports set operators from math! →
Union (\(M \cup E\)):
= [
prefs_eve 'None','None','Vegan','None','None'
]= set(prefs_eve)
unique_prefs_eve = unique_prefs_morn.union(unique_prefs_eve)
unique_prefs_either print(f"Preferences in either event: {unique_prefs_either}")
Preferences in either event: {'Gluten Free', 'None', 'Vegan'}
Intersection (\(M \cap E\)):
= unique_prefs_morn.intersection(unique_prefs_eve)
unique_prefs_both print(f"Preferences in both events: {unique_prefs_both}")
Preferences in both events: {'None', 'Vegan'}
Set Difference (\(M \setminus E\) or \(E \setminus M\)):
= unique_prefs_morn - unique_prefs_eve
unique_prefs_mornonly print(f"Preferences only in the morning: {unique_prefs_mornonly}")
= unique_prefs_eve - unique_prefs_morn
unique_prefs_eveonly print(f"Preferences only in the evening: {unique_prefs_eveonly}")
Preferences only in the morning: {'Gluten Free'}
Preferences only in the evening: set()
While other language like Java have lots of fancy types of Map, Python has a single type, the dictionary:
= {
gtown_data 'name': 'Georgetown University',
'founded': 1789,
'coordinates': (38.9076, -77.0723),
'location': {
'city': 'Washington',
'state': 'DC', # <__<
'country': 'USA'
}
}print(gtown_data.keys())
print(gtown_data.values())
dict_keys(['name', 'founded', 'coordinates', 'location'])
dict_values(['Georgetown University', 1789, (38.9076, -77.0723), {'city': 'Washington', 'state': 'DC', 'country': 'USA'}])
Be careful when looping! Default behavior is iteration over keys:
for k in gtown_data:
print(k)
name
founded
coordinates
location
For key-value pairs use .items()
:
for k, v in gtown_data.items():
print(f"{k}: {v}")
name: Georgetown University
founded: 1789
coordinates: (38.9076, -77.0723)
location: {'city': 'Washington', 'state': 'DC', 'country': 'USA'}
Last week: analyzing complexity of an algorithm (no broader context): Insertion-Sort
This week: analyzing complexity of a structure as a collection of variables + algorithms
Example: tuple
N
, pre-allocated list
with N
slotsOOP: Design pattern for “organizing” data and algorithms into structures
class MyNTuple():
def __init__(self, N, item_list):
if len(item_list) != N:
raise Exception("Must have exactly N elements")
self.N = N
self.contents = item_list
def item_at_index(self, i):
return self.contents[i]
= MyNTuple(2, [-40, 40])
jeffs_location print(jeffs_location.item_at_index(0))
print(f"{jeffs_location} 🧐")
-40
<__main__.MyNTuple object at 0x16523c2d0> 🧐
class MyTuple:
def __init__(self, thing1, thing2):
self.thing1 = thing1
self.thing2 = thing2
def __repr__(self):
return f"({self.thing1}, {self.thing2})"
def __str__(self):
return self.__repr__()
= MyTuple('a','b')
t1 = MyTuple(111, 222)
t2 print(t1)
print(t2)
(a, b)
(111, 222)
The list itself just points to a root item:
class MyList:
def __init__(self):
self.root = None
def append(self, new_item):
if self.root is None:
self.root = MyListItem(new_item)
else:
self.root.append(new_item)
def __repr__(self):
return self.root.__repr__()
An item has contents, pointer to next item:
class MyListItem:
def __init__(self, content):
self.content = content
self.next = None
def append(self, new_item):
if self.next is None:
self.next = MyListItem(new_item)
else:
self.next.append(new_item)
def __repr__(self):
= self.content
my_content return my_content if self.next is None else f"{my_content}, {self.next.__repr__()}"
= MyList()
users 'Jeff')
users.append('Alma')
users.append('Bo')
users.append(print(users)
Jeff, Alma, Bo
MyTuple
?MyTuple
?MyList
?MyList
?With a MyTuple
:
t1.thing1
'a'
\(\implies\) 1 step
t1.thing2
'b'
\(\implies\) 1 step
With a MyList
:
print(users.root.content)
Jeff
\(\implies\) 1 step
= users.root
current_node while current_node.next is not None:
= current_node.next
current_node print(current_node.content)
Bo
\(\implies\) (3 steps)
…But why 3? How many steps if the list contained 5 elements? \(N\) elements?
Now rather than just printing, let’s pairwise concatenate:
= users.root
cur_pointer1 while cur_pointer1 is not None:
= users.root
cur_pointer2 while cur_pointer2 is not None:
print(cur_pointer1.content + cur_pointer2.content)
= cur_pointer2.next
cur_pointer2 = cur_pointer1.next cur_pointer1
JeffJeff
JeffAlma
JeffBo
AlmaJeff
AlmaAlma
AlmaBo
BoJeff
BoAlma
BoBo
How many steps did this take? How about for a list with \(5\) elements? \(N\) elements?
= []
printed_items = users.root
cur_pointer1 while cur_pointer1 is not None:
= users.root
cur_pointer2 while cur_pointer2 is not None:
print(cur_pointer1.content + cur_pointer2.content)
printed_items.append(cur_pointer1.content)
printed_items.append(cur_pointer2.content)= cur_pointer2.next
cur_pointer2 = cur_pointer1.next
cur_pointer1 = users.root
check_pointer while check_pointer is not None:
if check_pointer.content in printed_items:
print(f"Phew. {check_pointer.content} printed at least once.")
else:
print(f"Oh no! {check_pointer.content} was never printed!!!")
= check_pointer.next check_pointer
JeffJeff
JeffAlma
JeffBo
AlmaJeff
AlmaAlma
AlmaBo
BoJeff
BoAlma
BoBo
Phew. Jeff printed at least once.
Phew. Alma printed at least once.
Phew. Bo printed at least once.
Code | Cost | Times Run | |
---|---|---|---|
1 | for i = 1 to n-1: |
\(c_1\) | \(\widetilde{n}\) |
2 | key = A[i] |
\(c_2\) | \(\widetilde{n}\) |
3 | # Insert A[i] into sorted subarray A[0:i-1] |
\(0\) | \(\widetilde{n}\) |
4 | j = i - 1 |
\(c_4\) | \(\widetilde{n}\) |
5 | while j >= 0 and A[j] > key: |
\(c_5\) | \(\sum_{i=2}^n t_i\) |
6 | A[j + 1] = A[j] |
\(c_6\) | \(\sum_{i=2}^n(t_i - 1)\) |
7 | j = j - 1 |
\(c_7\) | \(\sum_{i=2}^n(t_i - 1)\) |
8 | A[j + 1] = key |
\(c_8\) | \(\widetilde{n}\) |
\[ T(n) = c_1\widetilde{n} + c_2\widetilde{n} + c_4\widetilde{n} + c_5\sum_{i=2}^nt_i + c_6\sum_{i=2}^n(t_i - 1) + c_7\sum_{i=2}^n(t_i-1) + c_8\widetilde{n} \]
\[ T(n) = c_1n + c_2\widetilde{n} + c_4\widetilde{n} + c_5{\color{orange}\boxed{\color{black}\sum_{i=2}^nt_i}} + c_6{\color{lightblue}\boxed{\color{black}\sum_{i=2}^n(t_i - 1)}} + c_7{\color{lightblue}\boxed{\color{black}\sum_{i=2}^n(t_i-1)}} + c_8\widetilde{n} \]
\[ \begin{align*} {\color{orange}\boxed{\color{black}\sum_{i=2}^ni}} &= \sum_{i=1}^ni - \sum_{i=1}^1i = \frac{n(n+1)}{2} - 1 \\ {\color{lightblue}\boxed{\color{black}\sum_{i=2}^n(i-1)}} &= \sum_{i=1}^{n-1}i = \frac{(n-1)(n-1+1)}{2} = \frac{n(n-1)}{2} \end{align*} \]
\[ \begin{align*} T(n) = &{\color{gray}\left(\frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2}\right)}{\color{green}n^2} + {\color{gray}\left(c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8\right)}{\color{green}n^1} \\ \phantom{T(n) = }& - {\color{gray}(c_2 + c_4 + c_5 + c_8)}{\color{green}n^0} \end{align*} \]
\[ \begin{align*} T(n) = &\underbrace{{\color{gray}\left(\frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2}\right)}}_{\text{Constant}}\underbrace{\phantom{(}{\color{green}n^2}\phantom{)}}_{\text{Quadratic}} \\ \phantom{T(n) = }&+ \underbrace{{\color{gray}\left(c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8\right)}}_{\text{Constant}}\underbrace{\phantom{(}{\color{green}n^1}\phantom{)}}_{\text{Linear}} \\ \phantom{T(n) = }& - \underbrace{{\color{gray}(c_2 + c_4 + c_5 + c_8)}}_{\text{Constant}}\underbrace{{\color{green}n^0}}_{\text{Constant}} \end{align*} \]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
= [np.power(10, k) for k in np.arange(1, 2.75, 0.25)]
n_vals = pd.DataFrame({'$n$': n_vals})
runtime_df '$n^2 + 50n$'] = runtime_df['$n$'].apply(lambda x: np.power(x, 2) + 50*x)
runtime_df['$n^2 + 10000$'] = runtime_df['$n$'].apply(lambda x: np.power(x, 2) + 10000)
runtime_df['$O(n)$'] = runtime_df['$n$'].copy()
runtime_df['$O(nlogn)$'] = runtime_df['$n$'].apply(lambda x: x * np.log(x))
runtime_df['$O(n^2)$'] = runtime_df['$n$'].apply(lambda x: np.power(x, 2))
runtime_df['$O(n^2logn)$'] = runtime_df['$n$'].apply(lambda x: np.power(x,2) * np.log(x))
runtime_df['$O(n^3)$'] = runtime_df['$n$'].apply(lambda x: np.power(x, 3))
runtime_df['$O(n^3logn)$'] = runtime_df['$n$'].apply(lambda x: np.power(x, 3) * np.log(x))
runtime_df[# Get the max values, for labeling the ends of lines
= runtime_df.max().to_dict()
max_vals = runtime_df.melt(id_vars=['$n$'])
plot_df #print(plot_df)
= {col: '' if (col == '$n^2 + 50n$') or (col == '$n^2 + 10000$') else (2,1) for col in runtime_df.columns}
style_map = plt.subplots(figsize=(11,5))
fig, ax ='$n$', y='value', hue='variable', style='variable', dashes=style_map)
sns.lineplot(plot_df, x#plt.xscale('log')
'log')
plt.yscale(# extract the existing handles and labels
= ax.get_legend_handles_labels()
h, l # slice the appropriate section of l and h to include in the legend
0:2], l[0:2])
ax.legend(h[for label, val in max_vals.items():
if (label == '$n$') or (label == '$n^2 + 50n$') or (label == '$n^2 + 10000$'):
continue
if 'logn' in label:
= label.replace('logn', r'\log(n)')
label = max_vals['$n$'] + 2, y = val, s=label, va='center')
ax.text(x # Hide the right and top spines
'right', 'top']].set_visible(False)
ax.spines[[ plt.show()
= [np.power(10, k) for k in np.arange(1, 6, 0.5)]
n_vals = pd.DataFrame({'$n$': n_vals})
rt_const_df '$20n^2$'] = rt_const_df['$n$'].apply(lambda x: 20*np.power(x,2))
rt_const_df['$n^2$'] = rt_const_df['$n$'].apply(lambda x: np.power(x,2))
rt_const_df['$n^2logn$'] = rt_const_df['$n$'].apply(lambda x: np.power(x,2) * np.power(np.log(x),2))
rt_const_df['$n^3$'] = rt_const_df['$n$'].apply(lambda x: np.power(x,3))
rt_const_df[# Get the max values, for labeling the ends of lines
= rt_const_df.max().to_dict()
max_vals = rt_const_df.melt(id_vars=['$n$'])
plot_df_const = {col: '' if (col == '$20n^2$') else (2,1) for col in rt_const_df.columns}
style_map = plt.subplots(figsize=(11,5))
fig_const, ax_const ='$n$', y='value', hue='variable', style='variable', dashes=style_map)
sns.lineplot(plot_df_const, x'log')
plt.xscale('log')
plt.yscale(# extract the existing handles and labels
= ax_const.get_legend_handles_labels()
h_const, l_const # slice the appropriate section of l and h to include in the legend
0:1], l_const[0:1])
ax_const.legend(h_const[for label, val in max_vals.items():
if (label == '$n$') or (label == '$20n^2$'):
continue
if 'logn' in label:
= label.replace('logn', r'\log(n)')
label = max_vals['$n$'] + 10**4, y = val, s=label, va='center')
ax_const.text(x # Hide the right and top spines
'right', 'top']].set_visible(False)
ax_const.spines[[ plt.show()
Let \(f, g: \mathbb{N} \rightarrow \mathbb{N}\). Then we write \(f(n) = \overline{O}(g(n))\) when there exists a threshold \(n_0 > 0\) and constant \(K > 0\) s.t. \[ \forall n \geq n_0 \left[ f(n) \leq K\cdot g(n) \right] \]
In words:
Definition from Savage (1998, pg. 13)
How about Merge-Sort? \(T(n) = 2T(n/2) + \Theta(n)\)
\(W(n)\) and \(D(n)\) grow at same rate \(\implies\) Case 2:
Merge-Sort: \(k = 0\) works! \(\Theta(n^{\log_2(2)}\log_2^0(n)) = \Theta(n)\)
Thus \(T(n) = \Theta(n^{\log_b(a)}\log_2^{k+1}(n)) = \boxed{\Theta(n\log_2n)}\) 😎
Need to be careful with \(O(f(n))\) vs. \(\Omega(f(n))\) however: difference between “for all inputs” vs. “for some inputs”:
Master Theorem: Let \(a > 0\) and \(b > 1\) be constants, and let \(f(n)\) be a driving function defined and nonnegative on all sufficiently large reals. Define \(T(n)\) on \(n \in \mathbb{N}\) by
\[ T(n) = aT(n/b) + f(n) \]
where \(aT(n/b) = a'T(\lfloor n/b \rfloor) + a''T(\lceil n/b \rceil)\) for some \(a' \geq 0\) and \(a'' \geq 0\) satisfying \(a = a' + a''\). Then the asymptotic behavior of \(T(n)\) can be characterized as follows:
Proof. See Cormen et al. (2001), pg. 107-114.