Code
= (38.9076, -77.0723)
gtown_loc gtown_loc
(38.9076, -77.0723)
DSAN 5500: Data Structures, Objects, and Algorithms in Python
Today’s Planned Schedule:
Start | End | Topic | |
---|---|---|---|
Lecture | 6:30pm | 6:40pm | TA Intros! |
6:40pm | 7:00pm | Python’s Built-In Data Structures → | |
7:00pm | 7:30pm | Big-O Notation and Complexity Classes | |
7:30pm | 8:00pm | Formal Complexity Analysis: Insertion-Sort → | |
Break! | 8:00pm | 8:10pm | |
8:10pm | 8:30pm | Formal Complexity Analysis: Merge-Sort → | |
8:30pm | 9:00pm | Object-Oriented Programming → |
jrd154@georgetown.edu
jl3081@georgetown.edu
bool
int
float
None
Fixed-size collection of \(N\) objects
Unless otherwise specified, we’re talking about \(2\)-tuples
Example: We can locate something on the Earth by specifying two floats
: latitude and longitude!
= (38.9076, -77.0723)
gtown_loc gtown_loc
(38.9076, -77.0723)
But what if we don’t know in advance how many items we want to store? Ex: how can we store users for a new app?
x = ['a','b','c']
x[0] = 'a'
x[1] = 'b'
x[2] = 'c'
list
List<String> myList = Arrays.asList("a", "b", "c");
for (int i = 0; i < x.size(); i++) {
System.out.println(myList.get(i));
}
a b c
= ['a','b','c']
my_list for list_element in my_list:
print(list_element)
a
b
c
Construct new list by applying operation to each element:
= [4,5,6,7]
my_nums = [num ** 2 for num in my_nums]
my_squares my_squares
[16, 25, 36, 49]
Can also filter the elements of the list with if
:
= [num ** 2 for num in my_nums if num % 2 == 1]
my_odd_squares my_odd_squares
[25, 49]
Efficient for finding unique elements:
= [
prefs_morn 'None','None','None','Vegan','Vegan','None',
'Gluten Free'
]print(f"Number of responses: {len(prefs_morn)}")
Number of responses: 7
= set(prefs_morn)
unique_prefs_morn print(f"Unique preferences: {unique_prefs_morn}")
print(f"Number of unique preferences: {len(unique_prefs_morn)}")
Unique preferences: {'Vegan', 'Gluten Free', 'None'}
Number of unique preferences: 3
Ordering / indexing of elements is gone!
2] unique_prefs_morn[
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[7], line 1 ----> 1 unique_prefs_morn[2] TypeError: 'set' object is not subscriptable
But, supports set operators from math! →
Union (\(M \cup E\)):
= [
prefs_eve 'None','None','Vegan','None','None'
]= set(prefs_eve)
unique_prefs_eve = unique_prefs_morn.union(unique_prefs_eve)
unique_prefs_either print(f"Preferences in either event: {unique_prefs_either}")
Preferences in either event: {'None', 'Gluten Free', 'Vegan'}
Intersection (\(M \cap E\)):
= unique_prefs_morn.intersection(unique_prefs_eve)
unique_prefs_both print(f"Preferences in both events: {unique_prefs_both}")
Preferences in both events: {'None', 'Vegan'}
Set Difference (\(M \setminus E\) or \(E \setminus M\)):
= unique_prefs_morn - unique_prefs_eve
unique_prefs_mornonly print(f"Preferences only in the morning: {unique_prefs_mornonly}")
= unique_prefs_eve - unique_prefs_morn
unique_prefs_eveonly print(f"Preferences only in the evening: {unique_prefs_eveonly}")
Preferences only in the morning: {'Gluten Free'}
Preferences only in the evening: set()
While other language like Java have lots of fancy types of Map, Python has a single type, the dictionary:
= {
gtown_data 'name': 'Georgetown University',
'founded': 1789,
'coordinates': (38.9076, -77.0723),
'location': {
'city': 'Washington',
'state': 'DC', # <__<
'country': 'USA'
}
}print(gtown_data.keys())
print(gtown_data.values())
dict_keys(['name', 'founded', 'coordinates', 'location'])
dict_values(['Georgetown University', 1789, (38.9076, -77.0723), {'city': 'Washington', 'state': 'DC', 'country': 'USA'}])
Be careful when looping! Default behavior is iteration over keys:
for k in gtown_data:
print(k)
name
founded
coordinates
location
For key-value pairs use .items()
:
for k, v in gtown_data.items():
print(f"{k}: {v}")
name: Georgetown University
founded: 1789
coordinates: (38.9076, -77.0723)
location: {'city': 'Washington', 'state': 'DC', 'country': 'USA'}
Last week: analyzing complexity of an algorithm (no broader context): Insertion-Sort
This week: analyzing complexity of a structure as a collection of variables + algorithms
Example: tuple
N
, pre-allocated list
with N
slotsOOP: Design pattern for “organizing” data and algorithms into structures
class MyNTuple():
def __init__(self, N, item_list):
if len(item_list) != N:
raise Exception("Must have exactly N elements")
self.N = N
self.contents = item_list
def item_at_index(self, i):
return self.contents[i]
= MyNTuple(2, [-40, 40])
jeffs_location print(jeffs_location.item_at_index(0))
print(f"{jeffs_location} 🧐")
-40
<__main__.MyNTuple object at 0x12183ed10> 🧐
class MyTuple:
def __init__(self, thing1, thing2):
self.thing1 = thing1
self.thing2 = thing2
def __repr__(self):
return f"({self.thing1}, {self.thing2})"
def __str__(self):
return self.__repr__()
= MyTuple('a','b')
t1 = MyTuple(111, 222)
t2 print(t1)
print(t2)
(a, b)
(111, 222)
The list itself just points to a root item:
class MyList:
def __init__(self):
self.root = None
def append(self, new_item):
if self.root is None:
self.root = MyListItem(new_item)
else:
self.root.append(new_item)
def __repr__(self):
return self.root.__repr__()
An item has contents, pointer to next item:
class MyListItem:
def __init__(self, content):
self.content = content
self.next = None
def append(self, new_item):
if self.next is None:
self.next = MyListItem(new_item)
else:
self.next.append(new_item)
def __repr__(self):
= self.content
my_content return my_content if self.next is None else f"{my_content}, {self.next.__repr__()}"
= MyList()
users 'Jeff')
users.append('Alma')
users.append('Bo')
users.append(print(users)
Jeff, Alma, Bo
MyTuple
?MyTuple
?MyList
?MyList
?With a MyTuple
:
t1.thing1
'a'
\(\implies\) 1 step
t1.thing2
'b'
\(\implies\) 1 step
With a MyList
:
print(users.root.content)
Jeff
\(\implies\) 1 step
= users.root
current_node while current_node.next is not None:
= current_node.next
current_node print(current_node.content)
Bo
\(\implies\) (3 steps)
…But why 3? How many steps if the list contained 5 elements? \(N\) elements?
Now rather than just printing, let’s pairwise concatenate:
= users.root
cur_pointer1 while cur_pointer1 is not None:
= users.root
cur_pointer2 while cur_pointer2 is not None:
print(cur_pointer1.content + cur_pointer2.content)
= cur_pointer2.next
cur_pointer2 = cur_pointer1.next cur_pointer1
JeffJeff
JeffAlma
JeffBo
AlmaJeff
AlmaAlma
AlmaBo
BoJeff
BoAlma
BoBo
How many steps did this take? How about for a list with \(5\) elements? \(N\) elements?
= []
printed_items = users.root
cur_pointer1 while cur_pointer1 is not None:
= users.root
cur_pointer2 while cur_pointer2 is not None:
print(cur_pointer1.content + cur_pointer2.content)
printed_items.append(cur_pointer1.content)
printed_items.append(cur_pointer2.content)= cur_pointer2.next
cur_pointer2 = cur_pointer1.next
cur_pointer1 = users.root
check_pointer while check_pointer is not None:
if check_pointer.content in printed_items:
print(f"Phew. {check_pointer.content} printed at least once.")
else:
print(f"Oh no! {check_pointer.content} was never printed!!!")
= check_pointer.next check_pointer
JeffJeff
JeffAlma
JeffBo
AlmaJeff
AlmaAlma
AlmaBo
BoJeff
BoAlma
BoBo
Phew. Jeff printed at least once.
Phew. Alma printed at least once.
Phew. Bo printed at least once.
Code | Cost | Times Run | |
---|---|---|---|
1 | for i = 1 to n-1: |
\(c_1\) | \(\widetilde{n}\) |
2 | key = A[i] |
\(c_2\) | \(\widetilde{n}\) |
3 | # Insert A[i] into sorted subarray A[0:i-1] |
\(0\) | \(\widetilde{n}\) |
4 | j = i - 1 |
\(c_4\) | \(\widetilde{n}\) |
5 | while j >= 0 and A[j] > key: |
\(c_5\) | \(\sum_{i=2}^n t_i\) |
6 | A[j + 1] = A[j] |
\(c_6\) | \(\sum_{i=2}^n(t_i - 1)\) |
7 | j = j - 1 |
\(c_7\) | \(\sum_{i=2}^n(t_i - 1)\) |
8 | A[j + 1] = key |
\(c_8\) | \(\widetilde{n}\) |
\[ T(n) = c_1\widetilde{n} + c_2\widetilde{n} + c_4\widetilde{n} + c_5\sum_{i=2}^nt_i + c_6\sum_{i=2}^n(t_i - 1) + c_7\sum_{i=2}^n(t_i-1) + c_8\widetilde{n} \]
\[ T(n) = c_1n + c_2\widetilde{n} + c_4\widetilde{n} + c_5{\color{orange}\boxed{\color{black}\sum_{i=2}^nt_i}} + c_6{\color{lightblue}\boxed{\color{black}\sum_{i=2}^n(t_i - 1)}} + c_7{\color{lightblue}\boxed{\color{black}\sum_{i=2}^n(t_i-1)}} + c_8\widetilde{n} \]
\[ \begin{align*} {\color{orange}\boxed{\color{black}\sum_{i=2}^ni}} &= \sum_{i=1}^ni - \sum_{i=1}^1i = \frac{n(n+1)}{2} - 1 \\ {\color{lightblue}\boxed{\color{black}\sum_{i=2}^n(i-1)}} &= \sum_{i=1}^{n-1}i = \frac{(n-1)(n-1+1)}{2} = \frac{n(n-1)}{2} \end{align*} \]
\[ \begin{align*} T(n) = &{\color{gray}\left(\frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2}\right)}{\color{green}n^2} + {\color{gray}\left(c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8\right)}{\color{green}n^1} \\ \phantom{T(n) = }& - {\color{gray}(c_2 + c_4 + c_5 + c_8)}{\color{green}n^0} \end{align*} \]
\[ \begin{align*} T(n) = &\underbrace{{\color{gray}\left(\frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2}\right)}}_{\text{Constant}}\underbrace{\phantom{(}{\color{green}n^2}\phantom{)}}_{\text{Quadratic}} \\ \phantom{T(n) = }&+ \underbrace{{\color{gray}\left(c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8\right)}}_{\text{Constant}}\underbrace{\phantom{(}{\color{green}n^1}\phantom{)}}_{\text{Linear}} \\ \phantom{T(n) = }& - \underbrace{{\color{gray}(c_2 + c_4 + c_5 + c_8)}}_{\text{Constant}}\underbrace{{\color{green}n^0}}_{\text{Constant}} \end{align*} \]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
= [np.power(10, k) for k in np.arange(1, 2.75, 0.25)]
n_vals = pd.DataFrame({'$n$': n_vals})
runtime_df '$n^2 + 50n$'] = runtime_df['$n$'].apply(lambda x: np.power(x, 2) + 50*x)
runtime_df['$n^2 + 10000$'] = runtime_df['$n$'].apply(lambda x: np.power(x, 2) + 10000)
runtime_df['$O(n)$'] = runtime_df['$n$'].copy()
runtime_df['$O(nlogn)$'] = runtime_df['$n$'].apply(lambda x: x * np.log(x))
runtime_df['$O(n^2)$'] = runtime_df['$n$'].apply(lambda x: np.power(x, 2))
runtime_df['$O(n^2logn)$'] = runtime_df['$n$'].apply(lambda x: np.power(x,2) * np.log(x))
runtime_df['$O(n^3)$'] = runtime_df['$n$'].apply(lambda x: np.power(x, 3))
runtime_df['$O(n^3logn)$'] = runtime_df['$n$'].apply(lambda x: np.power(x, 3) * np.log(x))
runtime_df[# Get the max values, for labeling the ends of lines
= runtime_df.max().to_dict()
max_vals = runtime_df.melt(id_vars=['$n$'])
plot_df #print(plot_df)
= {col: '' if (col == '$n^2 + 50n$') or (col == '$n^2 + 10000$') else (2,1) for col in runtime_df.columns}
style_map = plt.subplots(figsize=(11,5))
fig, ax ='$n$', y='value', hue='variable', style='variable', dashes=style_map)
sns.lineplot(plot_df, x#plt.xscale('log')
'log')
plt.yscale(# extract the existing handles and labels
= ax.get_legend_handles_labels()
h, l # slice the appropriate section of l and h to include in the legend
0:2], l[0:2])
ax.legend(h[for label, val in max_vals.items():
if (label == '$n$') or (label == '$n^2 + 50n$') or (label == '$n^2 + 10000$'):
continue
if 'logn' in label:
= label.replace('logn', r'\log(n)')
label = max_vals['$n$'] + 2, y = val, s=label, va='center')
ax.text(x # Hide the right and top spines
'right', 'top']].set_visible(False)
ax.spines[[ plt.show()
= [np.power(10, k) for k in np.arange(1, 6, 0.5)]
n_vals = pd.DataFrame({'$n$': n_vals})
rt_const_df '$20n^2$'] = rt_const_df['$n$'].apply(lambda x: 20*np.power(x,2))
rt_const_df['$n^2$'] = rt_const_df['$n$'].apply(lambda x: np.power(x,2))
rt_const_df['$n^2logn$'] = rt_const_df['$n$'].apply(lambda x: np.power(x,2) * np.power(np.log(x),2))
rt_const_df['$n^3$'] = rt_const_df['$n$'].apply(lambda x: np.power(x,3))
rt_const_df[# Get the max values, for labeling the ends of lines
= rt_const_df.max().to_dict()
max_vals = rt_const_df.melt(id_vars=['$n$'])
plot_df_const = {col: '' if (col == '$20n^2$') else (2,1) for col in rt_const_df.columns}
style_map = plt.subplots(figsize=(11,5))
fig_const, ax_const ='$n$', y='value', hue='variable', style='variable', dashes=style_map)
sns.lineplot(plot_df_const, x'log')
plt.xscale('log')
plt.yscale(# extract the existing handles and labels
= ax_const.get_legend_handles_labels()
h_const, l_const # slice the appropriate section of l and h to include in the legend
0:1], l_const[0:1])
ax_const.legend(h_const[for label, val in max_vals.items():
if (label == '$n$') or (label == '$20n^2$'):
continue
if 'logn' in label:
= label.replace('logn', r'\log(n)')
label = max_vals['$n$'] + 10**4, y = val, s=label, va='center')
ax_const.text(x # Hide the right and top spines
'right', 'top']].set_visible(False)
ax_const.spines[[ plt.show()
Let \(f, g: \mathbb{N} \rightarrow \mathbb{N}\). Then we write \(f(n) = \overline{O}(g(n))\) when there exists a threshold \(n_0 > 0\) and constant \(K > 0\) s.t. \[ \forall n \geq n_0 \left[ f(n) \leq K\cdot g(n) \right] \]
In words:
Definition from Savage (1998, pg. 13)
How about Merge-Sort? \(T(n) = 2T(n/2) + \Theta(n)\)
\(W(n)\) and \(D(n)\) grow at same rate \(\implies\) Case 2:
Merge-Sort: \(k = 0\) works! \(\Theta(n^{\log_2(2)}\log_2^0(n)) = \Theta(n)\)
Thus \(T(n) = \Theta(n^{\log_b(a)}\log_2^{k+1}(n)) = \boxed{\Theta(n\log_2n)}\) 😎
.py
files may be easier than .ipynb
for development!)Screen
class, another person can write the Button
class, and so onScreen
team doesn’t need to know the internal details of Button
(just it’s API), and vice-versapublic_var
_protected_var
__private_var
class MyTopSecretInfo:
= "I love Carly Rae Jepsen"
__the_info
= MyTopSecretInfo()
info_obj info_obj.__the_info
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[27], line 5 2 __the_info = "I love Carly Rae Jepsen" 4 info_obj = MyTopSecretInfo() ----> 5 info_obj.__the_info AttributeError: 'MyTopSecretInfo' object has no attribute '__the_info'
Guess we can’t access it then, right? 😮💨
info_obj._MyTopSecretInfo__the_info
'I love Carly Rae Jepsen'
print()
?print()
(through __str__()
) takes on many (!) forms (!): each type of object has its own implementation of __str__()
class Bookstore:
def __init__(self, name, location):
self.name = name
self.location = location
self.books = []
def __getitem__(self, index):
return self.books[index]
def __repr__(self):
return self.__str__()
def __str__(self):
return f"Bookstore[{self.get_num_books()} books]"
def add_books(self, book_list):
self.books.extend(book_list)
def get_books(self):
return self.books
def get_inventory(self):
= []
book_lines for book_index, book in enumerate(self.get_books()):
= f"{book_index}. {str(book)}"
cur_book_line
book_lines.append(cur_book_line)return "\n".join(book_lines)
def get_num_books(self):
return len(self.get_books())
def sort_books(self, sort_key):
self.books.sort(key=sort_key)
class Book:
def __init__(self, title, authors, num_pages):
self.title = title
self.authors = authors
self.num_pages = num_pages
def __str__(self):
return f"Book[title={self.get_title()}, authors={self.get_authors()}, pages={self.get_num_pages()}]"
def get_authors(self):
return self.authors
def get_first_author(self):
return self.authors[0]
def get_num_pages(self):
return self.num_pages
def get_title(self):
return self.title
class Person:
def __init__(self, family_name, given_name):
self.family_name = family_name
self.given_name = given_name
def __repr__(self):
return self.__str__()
def __str__(self):
return f"Person[{self.get_family_name()}, {self.get_given_name()}]"
def get_family_name(self):
return self.family_name
def get_given_name(self):
return self.given_name
= Bookstore("Bookland", "Washington, DC")
my_bookstore = Person("Plath", "Sylvia")
plath = Book("The Bell Jar", [plath], 244)
bell_jar = Person("Marx", "Karl")
marx = Person("Engels", "Friedrich")
engels = Book("The Communist Manifesto", [marx, engels], 43)
manifesto = Person("Elster", "Jon")
elster = Book("The Cement of Society", [elster], 311)
cement
my_bookstore.add_books([bell_jar, manifesto, cement])print(my_bookstore)
print(my_bookstore[0])
print("Inventory:")
print(my_bookstore.get_inventory())
Bookstore[3 books]
Book[title=The Bell Jar, authors=[Person[Plath, Sylvia]], pages=244]
Inventory:
0. Book[title=The Bell Jar, authors=[Person[Plath, Sylvia]], pages=244]
1. Book[title=The Communist Manifesto, authors=[Person[Marx, Karl], Person[Engels, Friedrich]], pages=43]
2. Book[title=The Cement of Society, authors=[Person[Elster, Jon]], pages=311]
= lambda x: x.get_first_author().get_family_name()
sort_alpha = sort_alpha)
my_bookstore.sort_books(sort_key print(my_bookstore.get_inventory())
0. Book[title=The Cement of Society, authors=[Person[Elster, Jon]], pages=311]
1. Book[title=The Communist Manifesto, authors=[Person[Marx, Karl], Person[Engels, Friedrich]], pages=43]
2. Book[title=The Bell Jar, authors=[Person[Plath, Sylvia]], pages=244]
= lambda x: x.get_num_pages()
sort_pages = sort_pages)
my_bookstore.sort_books(sort_key print(my_bookstore.get_inventory())
0. Book[title=The Communist Manifesto, authors=[Person[Marx, Karl], Person[Engels, Friedrich]], pages=43]
1. Book[title=The Bell Jar, authors=[Person[Plath, Sylvia]], pages=244]
2. Book[title=The Cement of Society, authors=[Person[Elster, Jon]], pages=311]
Book
from enum import Enum
class BookType(Enum):
= 0
NONFICTION = 1
FICTION
class Book:
def __init__(self, title: str, authors: list[Person], num_pages: int, type: BookType):
self.title = title
self.authors = authors
self.num_pages = num_pages
self.type = type
def __str__(self):
return f"Book[title={self.title}, authors={self.authors}, pages={self.num_pages}, type={self.type}]"
= Person("Joyce", "James")
joyce = Book("Ulysses", [joyce], 732, BookType.FICTION)
ulysses = Person("Schelling", "Thomas")
schelling = Book("Micromotives and Macrobehavior", [schelling], 252, BookType.NONFICTION)
micromotives print(ulysses)
print(micromotives)
Book[title=Ulysses, authors=[Person[Joyce, James]], pages=732, type=BookType.FICTION]
Book[title=Micromotives and Macrobehavior, authors=[Person[Schelling, Thomas]], pages=252, type=BookType.NONFICTION]
# class Book defined as earlier
class FictionBook(Book):
def __init__(self, title, authors, num_pages, characters):
super().__init__(title, authors, num_pages)
self.characters = characters
class NonfictionBook(Book):
def __init__(self, title, authors, num_pages, topic):
super().__init__(title, authors, num_pages)
self.topic = topic
= Person("Joyce", "James")
joyce = FictionBook("Ulysses", [joyce], 732, ["Daedalus"])
ulysses = Person("Schelling", "Thomas")
schelling = NonfictionBook("Micromotives and Macrobehavior", [schelling], 252, "Economics")
micromotives print(ulysses)
print(micromotives)
Book[title=Ulysses, authors=[Person[Joyce, James]], pages=732]
Book[title=Micromotives and Macrobehavior, authors=[Person[Schelling, Thomas]], pages=252]
Need to be careful with \(O(f(n))\) vs. \(\Omega(f(n))\) however: difference between “for all inputs” vs. “for some inputs”:
Master Theorem: Let \(a > 0\) and \(b > 1\) be constants, and let \(f(n)\) be a driving function defined and nonnegative on all sufficiently large reals. Define \(T(n)\) on \(n \in \mathbb{N}\) by
\[ T(n) = aT(n/b) + f(n) \]
where \(aT(n/b) = a'T(\lfloor n/b \rfloor) + a''T(\lceil n/b \rceil)\) for some \(a' \geq 0\) and \(a'' \geq 0\) satisfying \(a = a' + a''\). Then the asymptotic behavior of \(T(n)\) can be characterized as follows:
Proof. See Cormen et al. (2001), pg. 107-114.