The course focuses on Python, but part of understanding Python is understanding how Python does things differently from other languages!
Just as C was “overtaken” by Java, then Java was “overtaken” by Python, Python will someday be overtaken
The Numbers
Code
import pandas as pdimport numpy as npimport plotly.express as pximport plotly.io as piopio.renderers.default ="notebook"lang_df = pd.read_csv("assets/gh_issues.csv")# The data for 2022 is essentially uselesslang_df = lang_df[lang_df['year'] <=2021].copy()lang_df['time'] = lang_df['year'].astype(str) +"_"+ lang_df['quarter'].astype(str)lang_df['prop'] = lang_df['count'] / lang_df.groupby('time')['count'].transform('sum')lang_df.head()#sns.lineplot(data=lang_df, x='year', y='count', color='name')# Keep only most popular languageskeep_langs = ['Python','JavaScript','C','C++','C#','Java','Ruby']pop_df = lang_df[lang_df['name'].isin(keep_langs)].copy()fig = px.line(pop_df, x='time', y='prop', color='name', template='simple_white', title='Programming Language Popularity Since 2012', labels = {'time': 'Year','prop': 'Proportion of GitHub Issues' })fig.update_layout( xaxis =dict( tickmode ='array', tickvals = [f"{year}_1"for year inrange(2012,2022)], ticktext = [f"{year}"for year inrange(2012,2022)] ))fig.show()
/Users/jpj/.pyenv/versions/3.11.5/lib/python3.11/site-packages/plotly/express/_core.py:2065: FutureWarning:
When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
Research Particular Subfields!
For example, if you’re interested in pursuing Economics, you’ll want to learn Stata
Physics? You may want to learn MATLAB
For pure mathematics: Julia / Mathematica
Statistics, sociology, psychology, political science: R
Web development: JavaScript / TypeScript
The holy grail: you’re comfortable with Python but can also think in general, language-agnostic terms about algorithmic and data-structural efficiency!
Avoid Analysis Paralysis
(Easier said than done, admittedly…)
Tie Yourself to the Mast
The exhausted 3am version of you will thank present you for writing useful comments, exceptions, and type hints!
Adversarial development: brainstorm all of the ways an evil hacker might break your code!
Example: Finding An Item Within A List
Seems straightforward, right? Given a list l, and a value v, return the index of l which contains v
Corner cases galore…
What if l contains v more than once? What if it doesn’t contain v at all? What if l is None? What if v is None? What if l isn’t a list at all? What if v is itself a list?
from mypy import apiresult = api.run(['-c',_i])print(result[0])
<string>:3: error: Unsupported left operand type for * ("Number") [operator]
<string>:4: error: Argument 1 to "multiply" has incompatible type "int"; expected "Number" [arg-type]
<string>:4: note: Types from "numbers" aren't supported for static type checking
<string>:4: note: See https://peps.python.org/pep-0484/#the-numeric-tower
<string>:4: note: Consider using a protocol instead, such as typing.SupportsFloat
<string>:4: error: Argument 2 to "multiply" has incompatible type "int"; expected "Number" [arg-type]
<string>:5: error: Argument 1 to "multiply" has incompatible type "str"; expected "Number" [arg-type]
<string>:5: note: Types from "numbers" aren't supported for static type checking
<string>:5: note: See https://peps.python.org/pep-0484/#the-numeric-tower
<string>:5: note: Consider using a protocol instead, such as typing.SupportsFloat
<string>:5: error: Argument 2 to "multiply" has incompatible type "int"; expected "Number" [arg-type]
Found 5 errors in 1 file (checked 1 source file)
Unit Testing
For your life: test-driven development
If you’re coding a duck, you should test that it looks like a duck, quacks like a duck, etc.
For this class:
Public tests: Fully visible, see result plus full code
Hidden tests: See result + description of test, but no code
Secret tests: We run these after you submit, as a major portion of the total grade
Data Structures: Motivation
Why Does The NYC Subway Have Express Lines?
Why Stop At Two Levels?
How TF Does Google Maps Work?
A (mostly) full-on answer: soon to come! Data structures for spatial data
Are we trapped? Like… what is there to do besides performing these \(N^3\) operations, if we want to multiply two \(N \times N\) matrices? Why are we about to move onto yet another slide about this?
Block-Partitioning Matrices
Now let’s consider big matrices, whose dimensions are a power of 2 (for ease of illustration): \(A\) and \(B\) are now \(N \times N = 2^n \times 2^n\) matrices
Which gives us a recurrence relation representing the total number of computations required for this big-matrix multiplication: \(T(N) = \underbrace{8T(N/2)}_{\text{Multiplications}} + \underbrace{\Theta(1)}_{\text{Additions}}\)
It turns out that (using a method we’ll learn in Week 3), given this recurrence relation and our base case from the previous slide, this divide-and-conquer approach via block-partitioning doesn’t help us: we still get \(T(n) = O(n^3)\)…
So why is Jeff still torturing us with this example?
Using the previous slide as our base case and applying this same method to the block-paritioned big matrices, we get the same result, but where the four entries in \(AB\) here are now matrices rather than scalars:
We now have a different recurrence relation: \(T(N) = \underbrace{7T(N/2)}_{\text{Multiplications}} + \underbrace{\Theta(N^2)}_{\text{Additions}}\)
And it turns out, somewhat miraculously, that the additional time required for the increased number of additions is significantly less than the time savings we obtain by doing 7 instead of 8 multiplications, since this method now runs in \(T(N) = O(N^{\log_2(7)}) \approx O(N^{2.807}) < O(N^3)\) 🤯