
DSAN 5000: Data Science and Analytics
Thursday, September 12, 2024
Today’s Planned Schedule (Section 03):
| | Start | End | Topic |
|---|---|---|---|
| Lecture | 3:30pm | 4:00pm | How the Internet Works → |
| | 4:00pm | 4:30pm | Quarto and Reproducible Research → |
| | 4:30pm | 5:00pm | Git and GitHub → |
| Break! | 5:00pm | 5:10pm | |
| Lab | 5:10pm | 5:50pm | Lab Demonstrations → |
| | 5:50pm | 6:00pm | Lab Assignment Overview → |


- IP addresses: e.g., the range between 141.161.0.0 and 141.161.255.255
- Domain names: google.com, georgetown.edu, etc.
- georgetown.edu, for example, is really 23.185.0.21
- Request types: GET, POST, HEAD; for now, we focus on the GET request, the request your browser makes by default
- HTML: the <html> box contains <head> (metadata, e.g., page title) and <body> (page content) boxes; the <body> box contains e.g. the header, footer, navigation bar, and main content of the page
- Links (<a>): <a href="https://npr.org/">news</a> creates a link, so when you click “news”, the browser will request (fetch the HTML for) the URL https://npr.org

Image from Menczer, Fortunato, and Davis (2020, 90)
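As a small illustration of the address range mentioned above, the following sketch uses Python's standard `ipaddress` module to check whether an address falls between 141.161.0.0 and 141.161.255.255. The names `example_block` and `in_block`, and the test address 141.161.10.20, are assumptions made for this example only.

```python
import ipaddress

# The range 141.161.0.0 through 141.161.255.255 is exactly this /16 block.
# (The name example_block is ours, chosen for illustration.)
example_block = ipaddress.ip_network("141.161.0.0/16")


def in_block(address: str, block: ipaddress.IPv4Network = example_block) -> bool:
    """Return True if the dotted-quad address falls inside the block."""
    return ipaddress.ip_address(address) in block


if __name__ == "__main__":
    # First and last address in the block
    print(example_block[0], example_block[-1])
    # Number of addresses in a /16 block (2**16)
    print(example_block.num_addresses)
    # An arbitrary address inside the range (hypothetical, for illustration)
    print(in_block("141.161.10.20"))
    # The georgetown.edu address quoted in the text lies outside this range
    print(in_block("23.185.0.21"))
```

Note that the address a domain name resolves to can change over time, so the 23.185.0.21 value is used here only as a string taken from the text, not looked up live.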
hello_server.py


GET requests
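To make the GET request concrete, here is a minimal sketch (not the `hello_server.py` from the lecture, whose contents are not shown here) that starts a tiny web server on your own machine, sends it one GET request, and pulls the link out of the returned HTML. The class and function names (`HelloHandler`, `LinkCollector`, `fetch_from_local_server`) and the page contents are assumptions for illustration; only Python's standard library is used.

```python
import threading
import urllib.request
from html.parser import HTMLParser
from http.server import BaseHTTPRequestHandler, HTTPServer

# A tiny page with the <html>/<head>/<body> structure and one <a> link
PAGE = (
    b"<html><head><title>Hello</title></head>"
    b'<body><a href="https://npr.org/">news</a></body></html>'
)


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the same small HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet: no per-request log lines


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


def fetch_from_local_server():
    """Start the server, send one GET request, return (status, html)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Bypass any system proxy settings, since we are talking to localhost
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://127.0.0.1:{port}/") as response:
            return response.status, response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, html = fetch_from_local_server()
    collector = LinkCollector()
    collector.feed(html)
    print(status)
    print(collector.links)
```

The request made by `opener.open(...)` is a GET by default, which mirrors what a browser does when you type a URL or click a link.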

- The ssh command!
- cp, for files on your local computer
- scp (Secure Copy Protocol), which you can use to copy files between remote servers and your local computer
- rsync: like scp, with the same syntax, except it synchronizes (only copies files which are different or missing)
  - -a (“archive”) tells rsync you want it to copy recursively
  - -v (“verbose”) tells rsync to print information as it copies
  - -z (“zip/compress”) tells rsync to compress file data during the transfer and decompress it on the receiving end (which can noticeably speed up transfers over slow connections)

John William Waterhouse, Ulysses and the Sirens, Public domain, via Wikimedia Commons
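The three rsync flags above are usually combined into a single `-avz`. As a rough sketch of how they fit together, the helper below builds (but does not run) an rsync command line. The function name `build_rsync_command` and the host `user@example.com` are hypothetical, chosen only for illustration.

```python
import shlex


def build_rsync_command(source, destination, archive=True, verbose=True, compress=True):
    """Return the rsync command as a list of arguments (nothing is executed)."""
    flags = ""
    if archive:
        flags += "a"  # -a: archive mode, which includes recursive copying
    if verbose:
        flags += "v"  # -v: print information while copying
    if compress:
        flags += "z"  # -z: compress file data during the transfer
    command = ["rsync"]
    if flags:
        command.append("-" + flags)
    command += [source, destination]
    return command


if __name__ == "__main__":
    # Hypothetical remote host; replace with your own server and username
    cmd = build_rsync_command("_site/", "user@example.com:public_html/")
    # shlex.join shows the command as you would type it in a shell
    print(shlex.join(cmd))
```

To actually run the command you could pass the list to `subprocess.run(cmd, check=True)`, assuming rsync is installed and you have access to the remote server.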
More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments. Those are some of the telling figures that emerged from Nature’s survey of 1,576 researchers (Baker 2016)
- R: interactively (.qmd chunks), or as a script: Rscript <script>.r
- Python: interactively (.ipynb cells), or as a script: python <script>.py
- (Important distinction!)
- git init in shell to create a repo
- git add to track files
- git commit to commit changes to tracked files
- git push/git pull: The link between the two!
Create a directory named cool-project, and initialize a Git repo for it:
user@hostname:~$ mkdir cool-project
user@hostname:~$ cd cool-project
user@hostname:~/cool-project$ git init
Initialized empty Git repository in /home/user/cool-project/.git/
This creates a hidden folder, .git, in the directory:
user@hostname:~/cool-project$ ls -lah
total 12K
drwxr-xr-x 3 user user 4.0K May 28 00:53 .
drwxr-xr-x 12 user user 4.0K May 28 00:53 ..
drwxr-xr-x 7 user user 4.0K May 28 00:53 .git
The Git Side: Local I
We’re writing Python code, so let’s create and track cool_code.py:
user@hostname:~/cool-project$ touch cool_code.py
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: cool_code.py
user@hostname:~/cool-project$ git commit -m "Initial version of cool_code.py"
[main (root-commit) b40dc25] Initial version of cool_code.py
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 cool_code.py
View the commit history with git log:
user@hostname:~/cool-project$ git log
commit b40dc25b14c0426b06c8d182184e147853f3c12e (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:37:02 2023 +0000
Initial version of cool_code.py
gitGraph
commit id: "b40dc25"
user@hostname:~/cool-project$ git status
On branch main
nothing to commit, working tree clean
user@hostname:~/cool-project$ echo "1 + 1" >> cool_code.py
user@hostname:~/cool-project$ more cool_code.py
1 + 1
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: cool_code.py
user@hostname:~/cool-project$ git commit -m "Added code to cool_code.py"
[main e3bc497] Added code to cool_code.py
1 file changed, 1 insertion(+)
The git log will show the new version:
user@hostname:~/cool-project$ git log
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:38:05 2023 +0000
Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:37:02 2023 +0000
Initial version of cool_code.py
user@hostname:~/cool-project$ echo "2 + 2" >> cool_code.py
user@hostname:~/cool-project$ more cool_code.py
1 + 1
2 + 2
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: cool_code.py
user@hostname:~/cool-project$ git commit -m "Second version of cool_code.py"
[main 4007db9] Second version of cool_code.py
1 file changed, 1 insertion(+)
And the git log:
user@hostname:~/cool-project$ git log
commit 4007db9a031ca134fe09eab840b2bc845366a9c1 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:39:28 2023 +0000
Second version of cool_code.py
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:38:05 2023 +0000
Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:37:02 2023 +0000
Initial version of cool_code.py
First check the git log to find the hash of the commit you want to revert to:
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:38:05 2023 +0000
Added code to cool_code.py
Then reset to that commit (note that git reset --hard also discards any uncommitted changes):
user@hostname:~/cool-project$ git reset --hard e3bc497ac
HEAD is now at e3bc497 Added code to cool_code.py
user@hostname:~/cool-project$ git log
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:38:05 2023 +0000
Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:37:02 2023 +0000
Initial version of cool_code.py
user@hostname:~/cool-project$ echo "3 + 3" >> cool_code.py
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: cool_code.py
user@hostname:~/cool-project$ git commit -m "Added different code to cool_code.py"
[main 700d955] Added different code to cool_code.py
1 file changed, 1 insertion(+)
The final git log:
user@hostname:~/cool-project$ git log
commit 700d955faacb27d7b8bc464b9451851b5e319f20 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:44:49 2023 +0000
Added different code to cool_code.py
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:38:05 2023 +0000
Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:37:02 2023 +0000
Initial version of cool_code.py
Even the simplest projects can start to look like:
gitGraph
commit id: "537dd67"
commit id: "6639143"
branch nice_feature
checkout nice_feature
commit id: "937ded8"
checkout main
commit id: "9e6679c"
checkout nice_feature
branch very_nice_feature
checkout very_nice_feature
commit id: "7f4de03"
checkout main
commit id: "6df80c1"
checkout nice_feature
commit id: "bd0ebb8"
checkout main
merge nice_feature id: "9ff61cc" tag: "V 1.0.0" type: HIGHLIGHT
checkout very_nice_feature
commit id: "370613b"
checkout main
commit id: "9a07a97"
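One way to make sense of both the reset walkthrough and the branching graph above is to think of each commit as pointing back to its parent, with a branch being just a pointer to its latest commit. The toy model below sketches that idea. It is not how Git is implemented: real Git derives commit ids by hashing, can reset to commits that are not ancestors, and tracks much more. The class names `Commit` and `ToyRepo` are invented for this example, and the ids are simply passed in by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    commit_id: str
    message: str
    parent: Optional["Commit"] = None  # None for the root commit


class ToyRepo:
    """A single-branch toy repository: head points at the latest commit."""

    def __init__(self):
        self.head: Optional[Commit] = None

    def commit(self, commit_id: str, message: str) -> Commit:
        # A new commit records the current head as its parent
        self.head = Commit(commit_id, message, self.head)
        return self.head

    def log(self) -> List[str]:
        # Walk parent pointers from head back to the root, newest first
        ids = []
        node = self.head
        while node is not None:
            ids.append(node.commit_id)
            node = node.parent
        return ids

    def reset_hard(self, commit_id: str) -> None:
        # Move head back to an earlier commit in the history (simplification:
        # this toy version only searches the ancestors of the current head)
        node = self.head
        while node is not None:
            if node.commit_id == commit_id:
                self.head = node
                return
            node = node.parent
        raise ValueError(f"unknown commit: {commit_id}")


if __name__ == "__main__":
    repo = ToyRepo()
    repo.commit("b40dc25", "Initial version of cool_code.py")
    repo.commit("e3bc497", "Added code to cool_code.py")
    repo.commit("4007db9", "Second version of cool_code.py")
    repo.reset_hard("e3bc497")
    repo.commit("700d955", "Added different code to cool_code.py")
    print(repo.log())
```

After the reset, the new commit's parent is e3bc497, so walking back from head never reaches 4007db9. That matches the final git log in the walkthrough, where 4007db9 no longer appears.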
git push
diff

| | Frontend | Backend |
|---|---|---|
| Low Level | HTML/CSS/JavaScript | GitHub Pages |
| Middle Level | JS Libraries | PHP, SQL |
| High Level | React, Next.js | Node.js, Vercel |
index.html
git init, git add -A ., git commit, git push
<username>.github.io!

- ssh
- scp
- rsync
- 5000-lab-1.2
- git clone
- .bib file to add citations
- index.qmd
- about.ipynb
- slides/slides.ipynb using the revealjs format
- quarto render
- rsync or scp to copy the _site directory to your GU domains server (within ~/public_html)
- .bib file

DSAN 5000 W03: Data Science Workflow