DSAN 5000: Data Science and Analytics
Thursday, September 12, 2024
Today’s Planned Schedule (Section 03):
| | Start | End | Topic |
|---|---|---|---|
| Lecture | 3:30pm | 4:00pm | How the Internet Works → |
| | 4:00pm | 4:30pm | Quarto and Reproducible Research → |
| | 4:30pm | 5:00pm | Git and GitHub → |
| Break! | 5:00pm | 5:10pm | |
| Lab | 5:10pm | 5:50pm | Lab Demonstrations → |
| | 5:50pm | 6:00pm | Lab Assignment Overview → |
141.161.0.0 and 141.161.255.255
google.com, georgetown.edu, etc.
georgetown.edu, for example, is really 23.185.0.2
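To make the address range above concrete, here is a minimal sketch using Python's standard `ipaddress` module. It collapses the two endpoints into a single network block and checks whether a given address falls inside it. The address `141.161.10.20` is a made-up example chosen only for illustration.

```python
import ipaddress

# Endpoints of the address range quoted above.
first = ipaddress.ip_address("141.161.0.0")
last = ipaddress.ip_address("141.161.255.255")

# Collapse the range into the smallest list of CIDR blocks that covers it.
blocks = list(ipaddress.summarize_address_range(first, last))
network = blocks[0]


def in_range(address: str) -> bool:
    """Return True if `address` lies inside the range above."""
    return ipaddress.ip_address(address) in network


print(blocks)                      # [IPv4Network('141.161.0.0/16')]
print(network.num_addresses)       # 65536
print(in_range("141.161.10.20"))   # hypothetical address: True
print(in_range("23.185.0.2"))      # the georgetown.edu address quoted above: False
```

Note that the address quoted for georgetown.edu lies outside this range, so a domain name does not have to resolve to an address inside any particular block.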
GET, POST, HEAD; for now, we focus on the GET request, the request your browser makes by default
<html> box contains <head> (metadata, e.g., page title) and <body> (page content) boxes; <body> box contains e.g. header, footer, navigation bar, and main content of page.
Links (<a>): <a href="https://npr.org/">news</a> creates a link, so when you click “news”, the browser will request (fetch the HTML for) the URL https://npr.org
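The relationship between an `<a>` tag and the GET request it triggers can be sketched with Python's standard library. The snippet below is only an illustration: the `LinkCollector` class and the tiny `page` string are hypothetical, and no network request is actually sent. It extracts the `href` from the link above and builds (but does not send) the request a browser would make.

```python
from html.parser import HTMLParser
from urllib.request import Request


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# A tiny made-up page with the <html>/<head>/<body> nesting described above.
page = (
    "<html><head><title>Demo</title></head>"
    '<body><a href="https://npr.org/">news</a></body></html>'
)

collector = LinkCollector()
collector.feed(page)

# Build the request objects only; nothing is sent over the network here.
get_request = Request(collector.links[0])
head_request = Request(collector.links[0], method="HEAD")

print(collector.links)            # ['https://npr.org/']
print(get_request.get_method())   # GET (the default when no data is attached)
print(head_request.get_method())  # HEAD
```

Passing one of these objects to `urllib.request.urlopen` would actually send the request and return the server's response.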
ssh command!
cp, for files on your local computer
scp (Secure Copy Protocol), which you can use to copy files between remote servers and your local computer
rsync: like scp, with the same basic syntax, except it synchronizes (only copies files which are different or missing)
-a (“archive”) tells rsync you want it to copy recursively (and to preserve file attributes such as permissions and modification times)
-v (“verbose”) tells rsync to print information as it copies
-z (“zip/compress”) tells rsync to compress file data during the transfer and decompress it on the receiving end (which can speed up the transfer, especially for compressible files over slow connections)
More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments. Those are some of the telling figures that emerged from Nature’s survey of 1,576 researchers (Baker 2016)
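Returning to the rsync flags described above (-a, -v, -z), the following is a minimal sketch of how they combine into a single command. The helper name `build_rsync_command` and the destination `user@example.com:~/public_html/` are hypothetical placeholders. The snippet only assembles and prints the command and does not run it.

```python
def build_rsync_command(source, destination, archive=True, verbose=True, compress=True):
    """Assemble an rsync argument list from the three flags discussed above."""
    flags = ""
    if archive:
        flags += "a"  # recursive copy that preserves file attributes
    if verbose:
        flags += "v"  # print each file as it is transferred
    if compress:
        flags += "z"  # compress file data while it is in transit
    command = ["rsync"]
    if flags:
        command.append("-" + flags)
    command += [source, destination]
    return command


# Hypothetical source and destination, for illustration only.
command = build_rsync_command("_site/", "user@example.com:~/public_html/")
print(" ".join(command))
```

To execute the command for real, the list could be passed to `subprocess.run(command, check=True)`, assuming rsync is installed and the remote host accepts your SSH login.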
.qmd chunks)
Rscript <script>.r
.ipynb cells)
python <script>.py
(Important distinction!)
git init in shell to create
git add to track files
git commit to commit changes to tracked files
git push / git pull: The link between the two!
cool-project, and initialize a Git repo for it:
user@hostname:~$ mkdir cool-project
user@hostname:~$ cd cool-project
user@hostname:~/cool-project$ git init
Initialized empty Git repository in /home/user/cool-project/.git/
.git, in the directory:
user@hostname:~/cool-project$ ls -lah
total 12K
drwxr-xr-x 3 user user 4.0K May 28 00:53 .
drwxr-xr-x 12 user user 4.0K May 28 00:53 ..
drwxr-xr-x 7 user user 4.0K May 28 00:53 .git
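The listing above shows that a Git repo is marked by its `.git` subdirectory. As a rough sketch of what that implies, the hypothetical helper `find_repo_root` below walks upward from a path until it finds a directory containing `.git`. It assumes the common case where `.git` is a directory (in worktrees and submodules it can be a file instead), and it creates the folder by hand in a temporary location rather than calling `git init`.

```python
from pathlib import Path
import tempfile


def find_repo_root(start):
    """Walk upward from `start`; return the first directory containing `.git`, or None."""
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / ".git").is_dir():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "cool-project"
    (project / ".git").mkdir(parents=True)  # stand-in for what `git init` creates
    (project / "src").mkdir()               # hypothetical subfolder inside the project

    found = find_repo_root(project / "src")
    print(found == project.resolve())       # True
```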
The Git Side: Local I
We’re writing Python code, so let’s create and track cool_code.py:
user@hostname:~/cool-project$ touch cool_code.py
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: cool_code.py
user@hostname:~/cool-project$ git commit -m "Initial version of cool_code.py"
[main (root-commit) b40dc25] Initial version of cool_code.py
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 cool_code.py
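The identifier `b40dc25` in the commit output is an abbreviated hash. As a sketch of where such hashes come from, the snippet below reproduces how Git names a file's contents (a "blob"), assuming the default SHA-1 object format. The function name `git_blob_hash` is hypothetical. This is the hash of the file contents only: the commit hash shown above is computed over a separate commit object that also includes the author, date, and message, so it will differ.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Hash file contents the way Git names a blob (SHA-1 object format assumed)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


empty = git_blob_hash(b"")  # cool_code.py is empty right after `touch`
print(empty)                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(empty[:7])            # e69de29, the kind of short prefix Git displays
```

Running `git hash-object cool_code.py` on the empty file should print the same 40-character value.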
git log:
user@hostname:~/cool-project$ git log
commit b40dc252a3b7355cc4c28397fefe7911ff3c94b9 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:57:16 2023 +0000
Initial version of cool_code.py
user@hostname:~/cool-project$ git status
On branch main
nothing to commit, working tree clean
user@hostname:~/cool-project$ echo "1 + 1" >> cool_code.py
user@hostname:~/cool-project$ more cool_code.py
1 + 1
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: cool_code.py
user@hostname:~/cool-project$ git commit -m "Added code to cool_code.py"
[main e3bc497] Added code to cool_code.py
1 file changed, 1 insertion(+)
The git log will show the new version:
user@hostname:~/cool-project$ git log
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:38:05 2023 +0000
Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:37:02 2023 +0000
Initial version of cool_code.py
user@hostname:~/cool-project$ echo "2 + 2" >> cool_code.py
user@hostname:~/cool-project$ more cool_code.py
1 + 1
2 + 2
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: cool_code.py
user@hostname:~/cool-project$ git commit -m "Second version of cool_code.py"
[main 4007db9] Second version of cool_code.py
1 file changed, 1 insertion(+)
git log
user@hostname:~/cool-project$ git log
commit 4007db9a031ca134fe09eab840b2bc845366a9c1 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:39:28 2023 +0000
Second version of cool_code.py
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:38:05 2023 +0000
Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:37:02 2023 +0000
Initial version of cool_code.py
First check the git log to find the hash for the commit you want to revert back to:
user@hostname:~/cool-project$ git reset --hard e3bc497ac
HEAD is now at e3bc497 Added code to cool_code.py
user@hostname:~/cool-project$ git log
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:38:05 2023 +0000
Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:37:02 2023 +0000
Initial version of cool_code.py
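The reset command above was given only the first nine characters of the hash (`e3bc497ac`), which works because that prefix matches exactly one commit. A minimal sketch of that lookup, using the full hashes from the log and a hypothetical helper `resolve_prefix`, might look like this (Git's own resolution logic is more involved):

```python
def resolve_prefix(prefix, full_hashes):
    """Return the single full hash starting with `prefix`; fail if none or several match."""
    matches = [h for h in full_hashes if h.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{prefix!r} matches {len(matches)} commits, expected exactly 1")
    return matches[0]


# Full hashes copied from the git log output shown before the reset.
history = [
    "4007db9a031ca134fe09eab840b2bc845366a9c1",
    "e3bc497acbb5a487566ff2014dcd7b83d0c75224",
    "b40dc25b14c0426b06c8d182184e147853f3c12e",
]

print(resolve_prefix("e3bc497ac", history))
# e3bc497acbb5a487566ff2014dcd7b83d0c75224
```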
user@hostname:~/cool-project$ echo "3 + 3" >> cool_code.py
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: cool_code.py
user@hostname:~/cool-project$ git commit -m "Added different code to cool_code.py"
[main 700d955] Added different code to cool_code.py
1 file changed, 1 insertion(+)
The final git log:
user@hostname:~/cool-project$ git log
commit 700d955faacb27d7b8bc464b9451851b5e319f20 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:44:49 2023 +0000
Added different code to cool_code.py
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:38:05 2023 +0000
Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date: Sun May 28 00:37:02 2023 +0000
Initial version of cool_code.py
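The sequence of commits, reset, and new commit shown above can be mimicked with a toy model. This is a deliberately simplified sketch: the `ToyRepo` class is hypothetical, it tracks a single branch as a list of messages, and it identifies commits by message rather than by hash. In real Git, the commit dropped by `git reset --hard` also stays in the object database (reachable through the reflog) until it is garbage-collected.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """A simplified stand-in for a single-branch Git history."""

    commits: list = field(default_factory=list)  # oldest first; last item is HEAD

    def commit(self, message):
        """Add a new commit on top of the current HEAD."""
        self.commits.append(message)

    def reset_hard(self, message):
        """Move HEAD back to the named commit, dropping later commits from the branch."""
        index = self.commits.index(message)
        del self.commits[index + 1:]

    def log(self):
        """Return commit messages newest first, like `git log`."""
        return list(reversed(self.commits))


repo = ToyRepo()
repo.commit("Initial version of cool_code.py")
repo.commit("Added code to cool_code.py")
repo.commit("Second version of cool_code.py")
repo.reset_hard("Added code to cool_code.py")
repo.commit("Added different code to cool_code.py")

for message in repo.log():
    print(message)
```

The printed messages follow the same order as the final log above, with "Second version of cool_code.py" no longer on the branch.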
Even the simplest projects can start to look like:
git push
diff
| | Frontend | Backend |
|---|---|---|
| Low Level | HTML/CSS/JavaScript | GitHub Pages |
| Middle Level | JS Libraries | PHP, SQL |
| High Level | React, Next.js | Node.js, Vercel |
index.html
git init, git add -A ., git push
<username>.github.io!
ssh
scp
rsync
5000-lab-1.2
git clone
.bib file to add citations
index.qmd
about.ipynb
slides/slides.ipynb using the revealjs format
quarto render
rsync or scp to copy the _site directory to your GU domains server (within ~/public_html)
.bib file
DSAN 5000 W03: Data Science Workflow