Crucial distinction: you can set up a “mini-internet”, an intranet, within your own home
Organizations (businesses, government agencies) with security needs often do exactly this: they link a set of computers and servers together, with no outside access
Internet = basically a giant intranet, open to the whole world
Key Building Blocks: Locating Servers
IP Addresses (Internet Protocol addresses): Numeric addresses for uniquely identifying computers on a network
Georgetown University, for example, is allocated IP addresses between 141.161.0.0 and 141.161.255.255
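The range above is exactly the block written 141.161.0.0/16 in CIDR notation: the first 16 bits (141.161) are fixed and the remaining 16 bits vary, giving 2^16 = 65,536 addresses. A minimal sketch using Python's standard-library ipaddress module; the helper name in_georgetown_block and the sample addresses are illustrative assumptions:

```python
import ipaddress

# Georgetown's allocation from the slide, written in CIDR notation:
# the first 16 bits (141.161) are fixed, the last 16 bits vary.
GEORGETOWN_BLOCK = ipaddress.ip_network("141.161.0.0/16")

def in_georgetown_block(address: str) -> bool:
    """Return True if the IPv4 address lies within 141.161.0.0-141.161.255.255."""
    return ipaddress.ip_address(address) in GEORGETOWN_BLOCK

print(GEORGETOWN_BLOCK[0], GEORGETOWN_BLOCK[-1])  # first and last address in the block
print(GEORGETOWN_BLOCK.num_addresses)             # 2**16 addresses
print(in_georgetown_block("141.161.42.7"))        # True
print(in_georgetown_block("23.185.0.21"))         # False
```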
URLs (Uniform Resource Locators): The more human-readable website addresses you’re used to: google.com, georgetown.edu, etc.
Built on top of IP addresses, via a directory (the Domain Name System, DNS) that maps URLs (more precisely, their domain names) → IP addresses
georgetown.edu, for example, is really 23.185.0.21¹
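A toy sketch of the directory idea: a hard-coded Python dict stands in for DNS (an illustrative assumption, not how DNS is actually implemented), next to the standard-library call socket.gethostbyname, which asks the operating system's real resolver. The real lookup needs a network connection, and its answer today may differ from the address quoted above:

```python
import socket

# A toy, hard-coded "directory" mapping human-readable names to IP addresses.
# The real directory is DNS, and real answers can change over time.
TOY_DIRECTORY = {
    "georgetown.edu": "23.185.0.21",  # the mapping quoted on the slide
}

def toy_lookup(name: str) -> str:
    """Look a name up in the toy directory (raises KeyError if it is absent)."""
    return TOY_DIRECTORY[name]

def real_lookup(name: str) -> str:
    """Ask the operating system's resolver (i.e., DNS) for an IPv4 address.
    Requires a network connection; the answer may differ from the slide."""
    return socket.gethostbyname(name)

print(toy_lookup("georgetown.edu"))  # 23.185.0.21
```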
What Happens When I Visit a URL/IP?
HTTP(S) (HyperText Transfer Protocol (Secure)): common syntax for web clients to make requests and servers to respond
Several types of requests can be made: GET, POST, HEAD; for now, we focus on the GET request, the request your browser makes by default
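Python's standard library follows the same convention as the browser: a urllib.request.Request is a GET unless you attach a body (then it becomes a POST) or name a method explicitly. A minimal sketch that only builds and inspects request objects; actually sending one would need urlopen and a network connection, and the URL is just an example:

```python
from urllib.request import Request

URL = "https://georgetown.edu/"  # example URL; nothing is sent in this sketch

get_req = Request(URL)                   # no body: defaults to GET
post_req = Request(URL, data=b"q=test")  # a body: defaults to POST
head_req = Request(URL, method="HEAD")   # or name the method explicitly

for req in (get_req, post_req, head_req):
    print(req.get_method(), req.full_url)
```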
HTML (HyperText Markup Language): For specifying layout and content of page
Structure is analogous to boxes of content: <html> box contains <head> (metadata, e.g., page title) and <body> (page content) boxes, <body> box contains e.g. header, footer, navigation bar, and main content of page.
Modern webpages also include CSS (Cascading Style Sheets) for styling this content, and JavaScript² for interactivity (changing/updating content)
HTML allows linking to another page with a special anchor tag (<a>): <a href="https://npr.org/">news</a> creates a link, so when you click “news”, the browser will request (fetch the HTML for) the URL https://npr.org/
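To make the “boxes within boxes” structure and the anchor tag concrete, here is a sketch using Python's standard-library html.parser: it prints the nesting of a small page and pulls out the target and text of its <a> link, roughly the information a browser needs before it can fetch the linked page. PageInspector and the sample page are illustrative assumptions:

```python
from html.parser import HTMLParser

# A tiny page with the "boxes inside boxes" structure and one <a> link.
PAGE = """<html>
  <head><title>My Page</title></head>
  <body>
    <nav>Home | About</nav>
    <main><p>Read the <a href="https://npr.org/">news</a> today.</p></main>
    <footer>Contact info</footer>
  </body>
</html>"""

class PageInspector(HTMLParser):
    """Records the nesting of tags and the (href, text) of each <a> link."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # how many boxes we are currently inside
        self.outline = []  # one indented line per opening tag
        self.links = []    # (href, link text) pairs
        self._href = None  # href of the <a> we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        self.outline.append("  " * self.depth + tag)
        self.depth += 1
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        self.depth -= 1
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None

inspector = PageInspector()
inspector.feed(PAGE)
inspector.close()
print("\n".join(inspector.outline))
print(inspector.links)  # [('https://npr.org/', 'news')]
```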
Sometimes, by “server”, we mean the hardware: the box of processors and hard drives
But sometimes we mean the software that runs on the hardware
A web server, in the software sense, is a program that is always running, 24/7
Waits for requests (via HTTPS), then serves HTML code in response (also via HTTPS)
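A minimal sketch of both sides of this exchange using only Python's standard library: http.server runs a tiny server whose serve_forever() loop waits for requests, and http.client plays the client sending a GET. For simplicity this assumes plain HTTP on your own machine (127.0.0.1) rather than HTTPS, which would additionally need a TLS certificate; HelloHandler and the page content are hypothetical:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><p>Hello from a tiny web server!</p></body></html>"

class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the same small HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence the default one-line log per request

# Server side: port 0 lets the OS pick any free port on this machine.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
# serve_forever() is the "always running, waiting for requests" loop;
# run it in a background thread so this script can also act as the client.
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: send a GET request and read the HTML that comes back.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/")
response = conn.getresponse()
status, body = response.status, response.read()
conn.close()
print(status, body.decode())

server.shutdown()      # stop the serve_forever() loop
server.server_close()  # release the listening socket
```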
How Does a Web Client Work?
Once the server has responded to your request, you still only have raw HTML code
So, the browser is the program that renders this raw HTML code as a visual, (possibly) interactive webpage
As a data scientist, the most important thing to know is that different browsers can render the same HTML differently!
A headache when pages are accessed through laptops
A nightmare when pages are accessed through laptops and mobile
Connecting to Servers
We’ve talked about the shell on your local computer, as well as the Georgetown Domains shell
We used Georgetown Domains’ web interface to access that shell, but you can also connect from your local computer to any remote shell you have an account on (on a machine running an SSH server) using the ssh command, e.g. ssh <username>@<hostname>!
Transferring Files to/from Servers
Recall the copy command, cp, for files on your local computer
There is a remote equivalent, scp (Secure Copy Protocol), which you can use to copy files between your local computer and remote servers, in either direction
Important Alternative: rsync
Similar to scp, with the same basic syntax, except that it synchronizes: it only copies files that are different or missing at the destination
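The “only copy what is missing or different” idea can be sketched in a few lines of Python. This illustrates the behavior, not how rsync is implemented: rsync by default decides what changed using file size and modification time, and can transfer only the changed parts of files, while this hypothetical sync_dir helper simply compares whole file contents on one machine:

```python
import shutil
import tempfile
from pathlib import Path

def sync_dir(src: Path, dst: Path) -> list[str]:
    """Copy files from src into dst only if they are missing or differ there.
    Returns the relative paths that were copied (sorted)."""
    copied = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        # Copy if the file is missing, or if its bytes differ from the source.
        if not dst_file.exists() or dst_file.read_bytes() != src_file.read_bytes():
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)
            copied.append(rel.as_posix())
    return copied

# Usage: a throwaway "_site" folder synced into a throwaway "public_html".
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "_site"), Path(tmp, "public_html")
    src.mkdir()
    dst.mkdir()
    (src / "index.html").write_text("<p>v1</p>")
    (src / "about.html").write_text("<p>about</p>")
    print(sync_dir(src, dst))  # first run: everything is missing, so both are copied
    (src / "index.html").write_text("<p>v2</p>")
    print(sync_dir(src, dst))  # second run: only the changed index.html is copied
```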
Main human motivations (Max Weber): Wealth, Prestige, Power → “TED talk circuit”
Science vs. Human Fallibility
Scientific method + replicability/pre-registration = “Tying ourselves to the mast”
If we aim to disprove (!) our hypotheses, and we pre-register our methodology, we are bound to discovering truth, even when it is disadvantageous to our lives…
Human Fallibility is Winning…
More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments. Those are some of the telling figures that emerged from Nature’s survey of 1,576 researchers (Baker 2016)
On its own, Python just runs scripts via python <script>.py
Reproducibility and Literate Programming
Reproducible document: includes both the content (text, tables, figures) and the code or instructions required to generate that content.
Designed to ensure that others can reproduce the same document, including its data analysis, results, and visualizations, consistently and accurately.
tl;dr: If you’re copying-and-pasting results from your code output into your results document, a red flag should go off in your head!
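A small Python illustration of the alternative to copy-and-paste: the sentence that reports a result is generated by the same code that computes it, so re-running the analysis updates the text too. The measurements are made-up placeholder data:

```python
from statistics import mean

# Hypothetical measurements; in a real project these would be loaded from the
# raw data file, so re-running the document recomputes everything.
measurements = [2.0, 3.5, 4.0, 5.5]

# Red flag: typing "the mean was 3.75" into the write-up by hand.
# Better: let the code write the sentence, so text and results cannot drift apart.
report = f"Across {len(measurements)} measurements, the mean was {mean(measurements):.2f}."
print(report)  # Across 4 measurements, the mean was 3.75.
```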
Literate programming is a coding and documentation approach where code and explanations of the code are combined in a single document.
Emphasizes clear and understandable code by interleaving human-readable text (explanations, comments, and documentation) with executable code.
Single Source, Many Outputs
We can create content (text, code, results, graphics) within a source document, and then use different weaving engines to create different document types:
Documents
  Web pages (HTML)
  Word documents
  PDF files
Presentations
  HTML
  PowerPoint
Websites/blogs
Books
Dashboards
Interactive documents
Formatted journal articles
Interactivity!
Are we “hiding something” by choosing a specific bin width? Make it transparent!
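Why bin width matters can be seen without any plotting library: the sketch below counts the same hypothetical scores with wide (20-point) and narrow (5-point) bins. The wide bins suggest a steady decline, while the narrow bins reveal an empty 80-to-84 bin that the wide bins hide. bin_counts is an illustrative helper, not a library function:

```python
from collections import Counter

# Hypothetical data, e.g. exam scores.
data = [52, 55, 58, 61, 62, 64, 66, 67, 68, 71, 73, 74, 79, 85, 88, 91]

def bin_counts(values, width, start=50):
    """Count how many values fall into each [left, left + width) bin,
    keyed by the bin's left edge; empty bins are simply absent."""
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return dict(sorted(counts.items()))

print(bin_counts(data, 20))  # a few wide bins
print(bin_counts(data, 5))   # many narrow bins
```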
Let’s make a directory for our project called cool-project, and initialize a Git repo for it
user@hostname:~$ mkdir cool-project
user@hostname:~$ cd cool-project
user@hostname:~/cool-project$ git init
Initialized empty Git repository in /home/user/cool-project/.git/
This creates a hidden folder, .git, in the directory:
user@hostname:~/cool-project$ ls -lah
total 12K
drwxr-xr-x 3 user user 4.0K May 28 00:53 .
drwxr-xr-x 12 user user 4.0K May 28 00:53 ..
drwxr-xr-x 7 user user 4.0K May 28 00:53 .git
Adding and Committing a File
We’re writing Python code, so let’s create and track cool_code.py:
user@hostname:~/cool-project$ touch cool_code.py
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
No commits yet
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   cool_code.py
user@hostname:~/cool-project$ git commit -m "Initial version of cool_code.py"
[main (root-commit) b40dc25] Initial version of cool_code.py
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 cool_code.py
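Git names every stored object by a hash. A minimal sketch, assuming a repository using git's default SHA-1 object format: the id of a file's contents (a “blob”) is the SHA-1 of the header "blob <size>" plus a NUL byte, followed by the bytes themselves. Because touch created cool_code.py empty, its blob id is git's well-known empty-blob hash, which you can cross-check with git hash-object cool_code.py. Commit hashes such as b40dc25 are computed over the commit object instead (tree, parent, author, committer, date, message), so yours will differ from the ones shown here:

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Compute the id git assigns to a file's contents (SHA-1 repositories)."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_hash(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_hash(b"1 + 1\n"))  # contents after `echo "1 + 1" >> cool_code.py`
```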
The Commit Log
View the commit log using git log:
user@hostname:~/cool-project$ git log
commit b40dc252a3b7355cc4c28397fefe7911ff3c94b9 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:57:16 2023 +0000
    Initial version of cool_code.py
Making Changes
user@hostname:~/cool-project$ git status
On branch main
nothing to commit, working tree clean
user@hostname:~/cool-project$ echo "1 + 1" >> cool_code.py
user@hostname:~/cool-project$ more cool_code.py
1 + 1
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   cool_code.py
user@hostname:~/cool-project$ git commit -m "Added code to cool_code.py"
[main e3bc497] Added code to cool_code.py
 1 file changed, 1 insertion(+)
The git log will show the new version:
user@hostname:~/cool-project$ git log
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:38:05 2023 +0000
    Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:37:02 2023 +0000
    Initial version of cool_code.py
More Changes
user@hostname:~/cool-project$ echo "2 + 2" >> cool_code.py
user@hostname:~/cool-project$ more cool_code.py
1 + 1
2 + 2
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   cool_code.py
user@hostname:~/cool-project$ git commit -m "Second version of cool_code.py"
[main 4007db9] Second version of cool_code.py
 1 file changed, 1 insertion(+)
And the git log:
user@hostname:~/cool-project$ git log
commit 4007db9a031ca134fe09eab840b2bc845366a9c1 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:39:28 2023 +0000
    Second version of cool_code.py
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:38:05 2023 +0000
    Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:37:02 2023 +0000
    Initial (empty) version of cool_code.py
Undoing a Commit I
First, check the git log to find the hash of the commit you want to go back to:
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:38:05 2023 +0000
    Added code to cool_code.py
Undoing a Commit II
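Careful: git reset --hard throws away every commit after the target commit, along with any uncommitted changes, so treat it as irreversible!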
user@hostname:~/cool-project$ git reset --hard e3bc497ac
HEAD is now at e3bc497 Added code to cool_code.py
user@hostname:~/cool-project$ git log
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:38:05 2023 +0000
    Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:37:02 2023 +0000
    Initial (empty) version of cool_code.py
Onwards and Upwards
user@hostname:~/cool-project$ echo "3 + 3" >> cool_code.py
user@hostname:~/cool-project$ git add cool_code.py
user@hostname:~/cool-project$ git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   cool_code.py
user@hostname:~/cool-project$ git commit -m "Added different code to cool_code.py"
[main 700d955] Added different code to cool_code.py
 1 file changed, 1 insertion(+)
The final git log:
user@hostname:~/cool-project$ git log
commit 700d955faacb27d7b8bc464b9451851b5e319f20 (HEAD -> main)
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:44:49 2023 +0000
    Added different code to cool_code.py
commit e3bc497acbb5a487566ff2014dcd7b83d0c75224
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:38:05 2023 +0000
    Added code to cool_code.py
commit b40dc25b14c0426b06c8d182184e147853f3c12e
Author: Jeff Jacobs <jjacobs3@cs.stanford.edu>
Date:   Sun May 28 00:37:02 2023 +0000
    Initial (empty) version of cool_code.py
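One way to read these logs is as a chain of commits, each pointing back to its parent, with the branch main pointing at the newest one. The toy Python model below (a teaching simplification, not git's actual storage format) replays this session: git reset --hard moves main back to e3bc497, so 4007db9 drops out of the log, and the next commit builds on e3bc497 instead. Commit and log are hypothetical names; the real reset --hard also overwrites your working files:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Commit:
    short_hash: str
    message: str
    parent: Optional["Commit"] = None  # the previous commit, if any

def log(head: Optional[Commit]) -> list[str]:
    """Walk parent pointers from a branch tip, newest first (like git log --oneline)."""
    lines = []
    while head is not None:
        lines.append(f"{head.short_hash} {head.message}")
        head = head.parent
    return lines

# Rebuild the commits from the session above.
c1 = Commit("b40dc25", "Initial version of cool_code.py")
c2 = Commit("e3bc497", "Added code to cool_code.py", parent=c1)
c3 = Commit("4007db9", "Second version of cool_code.py", parent=c2)

main = c3    # the branch "main" points at the newest commit
print(log(main))
main = c2    # `git reset --hard e3bc497`: move main back to c2
print(log(main))  # 4007db9 no longer appears
c4 = Commit("700d955", "Added different code to cool_code.py", parent=main)
main = c4    # the new commit builds on e3bc497, not on 4007db9
print(log(main))
```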
But Why These Diagrams?
Even the simplest projects can start to look like:
The GitHub Side: Remote
An Empty Repo
Refresh after git push
Commit History
Checking the diff
Web Development
Level | Frontend | Backend
Low Level | HTML/CSS/JavaScript | GitHub Pages
Middle Level | JS Libraries | PHP, SQL
High Level | React, Next.js | Node.js, Vercel
Frontend icons: UI + UI elements, what the user sees (on the screen), user experience (UX), data visualization
Backend icons: databases, security
Getting Content onto the Internet
Step 1: index.html
Step 2: Create GitHub repository
Step 3: git init, git add -A ., git commit -m "<message>", git remote add origin <repo-url>, git push -u origin main
Step 4: Enable GitHub Pages in repo settings
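Step 5: Visit https://<username>.github.io/<repo-name>/ (or just https://<username>.github.io/ if the repo itself is named <username>.github.io)!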
Deploying from a Branch/Folder
Lab Demonstrations
Lab Demonstration 1: Transferring Files
ssh
scp
rsync
Lab Demonstration 2: Quarto
Lab Demonstration 3: Git and GitHub
Lab Assignment Overview
Assignment Overview
Create a repo on your private GitHub account called 5000-lab-1.2
Clone the repo to your local machine with git clone
Create a blank Quarto website project, then use a .bib file to add citations
Add content to index.qmd
Add content to about.ipynb
Build a simple presentation in slides/slides.ipynb using the revealjs format
Render the website using quarto render
Sync your changes to GitHub
Use rsync or scp to copy the _site directory to your GU domains server (within ~/public_html)
Create a Zotero (or Mendeley) account, download the software, and add at least one reference to your site by syncing the .bib file
References
Baker, Monya. 2016. “1,500 Scientists Lift the Lid on Reproducibility.” Nature 533 (7604): 452–54. https://doi.org/10.1038/533452a.
Menczer, Filippo, Santo Fortunato, and Clayton A. Davis. 2020. A First Course in Network Science. Cambridge University Press.
Footnotes
1. To see this, you can open your Terminal and run the ping command: ping georgetown.edu.
2. Incredibly, despite the name, JavaScript has absolutely nothing to do with the Java programming language…
3. Sorry for jargon: it just means using the same word for different levels of a system (dangerous when talking computers!)