Paths & file-system familiarity is essential for accessing & moving data from servers
The file system is composed of directories (folders), programs, and files
Files contain data OR instructions (e.g. source code or executable programs)
Files, programs, & folders have associated permissions to control user access
Directory tree
Files, folders, & executables are organized in a hierarchical directory tree
The base of the tree is called the root directory
The root folder is denoted by / on Unix machines; on Windows, each drive has its own root, e.g. C:\
Linux directory tree
The following diagram shows the directory tree of a Linux computer
Paths
Paths are “addresses”: they let users navigate the file-system to locate files & folders
Paths can be either relative, i.e. a location relative to the current folder, OR absolute, i.e. a location relative to the root directory
The current working directory (CWD) is where you currently “are” in the tree.
On Unix, the CWD is denoted ./ & the parent directory (one level closer to the root) is denoted ../
The slashes are reversed on Windows (\), but otherwise the concept is the same
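As a minimal sketch of how relative and absolute paths relate, the following Python snippet turns a path into an absolute one, given a current working directory. The folder names and the resolve helper are hypothetical, chosen only for illustration; PurePosixPath is used so the example follows Unix conventions on any machine and never touches the disk.

```python
import posixpath
from pathlib import PurePosixPath


def resolve(cwd, path):
    """Return the absolute, normalised form of `path` as seen from `cwd`."""
    candidate = PurePosixPath(path)
    if not candidate.is_absolute():
        # A relative path is interpreted starting from the current working directory.
        candidate = PurePosixPath(cwd) / candidate
    # normpath collapses "./" and "../" segments purely as text.
    return posixpath.normpath(str(candidate))


cwd = "/home/student/public_html"  # hypothetical current working directory
print(resolve(cwd, "./index.html"))      # stays inside the CWD
print(resolve(cwd, "../data/file.csv"))  # goes one level closer to the root first
print(resolve(cwd, "/etc/hosts"))        # already absolute, so the CWD is ignored
```

Because the work is done on text only, this sketch does not follow symbolic links the way a real shell would.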
File permissions
System administrators control how much access different users have within the file-structure.
Access is based on file permissions associated with a user’s Login ID
Computers keep a database of which user owns each file, & which users have permission to view, edit, & execute EACH file, folder, or program.
Understanding basic data security is a fundamental skill in most modern careers … you don’t want to be the careless person that leaves a software vulnerability and gets your company hacked
Unix file permissions
Unix file permission codes are numeric representations (octal) for read, write, execute permissions, assigned to owners, groups, and others, regulating file access and security.
If a user has authority, they can change file permissions with the chmod command
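To make the octal codes less mysterious, here is a small sketch that decodes a three-digit code into the familiar rwx notation. Each octal digit covers one class of user (owner, group, others), and within a digit 4 means read, 2 means write, and 1 means execute. The describe function name is an assumption made for this example, not a standard tool.

```python
def describe(mode):
    """Translate an octal permission code such as 0o644 into an rwx string."""
    text = ""
    # Shift by 6, 3, 0 bits to pick out the owner, group, and others digits.
    for shift in (6, 3, 0):
        bits = (mode >> shift) & 0b111
        text += "r" if bits & 4 else "-"
        text += "w" if bits & 2 else "-"
        text += "x" if bits & 1 else "-"
    return text


for code in (0o644, 0o755):
    print(oct(code), describe(code))
```

Reading the result left to right gives the owner, group, and others permissions in turn.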
Common file permissions codes
The following are common permission options.
NOTE: For websites, files are usually 644 and folders 755
This could be set with chmod 644 my_file.html
You can set the permissions of all website files and folders with the following Linux commands (this form also handles names that contain spaces)
find _site -type f -exec chmod 644 {} +
find _site -type d -exec chmod 755 {} +
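For comparison, the two commands above could be sketched in Python roughly as follows. This is an illustrative equivalent under stated assumptions, not a replacement for chmod: the set_site_permissions function name is hypothetical, the demo runs in a throw-away temporary folder rather than a real _site, and it assumes a Unix-like system where permission bits behave as described.

```python
import stat
import tempfile
from pathlib import Path


def set_site_permissions(root):
    """Give every file under `root` mode 644 and every folder mode 755."""
    root = Path(root)
    root.chmod(0o755)  # the top-level folder itself
    for path in root.rglob("*"):
        if path.is_symlink():
            continue  # leave symbolic links alone
        if path.is_dir():
            path.chmod(0o755)
        elif path.is_file():
            path.chmod(0o644)


# Demo in a temporary folder so nothing on the real file system is changed.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp) / "_site"
    (site / "css").mkdir(parents=True)
    page = site / "index.html"
    page.write_text("<h1>hi</h1>")
    page.chmod(0o600)  # start from a deliberately restrictive mode
    set_site_permissions(site)
    file_mode = stat.S_IMODE(page.stat().st_mode)
    dir_mode = stat.S_IMODE((site / "css").stat().st_mode)

print(oct(file_mode), oct(dir_mode))
```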
Super-users
Super-users have total control over the file-system and can view, edit, or execute anything.
Super-user is synonymous with root user: there are no restrictions on your power over the computer
Usually you are NOT a super-user and you need to coordinate with system administrators, who have super-user status, to set up and control access
“With great power comes great responsibility” - Spider-Man’s uncle
Linux command line
A brief introduction.
What is Linux?
Linux describes a family of operating systems (OS), similar to Windows or MacOS
The key difference is that Linux is a FREE and open-source operating system.
It has a Unix-like OS kernel originally created by Linus Torvalds in 1991.
It forms the core of various Linux-based operating systems (distributions) such as Ubuntu, CentOS, RedHat, Fedora, and more.
Linux is known for its stability, security, and flexibility.
Almost all of the world’s super-computers are Linux machines
Web-servers & AWS virtual machines are also often Linux (e.g. GU domains)
Linux key features (optional)
Linux offers a flexible and powerful platform for various computing needs, from personal use to enterprise-level systems.
Open Source: Linux’s source code is freely available, allowing users to modify, distribute, and contribute to its development.
Kernel: Linux serves as the core of the operating system, managing hardware resources, memory, and system processes.
Multiuser and Multitasking: Linux supports multiple users and concurrent tasks, enhancing efficiency.
Security: Linux’s design and permissions system offer robust security features, minimizing vulnerabilities.
Variety of Distributions: Different Linux distributions cater to diverse needs, from server systems to desktop environments.
Command Line Interface: Linux offers a powerful command line interface (CLI) for system management and administration.
Graphical User Interface: Most Linux distributions include GUI options, making it user-friendly for various users.
Software Repositories: Distributions provide software repositories for easy installation and updates of applications.
Networking: Linux is widely used for networking, powering servers, routers, and other network devices.
Customization: Users can customize various aspects of their Linux environment, adapting it to their preferences.
Server and Cloud Usage: Linux is a popular choice for web servers, cloud computing, & containerization platforms like Docker.
Community and Support: The Linux community provides extensive support, forums, and documentation resources.
Why learn the Linux command line?
Useful line on your resume
Intuitive framework and tool-set for computational sciences
Better understanding of system and network administration
Almost all of the world’s super-computers are Linux machines
Web-servers and AWS virtual machines are often Linux
More intuitive interfacing with hardware and software
Smoothly interact with GitHub without using a web browser or GUI
Smoothly switch between environments with Conda
Can “get inside” other computers via the ssh command
Example: Can “get inside” other computers via the ssh command
Interacting with the file-system
Option-1: Interact with the file system via a GUI (graphical user interface)
Option-2: Interact via a command line interface (CLI)
IMPORTANT: The Unix command line is actually more like a computer scripting language (e.g. Python), known as shell scripting; bash is the most common shell. It has all of the familiar coding constructs (for-loops, while loops, if/then statements, … etc)
Hidden files: Files & folders that start with . are hidden by default in the GUI and in ls output (e.g. ~/.bash_profile)
Command line access options
Mac & Linux: MacOS is very similar to Linux, both have a built-in Unix CLI.
Windows terminal options:
Command prompt: A text-based interface to execute commands and perform tasks
NOT a Unix CLI, closer to MS-DOS, completely different command structure
Windows powershell: Windows PowerShell is an advanced command-line shell and scripting language for automation and system management.
NOT a Unix command line, but more “Unix-like” than command prompt
Anaconda powershell: Quasi Unix command structure but still quite different
Windows subsystem for Linux (WSL): (highly recommended)
True Linux experience from within Windows, more on this later
GU domains: Command line access
The GU domains web-servers are Linux, you can “get inside” the servers via a browser
You can also ssh inside from your laptop (more on this next week)
Note that you are NOT inside your laptop here!! You are inside the GU-domains server, which is just a REMOTE computer located somewhere else in the world (e.g. California or China).
Linux commands & variables
Everything we discuss on the coming slides applies to (1) Linux CLI, (2) WSL in Windows, (3) the MacOS CLI (although minor differences do exist)
Linux Commands
A Linux command generally has the following structure:
command [options/flags] [arguments]
Command: The primary action or task that you want the command to perform.
Options/Flags: These are preceded by a hyphen - or double hyphen -- and modify the behavior of the command. They are usually optional.
Arguments: Targets or inputs for the command (files, directories, text, etc).
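The three parts of a command can be made concrete with a short sketch that splits a command line the way a shell would and sorts the pieces. The split_command helper is hypothetical and deliberately simplified: it assumes that anything starting with a hyphen is an option, whereas real commands parse their own arguments and some options take values.

```python
import shlex


def split_command(line):
    """Split a command line into (command, options, arguments)."""
    tokens = shlex.split(line)  # shell-style splitting, honouring quotes
    command, rest = tokens[0], tokens[1:]
    options = [t for t in rest if t.startswith("-")]
    arguments = [t for t in rest if not t.startswith("-")]
    return command, options, arguments


command, options, arguments = split_command("ls -l --all /path/to/directory")
print("command:  ", command)
print("options:  ", options)
print("arguments:", arguments)
```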
Linux Variables
Variables are typically named using uppercase letters & underscores, e.g. MY_VARIABLE. Values are assigned with variable_name=value (no spaces around the =)
Use $ before the variable name to access its value, e.g. $MY_VARIABLE.
Command example: ls
For example, let’s take the ls command and describe its structure with flags:
ls [options/flags] [arguments]
Command: ls stands for “list” and is used to list files and directories.
Options/Flags:
-l: Display detailed information about files (long listing format).
-a or --all: List all files, including hidden ones.
-h or --human-readable: Display file sizes in a human-readable format.
Arguments: These would be the directories or files you want to list.
For instance, ls -l /path/to/directory.
You can use multiple options and arguments with a command to customize its behavior. Always refer to the command’s manual or help documentation (usually accessible with man command or command --help) to understand all available options and how they affect the command’s behavior.
(1) Navigating the file system
pwd: Print the current working directory (current location in the directory tree)
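pwd ../: Does NOT print the parent directory; pwd takes no path argument (it is ignored or rejected, depending on the shell). To print the path of the directory one level above, use (cd ../ && pwd).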
ls: List files and directories in the current directory.
ls ../: List files and directories in the directory one level above.
ls ./: List files and directories in the current directory.
IMPORTANT: The symbol * acts as a wild-card to search for substrings
ls *pub*: List files and directories with names containing “pub”.
ls -d *pub*: List the entries whose names contain “pub” themselves, without listing the contents of matching directories.
ls *pub*/*.html: List .html files inside directories with names containing “pub”.
IMPORTANT: You can run multiple commands per line using ; to separate them
ls *pub*; ls *pub*/*.html:
Run two ls commands on one line with ; separating them.
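The wild-card matching described above can be tried out in Python with the standard fnmatch module, which uses the same * convention for file names. The list of names below is made up for illustration; no real directory is read.

```python
import fnmatch

# Hypothetical directory contents, used only to illustrate the patterns.
names = ["public_html", "index.html", "my_pubs.txt", "README.md"]

# "*pub*" matches any name that contains the substring "pub".
contains_pub = [n for n in names if fnmatch.fnmatchcase(n, "*pub*")]

# "*.html" matches any name that ends with ".html".
html_files = [n for n in names if fnmatch.fnmatchcase(n, "*.html")]

print(contains_pub)
print(html_files)
```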
(2) Navigating the file system
cd: Change directory (folder). Navigate to a different folder in the file system.
cd public_html/: Change current directory to public_html/.
cd ../: Move to the directory one level closer to the root.
Note: ../ is “down” if you think of the “root” as the bottom of the computer & “up” if you think of the root as the top of the computer. Both terminologies are common, just know that ../ takes you closer to the root directory
cd files: Change current directory to “files”.
cd public_html/: Change current directory to public_html/.
cd ~/: Change to the home directory (usually /home/username).
cd public_html/: Change current directory to public_html/.
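find . -name index.html: Search the current directory and its sub-directories for a file named index.html.
find . -name 'index*': Search for files starting with “index” (quote the pattern so the shell does not expand the * before find sees it).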
Viewing file content
more index.html: View the contents of index.html using the more command.
more page2.html: View the contents of “page2.html”.
less index.html: View the contents using the less command (press q to exit).
head index.html: Display the beginning lines of index.html.
tail index.html: Display the last lines of index.html.
tail -n 4 index.html: Display the last 4 lines of index.html.
grep 'Hello' index.html: Search for the string “Hello” in index.html.
Aside: good practice \(\rightarrow\) avoid using spaces in folder names and file names
My Folder\(\rightarrow\)My-Folder OR my_folder
Spaces require an escape symbol \ when writing the path My\ Folder
Changing the filesystem
mkdir: Make directory \(\rightarrow\) Creates a new directory. (e.g. mkdir my_folder)
rm: Remove files or directories \(\rightarrow\) Deletes files and folders.
WARNING: Be CAREFUL with rm, it’s irreversible (deletes file permanently)
RECOMMENDATION: (1) ALWAYS work in a folder that is automatically backed up to the cloud (e.g. Dropbox) (2) Push changes to GitHub regularly (secondary backup).
rm my_file: deletes file called my_file
rm -rf my_folder: deletes folder called my_folder and everything inside it (requires the recursive -r flag; -f skips confirmation prompts)
cp: Copy files or directories \(\rightarrow\) Duplicates files and folders.
mv: Move or rename files/directories \(\rightarrow\) Used for both moving and renaming.
cp ../index.html ./page3.html: Copy index.html from the directory one level closer to the root into the current directory, naming the copy “page3.html”.
cp -r folder_1 folder_2: Make a copy of a folder (requires recursive -r flag)
> page2.html: Create a blank file named “page2.html” (WARNING: this empties the file if it already exists).
Shell (bash) scripts
The command line is a scripting language, similar to Python!!!
In a shell script, you can place multiple Linux commands into a file to run sequentially
These are called shell (.sh) or bash scripts
Similar to Python (.py), but with Linux commands instead of Python commands
You need to change the permissions to make the script executable: chmod a+x my_script.sh
To run the script you use ./my_script.sh from within the relevant folder
Example: Simple example of a shell script
Be careful: This is advanced content, you should only create very simple scripts, unless you know what you are doing.
In particular, we highly recommend NOT USING the rm command in a shell script
Additional important commands (optional)
These commands are foundational for navigating, managing files, and interacting with a Linux system effectively.
touch: Create empty files or update timestamps \(\rightarrow\) Creates new empty files or modifies timestamps.
cat: Concatenate and display file contents \(\rightarrow\) Displays the content of a file in the terminal.
nano/vi: Text editors \(\rightarrow\) nano is user-friendly, vi is powerful but has a steeper learning curve.
echo: Print text to the terminal or a file \(\rightarrow\) Displays text or variables in the terminal.
grep: Search for text patterns in files \(\rightarrow\) Searches for specific text patterns in files.
chmod: Change file permissions \(\rightarrow\) Modifies access permissions for files and directories.
chown: Change file ownership \(\rightarrow\) Changes the owner of files and directories.
ps: Process status \(\rightarrow\) Lists running processes.
top/htop: Monitor system resources.
top provides real-time process monitoring and htop is a more user-friendly alternative.
df: Disk space usage \(\rightarrow\) Shows available disk space on filesystems.
du: Disk usage of files and directories \(\rightarrow\) Displays the space used by specific files or directories.
wget/curl: Download files from the web \(\rightarrow\) wget and curl can download files from URLs.
tar: Compress and extract files \(\rightarrow\) Used for archiving and compressing files and directories.
ssh: Secure Shell \(\rightarrow\) Connects to remote servers securely.
sudo: Superuser do \(\rightarrow\) Executes commands with superuser privileges.
history: Show a history of commands entered in the terminal.
Aside: Command line editors (optional)
You may find yourself inside a server without GUI access \(\rightarrow\) use a command line editor
Nano is a popular command line editor for coding from the command line
e.g. nano index.html
Other popular options include emacs and vim (not recommended)
Additional reading (optional)
If you want to learn more, the following are popular books on the topic
HTML / CSS / JS
Motivation
Due to the internet, media consumption has changed dramatically over the last 30 years.
Many formats do not allow dynamic (interactive) content (e.g. png, jpeg, etc). HTML, however, can be dynamically and programmatically updated
This modification is done via JavaScript (js), which dramatically expands the functionality of an HTML page.
JavaScript runs in the browser, both while the webpage loads and afterwards, and facilitates interactivity.
It enables almost all of the advanced visualization libraries that we will discuss later
We won’t cover much JavaScript in the DSAN program, but will discuss it more in DSAN-5200 in the context of interactive data visualization
Front-end Dev Tools
HTML and DOM
Document object model (DOM)
HTML elements
Fundamental HTML building block
Start tag, content, end tag
HTML attributes
HTML attributes are added to the opening tag of an element to change the element’s default behavior.
Here we are modifying the <p> (paragraph) element with a unique identifier id attribute and changing the text-color using the style attribute.
HTML structure
An HTML document is a hierarchical tree-like collection of many HTML elements
HTML elements (objects) can have parents, grandparents, siblings, children, grandchildren, etc.
Document object model:
What is it? The Document Object Model (DOM) is a cross-platform and language-independent interface. It treats an XML or HTML document as a tree structure, where each node is an object, representing a part of the document. source
The DOM represents a document as a logical tree. This concept facilitates programmatic access and modification of the tree (add/modify/remove)
When an HTML page is loaded by a browser, it is converted to a hierarchical structure
HTML tags are converted into objects in the DOM within the parent-child hierarchy
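The parent-child hierarchy can be sketched in Python by parsing a small page and printing its tree. This is only an illustration of the idea, under stated assumptions: xml.etree.ElementTree is an XML parser rather than a browser DOM, and it works here only because the snippet is small and well-formed. The outline helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

PAGE = (
    "<html><head><title>My Cool Webpage</title></head>"
    "<body><h1>Welcome to my Site!</h1>"
    "<p>I hope you enjoy all the amazing content in here.</p></body></html>"
)


def outline(node, depth=0):
    """Return one indented line per element, children below their parent."""
    lines = ["  " * depth + node.tag]
    for child in node:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(PAGE)
print("\n".join(outline(root)))
```

In the printed outline, head and body are siblings whose parent is html, and h1 and p are children of body.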
Lab Time!
Getting HTML onto the Internet
index.html
<!DOCTYPE html>
<html>
  <head>
    <title>My Cool Webpage</title>
  </head>
  <body>
    <h1>Welcome to my Site!</h1>
    <p>I hope you enjoy all the amazing content in here.</p>
  </body>
</html>
Getting Quarto onto the Internet
index.qmd
---
title: "My First Quarto Page!"
author: "DSAN Student"
format:
  html:
    df-print: kable
---

Hello welcome to my new Quarto webpage!
Python Coding Fundamentals
Types of Languages
Compiled
Interpreted
Primitive Types
Boolean (True or False)
Numbers (Integers, Decimals)
Strings
None
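As a quick check of the types listed above, the following sketch asks Python for the type of one example value of each kind. The example values are arbitrary; note that Python splits numbers into int and float.

```python
# One arbitrary example value per primitive type.
values = [True, 42, 3.14, "hello", None]

# type(v).__name__ gives the name Python uses for each value's type.
type_names = [type(v).__name__ for v in values]

for value, name in zip(values, type_names):
    print(repr(value), "is of type", name)
```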
Stack and Heap
Let’s look at what happens, in the computer’s memory, when we run the following code:
Adversarial development: brainstorm all of the ways an evil hacker might break your code!
Example: Finding An Item Within A List
Seems straightforward, right? Given a list l, and a value v, return the index of l which contains v
Corner cases galore…
What if l contains v more than once? What if it doesn’t contain v at all? What if l is None? What if v is None? What if l isn’t a list at all? What if v is itself a list?
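One possible way to answer these questions is sketched below. The choices made here are assumptions for illustration, not the only reasonable design: the function returns the first matching index when v appears more than once, returns None when v is absent, and raises a TypeError when l is not a list. The find_index name is hypothetical.

```python
def find_index(l, v):
    """Return the index of the first item in list `l` equal to `v`, else None."""
    if not isinstance(l, list):
        # Covers l being None, a string, a number, etc.
        raise TypeError("l must be a list")
    for i, item in enumerate(l):
        if item == v:  # equality also works when v is None or itself a list
            return i
    return None  # v was not found


print(find_index([3, 1, 3], 3))         # v appears more than once
print(find_index([1, 2], 5))            # v does not appear at all
print(find_index([1, None], None))      # v is None
print(find_index([[1], [2]], [2]))      # v is itself a list
```

Returning None for a missing value is a design choice; the built-in list.index method raises a ValueError in that situation instead.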
Python: #1 Sanity-Preserving Tip!
(For our purposes) the answer to “what is Python?” is: an executable file that runs .py files!
e.g., we can run python mycode.py in Terminal/PowerShell
Everything else: pip, Jupyter, Pandas, etc., is an add-on to this basic functionality!
Code Blocks via Indentation
for i in range(5):
    print(i)
0
1
2
3
4
for i in range(5):
print(i)
  Cell In[3], line 2
    print(i)
    ^
IndentationError: expected an indented block after 'for' statement on line 1