DSAN 5000: Data Science and Analytics
Thursday, September 5, 2024
A little basic computer science is very useful for all STEM fields!
Understanding how computers work is crucial for data scientists
Note: These skills become very important in DSAN-6000 (big data & cloud computing)
Physical components of a computer
Computers come in many shapes & sizes; however, they’re all basically the same inside.
Read over the following at home
- Data is stored in a file-system on a hard-disk somewhere; you CAN’T do data science without understanding file-systems!
- A file-system is made up of directories (folders), programs, and files
- Files contain data OR instructions for program creation
- Permissions control user access
- The file-system is organized as a directory tree
- The base of the tree is called the root directory
- It is written `/` on Unix machines and `\` on Windows machines
- Paths are “addresses”; they let users navigate the file-system to locate files & folders
- A path can be relative, i.e. a location relative to the current folder, OR absolute, i.e. a location relative to the root directory
- The current working directory (CWD) is where you currently “are” in the tree
- The current directory is denoted `./` & one level down is denoted `../` (closer to the root)
- Windows uses `\`, but otherwise the concept is the same
- System administrators control how much access different users have within the file-structure.
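As a rough illustration of the relative vs. absolute distinction, the sketch below uses Python's `pathlib` and `posixpath` modules. The folder names (`/home/student`, `public_html`, `notes.txt`) are made up for the example, and `PurePosixPath` only manipulates path text, so nothing has to exist on disk.

```python
from pathlib import PurePosixPath
import posixpath

# Hypothetical current working directory (CWD); any absolute path would do.
cwd = PurePosixPath("/home/student/public_html")

# A relative path is interpreted starting from the CWD.
relative = PurePosixPath("files/index.html")
absolute = cwd / relative

# "../" points one level closer to the root; normpath collapses it.
parent_file = posixpath.normpath(str(cwd / "../notes.txt"))

print(relative.is_absolute())  # False
print(absolute)                # /home/student/public_html/files/index.html
print(absolute.is_absolute())  # True
print(parent_file)             # /home/student/notes.txt
```

An absolute path always starts at the root `/`, so it means the same thing no matter what the CWD is; a relative path changes meaning when the CWD changes.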
- Permissions are set with the `chmod` command
- NOTE: For websites, files are usually 644 and folders 755

    chmod 644 my_file.html
    for i in $(find _site -type f); do chmod 644 $i; done
    for i in $(find _site -type d); do chmod 755 $i; done

- Being the `root`-user means there are no restrictions on your power over the computer
- System administrators, who have super-user status, set up and control access
“With great power comes great responsibility”
- The Spider-Man’s Uncle
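To see what the numbers 644 and 755 stand for, the sketch below converts them into the `rwx` strings that `ls -l` prints, using Python's standard `stat` module. Each octal digit is the sum of read (4), write (2), and execute (1), given in the order owner, group, others. The helper name `describe` is made up for this example.

```python
import stat


def describe(mode: int, is_dir: bool = False) -> str:
    """Return the ls-style permission string for an octal mode such as 0o644."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)


file_perms = describe(0o644)                 # typical website file
folder_perms = describe(0o755, is_dir=True)  # typical website folder

print(file_perms)    # -rw-r--r--
print(folder_perms)  # drwxr-xr-x
```

So 644 lets the owner read and write while everyone else can only read. 755 adds the execute bit, which a folder needs before anyone can enter it.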
A brief introduction.

- Linux is an operating system, like Windows or MacOS
- Linux is a FREE and open-source operating system.
- It is built on a Unix-like OS kernel originally created by Linus Torvalds in 1991.
- Many web-servers run Linux (e.g. GU domains)
- Useful line on your resume
- `ssh` command. Example: Can “get inside” other computers via the `ssh` command
- Option-1: Interact with the file system via a GUI (graphical user interface)
- Option-2: Interact via a command line interface (CLI)
- IMPORTANT: The Unix command line is actually more like a computer scripting language (e.g. python), known as shell scripting or bash. It has all of the familiar coding constructs (for-loops, while-loops, if/then statements, etc.)
- Hidden files: Files & folders that start with `.` are hidden from the GUI (e.g. `~/.bash_profile`)
- Mac & Linux: MacOS is very similar to Linux; both have a built-in Unix CLI.
- Windows terminal options:
  - MS-DOS, completely different command structure
  - […] (highly recommended); more on this later
- The GU domains web-servers run Linux; you can “get inside” the servers via a browser
- Or `ssh` inside from your laptop (more on this next week)

Linux Commands
    command [options/flags] [arguments]

- Options/flags begin with a single hyphen `-` or double hyphen `--` and modify the behavior of the command. They are usually optional.

Linux Variables

    variable_name=value

- Put `$` before the variable name to access its value, e.g. `$MY_VARIABLE`.

For example, let’s take the `ls` command and describe its structure with flags:
ls [options/flags] [arguments]
- `ls` stands for “list” and is used to list files and directories.
- Common flags:
  - `-l`: Display detailed information about files (long listing format).
  - `-a` or `--all`: List all files, including hidden ones.
  - `-h` or `--human-readable`: Display file sizes in a human-readable format.
- Example: `ls -l /path/to/directory`
- Check the documentation (`man command` or `command --help`) to understand all available options and how they affect the command’s behavior.
- `pwd`: Print the current working directory (current location in the directory tree)
- `pwd` takes no path argument, so `pwd ../` does not print the parent directory; use `cd ../` followed by `pwd` instead.
- `ls`: List files and directories in the current directory.
- `ls ../`: List files and directories in the directory one level above.
- `ls ./`: List files and directories in the current directory.
- `*` acts as a wild-card to search for substrings
- `ls *pub*`: List files and directories with names containing “pub”.
- `ls -d *pub*`: List the matching names themselves, without showing the contents of matching directories.
- `ls *pub*/*.html`: List .html files inside directories with names containing “pub”.
- To run multiple commands on one line, use `;` to separate them
- `ls *pub*; ls *pub*/*.html`: Run two `ls` commands on one line with `;` separating them.
- `cd public_html/`: Change current directory to public_html/.
- `cd ../`: Move to the directory one level closer to the root.
- NOTE: `../` is “down” if you think of the root as the bottom of the computer & “up” if you think of the root as the top of the computer. Both terminologies are common; just know that `../` takes you closer to the root directory
- `cd files`: Change current directory to “files”.
- `cd public_html/`: Change current directory to public_html/.
- `cd ~/`: Change to the home directory (usually /home/username).
- `cd public_html/`: Change current directory to public_html/.
- `find -name index.html`: Search for a file named index.html.
- `find -name 'index*'`: Search for files starting with “index” (quote the pattern so the shell does not expand it first).
- `more index.html`: View the contents of index.html using the `more` command.
- `more page2.html`: View the contents of “page2.html”.
- `less index.html`: View the contents using the `less` command (press q to exit).
- `head index.html`: Display the first lines of index.html.
- `tail index.html`: Display the last lines of index.html.
- `tail -n 4 index.html`: Display the last 4 lines of index.html.
- `grep 'Hello' index.html`: Search for the string “Hello” in index.html.
- Avoid spaces in file and folder names: My Folder → My-Folder OR my_folder
- Otherwise, escape the space with `\` when writing the path: `My\ Folder`
- `mkdir`: Make directory → Creates a new directory (e.g. `mkdir my_folder`)
- `rm`: Remove files or directories → Deletes files and folders.
- WARNING: Be CAREFUL with `rm`; it’s irreversible (deletes files permanently)
- RECOMMENDATION: (1) ALWAYS work in a folder that is automatically backed up to the cloud (e.g. Dropbox) (2) Push changes to GitHub regularly (secondary backup).
- `rm my_file`: Deletes the file called my_file
- `rm -rf my_folder`: Deletes the folder called my_folder (requires the `-r` flag)
- `cp`: Copy files or directories → Duplicates files and folders.
- `mv`: Move or rename files/directories → Used for both moving and renaming.
- `cp ../index.html ./page3.html`: Copy index.html from the directory one level closer to the root into the current directory, naming the copy “page3.html”.
- `cp -r folder_1 folder_2`: Make a copy of a folder (requires the recursive `-r` flag)
- `> page2.html`: Create a blank file named “page2.html” (this also empties the file if it already exists).
- With a shell script, you can place multiple Linux commands into a file to run sequentially
- These are known as shell (.sh) or bash scripts
- Make the script executable with `chmod a+x my_script.sh`
- Run it with `./my_script.sh` from within the relevant folder
- Example: Simple example of a shell script
- Be careful: This is advanced content; you should only create very simple scripts unless you know what you are doing.
- Be especially careful when using the `rm` command in a shell script
- `nano` is user-friendly; `vi` is powerful but has a steeper learning curve.
- `top` provides real-time process monitoring and `htop` is a more user-friendly alternative.
- `wget` and `curl` can download files from URLs.
- […] superuser privileges.
- Nano is a popular command line editor for coding from the command line

    nano index.html

- Alternatives: `emacs` and `vim` (not recommended)

Due to the internet, media consumption has changed dramatically over the last 30 years.


- Links are written with the anchor tag (`<a>`).
- `<a href="http://npr.org/">news</a>` creates a link for the anchor text “news,” which will cause the browser to fetch the HTML for the page
- The server is the computer where the content “lives”, on some hard-drive
- The computer that requests the content is the client.
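To make the anchor-tag example concrete, here is a minimal sketch that pulls the link target and anchor text out of that same snippet, using Python's built-in `html.parser`. The class name `LinkCollector` is made up for this example, and it assumes simple links whose anchor text is plain text.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (anchor text, href) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Text that follows an opening <a href=...> is the anchor text.
        if self._href is not None:
            self.links.append((data, self._href))
            self._href = None


collector = LinkCollector()
collector.feed('<a href="http://npr.org/">news</a>')
print(collector.links)  # [('news', 'http://npr.org/')]
```

A browser does the same kind of parsing: it shows the anchor text and, on a click, requests the `href` address from the server.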
Source: https://developer.mozilla.org/en-US/docs/Learn/JavaScript/First_steps/What_is_JavaScript
With JavaScript, HTML can be dynamically and programmatically updated
We won’t cover much JavaScript in the DSAN program, but will discuss it more in DSAN-5200 in the context of interactive data visualization
Document object model (DOM)




- Booleans (True or False)
- None

Let’s look at what happens, in the computer’s memory, when we run the following code:
| | name | pop |
|---|---|---|
| 0 | Albania | 2.8 |
| 1 | Algeria | 44.2 |
| 2 | Angola | 34.5 |
- Given a list `l`, and a value `v`, return the index of `l` which contains `v`
- What if `l` contains `v` more than once? What if it doesn’t contain `v` at all? What if `l` is None? What if `v` is None? What if `l` isn’t a list at all? What if `v` is itself a list?

Python code lives in .py files!
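Going back to the index-lookup exercise, here is one possible sketch. The exercise leaves the edge cases open, so the choices below are assumptions rather than the expected answer: return the first match, return `None` when `v` is absent, and raise `TypeError` when `l` is not a list. The name `index_of` is made up for this example.

```python
def index_of(l, v):
    """Return the index of the first element of l equal to v, or None if absent."""
    if not isinstance(l, list):
        # Covers l being None, a string, a number, etc.
        raise TypeError("l must be a list")
    for i, item in enumerate(l):
        if item == v:
            return i
    return None


print(index_of([3, 1, 4, 1], 1))   # 1 (first of two matches)
print(index_of([3, 1, 4], 9))      # None (v is not in l)
print(index_of([None, 2], None))   # 0 (v may itself be None)
print(index_of([[1], [2]], [2]))   # 1 (v may itself be a list)
```

Because "not found" is reported as `None`, callers should test the result with `is None` rather than truthiness, since a valid index of `0` is also falsy.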
- Run `python mycode.py` in Terminal/PowerShell
- `pip`, Jupyter, Pandas, etc., are add-ons to this basic functionality!

    0
    1
    2
    3
    4

    Cell In[3], line 2
        print(i)
        ^
    IndentationError: expected an indented block after 'for' statement on line 1
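The sketch below shows one pair of snippets that behaves like the output above. It assumes the loop runs over `range(5)`, which is a guess based on the printed values 0 to 4. Both versions are held as strings and passed to `compile`, so the broken one can be caught instead of stopping the script.

```python
# Correctly indented loop body: this compiles and runs.
good_code = "for i in range(5):\n    print(i)\n"

# Same loop with the body NOT indented: Python rejects it before running.
bad_code = "for i in range(5):\nprint(i)\n"

exec(compile(good_code, "<good>", "exec"))  # prints 0 through 4, one per line

try:
    compile(bad_code, "<bad>", "exec")
    error_name = None
except IndentationError as err:
    error_name = type(err).__name__
    print(error_name, "on line", err.lineno)
```

Unlike many languages, Python uses indentation to mark which statements belong to the loop body, so a missing indent is a syntax error and not just a style issue.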
DSAN 5000 W02: Nuts and Bolts for Data Science