DSAN 5000: Data Science and Analytics
Thursday, September 5, 2024
A little basic computer science is very useful for all STEM fields!
Understanding how computers work is crucial for data scientists
Note: These skills become very important in DSAN-6000 (big data & cloud computing)
Physical components of a computer
Computers come in many shapes & sizes; however, they’re all basically the same inside
Read over the following at home
All data lives in a file-system on a hard-disk somewhere; you CAN’T do data science without understanding file-systems!
A file-system is made up of directories (folders), programs, and files.
Files contain data OR instructions for programs.
Permissions control user access.
The file-system is organized as a directory tree.
The base of the tree is called the root directory, written / on Unix machines and \ on Windows machines.
Paths are “addresses”: they let users navigate the file-system to locate files & folders.
A path is either relative, i.e. a location relative to the current folder, OR absolute, i.e. a location relative to the root directory.
The current working directory (CWD) is where you currently “are” in the tree.
The current directory is denoted ./ & one level down (closer to the root) is denoted ../
Windows writes paths with \ instead of /, but otherwise the concept is the same.
System administrators control how much access different users have within the file-structure.
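The relative/absolute distinction can be sketched in Python with the standard pathlib and posixpath modules. This is only an illustration: the folder and file names (/home/username/public_html, index.html, /etc/hosts) are hypothetical, and pure paths are used so that nothing on disk is touched.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical current working directory (CWD) on a Unix machine.
cwd = PurePosixPath("/home/username/public_html")

def to_absolute(path: str, cwd: PurePosixPath = cwd) -> str:
    """Turn a relative or absolute Unix path into a normalized absolute path."""
    p = PurePosixPath(path)
    # An absolute path already starts at the root (/); a relative one is
    # interpreted starting from the CWD.
    joined = p if p.is_absolute() else cwd / p
    # normpath resolves the ./ and ../ pieces purely as text.
    return posixpath.normpath(str(joined))

print(to_absolute("./index.html"))   # /home/username/public_html/index.html
print(to_absolute("../index.html"))  # /home/username/index.html
print(to_absolute("/etc/hosts"))     # /etc/hosts
```

The same relative path gives a different absolute location when the CWD changes, which is why an absolute path is the safer choice when a location must be unambiguous.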
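Permissions are changed with the chmod command
NOTE: For websites, files are usually 644 and folders 755
chmod 644 my_file.html
# Set every file under _site to 644 and every folder to 755.
# find -exec (rather than looping over $(find ...)) also handles names that contain spaces.
find _site -type f -exec chmod 644 {} +
find _site -type d -exec chmod 755 {} +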
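To see what the numbers 644 and 755 stand for, here is a small Python sketch using the standard stat module. Each octal digit is the sum of read (4), write (2), and execute (1), given in order for the owner, the group, and everyone else. The helper name describe_mode is made up for this illustration.

```python
import stat

def describe_mode(octal_mode: int, is_dir: bool = False) -> str:
    """Render a chmod-style octal mode the way ls -l would display it."""
    # stat.filemode also needs to know the file type (regular file or directory).
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | octal_mode)

print(describe_mode(0o644))               # -rw-r--r--
print(describe_mode(0o755, is_dir=True))  # drwxr-xr-x
```

So 644 lets the owner read and write while everyone else can only read, and 755 additionally lets everyone enter (execute) the folder.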
Being the root-user means there are no restrictions on your power over the computer
It is up to the system administrators, who have super-user status, to set up and control access
“With great power comes great responsibility”
- Spider-Man’s Uncle
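A brief introduction.
The operating systems most people know are Windows or MacOS
Linux is a FREE and open-source operating system.
It is built around a Unix-like OS kernel originally created by Linus Torvalds in 1991.
Many servers run Linux (e.g. GU domains)
Linux experience is a useful line on your resume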
ssh command
Example: You can “get inside” other computers via the ssh command
Option-1: Interact with the file system via a GUI (graphical user interface)
Option-2: Interact via a command line interface (CLI)
IMPORTANT: The Unix command line is actually more like a computer scripting language (e.g. python), known as shell scripting or bash. It has all of the familiar coding constructs (for-loops, while-loops, if/then statements, etc.)
Hidden files: Files & folders that start with . are hidden from the GUI (e.g. ~/.bash_profile)
Mac & Linux: MacOS is very similar to Linux; both have a built-in Unix CLI.
Windows terminal options:
MS-DOS: completely different command structure
A second option (highly recommended), more on this later
GU domains web-servers are Linux; you can “get inside” the servers via a browser, or ssh inside from your laptop (more on this next week)
Linux Commands
command [options/flags] [arguments]
Options/flags start with a single hyphen - or double hyphen -- and modify the behavior of the command. They are usually optional.
Linux Variables
variable_name=value
Put $ before the variable name to access its value, e.g. $MY_VARIABLE.
For example, let’s take the ls command and describe its structure with flags:
ls [options/flags] [arguments]
ls stands for “list” and is used to list files and directories.
-l: Display detailed information about files (long listing format).
-a or --all: List all files, including hidden ones.
-h or --human-readable: Display file sizes in a human-readable format (used together with -l).
Example: ls -l /path/to/directory
Consult the documentation (man command or command --help) to understand all available options and how they affect the command’s behavior.
pwd: Print the current working directory (current location in the directory tree)
pwd ../: pwd does not accept a path argument, so this does not print the directory one level above; run cd ../ and then pwd instead.
ls: List files and directories in the current directory.
ls ../: List files and directories in the directory one level above.
ls ./: List files and directories in the current directory.
* acts as a wild-card to search for substrings
ls *pub*: List files with names containing “pub”, plus the contents of any directories with names containing “pub”.
ls -d *pub*: List the matching names themselves, without showing the contents of matching directories.
ls *pub*/*.html: List .html files inside directories with names containing “pub”.
Multiple commands can be run on one line by using ; to separate them
ls *pub*; ls *pub*/*.html: Run two ls commands on one line with ; separating them.
cd public_html/: Change current directory to public_html/.
cd ../: Move to the directory one level closer to the root.
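The * wild-card used with ls above can be tried out in Python with the standard fnmatch module. This is a rough sketch: the list of names is invented for illustration, and the helper match is not a shell feature.

```python
from fnmatch import fnmatchcase

# Hypothetical directory contents, for illustration only.
names = ["public_html", "pubs.txt", "notes.md", "republic", "data"]

def match(pattern: str, candidates: list[str]) -> list[str]:
    """Return the names that a shell-style wildcard pattern would match."""
    return [n for n in candidates if fnmatchcase(n, pattern)]

print(match("*pub*", names))  # ['public_html', 'pubs.txt', 'republic']
print(match("pub*", names))   # ['public_html', 'pubs.txt']
print(match("*.md", names))   # ['notes.md']
```

One difference to keep in mind: in the shell, * skips hidden names that start with a dot, while fnmatch does not treat them specially.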
../ is “down” if you think of the “root” as the bottom of the computer & “up” if you think of the root as the top of the computer. Both terminologies are common, just know that ../ takes you closer to the root directory
cd files: Change current directory to “files”.
cd public_html/: Change current directory to public_html/.
cd ~/: Change to the home directory (usually /home/username).
find . -name index.html: Search the current directory and everything below it for a file named index.html.
find . -name 'index*': Search for files starting with “index” (the quotes stop the shell from expanding * itself).
more index.html: View the contents of index.html using the more command.
more page2.html: View the contents of “page2.html”.
less index.html: View the contents using the less command (press q to exit).
head index.html: Display the first lines of index.html.
tail index.html: Display the last lines of index.html.
tail -n 4 index.html: Display the last 4 lines of index.html.
grep 'Hello' index.html: Search for the string “Hello” in index.html.
Avoid spaces in file and folder names: My Folder \(\rightarrow\) My-Folder OR my_folder
Otherwise the space must be escaped with \ when writing the path: My\ Folder
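Why a space in a name causes trouble can be seen with Python's standard shlex module, which splits a command line in a way that approximates how a POSIX shell does. The folder name is the one from the slide; everything else is illustrative.

```python
import shlex

folder = "My Folder"  # folder name containing a space

# Unescaped, the space splits the name into two separate arguments.
print(shlex.split("cd My Folder"))    # ['cd', 'My', 'Folder']

# A backslash-escaped space keeps the name together as one argument.
print(shlex.split(r"cd My\ Folder"))  # ['cd', 'My Folder']

# shlex.quote builds a safely quoted version for use in a command.
print("cd " + shlex.quote(folder))    # cd 'My Folder'
```

Names such as My-Folder or my_folder need neither escaping nor quoting.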
mkdir: Make directory \(\rightarrow\) Creates a new directory (e.g. mkdir my_folder)
rm: Remove files or directories \(\rightarrow\) Deletes files and folders.
WARNING: Be CAREFUL with rm, it’s irreversible (deletes files permanently)
RECOMMENDATION: (1) ALWAYS work in a folder that is automatically backed up to the cloud (e.g. Dropbox) (2) Push changes to GitHub regularly (secondary backup).
rm my_file: deletes the file called my_file
rm -rf my_folder: deletes the folder called my_folder (requires the -r flag; -f skips confirmation prompts)
cp: Copy files or directories \(\rightarrow\) Duplicates files and folders.
mv: Move or rename files/directories \(\rightarrow\) Used for both moving and renaming.
cp ../index.html ./page3.html: Copy index.html from one directory closer to the root into the current directory, naming the copy “page3.html”.
cp -r folder_1 folder_2: Make a copy of a folder (requires the recursive -r flag)
> page2.html: Create a blank file named “page2.html” (this also empties the file if it already exists).
With a shell script, you can place multiple Linux commands into a file to run sequentially
These are called shell (.sh) or bash scripts
Make the script executable with chmod a+x my_script.sh
Run it with ./my_script.sh from within the relevant folder
Example: Simple example of a shell script
Be careful: This is advanced content, you should only create very simple scripts, unless you know what you are doing.
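The script workflow (write the commands to a file, make it executable, run it) can be rehearsed safely from Python inside a temporary folder. This is a minimal sketch that assumes a Unix-like system where the sh program is available. The script contents (two echo commands) are invented for illustration and are not the example from the lecture.

```python
import os
import stat
import subprocess
import tempfile

# Two commands that run one after the other, like lines typed in a terminal.
script_text = "#!/bin/sh\necho first\necho second\n"

with tempfile.TemporaryDirectory() as folder:
    script_path = os.path.join(folder, "my_script.sh")
    with open(script_path, "w") as f:
        f.write(script_text)

    # Equivalent of chmod a+x my_script.sh: add execute permission for everyone.
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    is_executable = bool(os.stat(script_path).st_mode & stat.S_IXUSR)

    # Run the script with sh and capture what it prints.
    result = subprocess.run(
        ["sh", script_path], capture_output=True, text=True, check=True
    )
    output_lines = result.stdout.splitlines()

print(is_executable)  # True
print(output_lines)   # ['first', 'second']
```

The temporary folder is deleted automatically at the end, so nothing is left behind.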
Avoid using the rm command in a shell script
nano is user-friendly; vi is powerful but has a steeper learning curve.
top provides real-time process monitoring and htop is a more user-friendly alternative.
wget and curl can download files from URLs.
Some commands require superuser privileges.
Nano is a popular editor for coding from the command line
nano index.html
Other options: emacs and vim (not recommended)
Due to the internet, media consumption has changed dramatically over the last 30 years.
Links are written with the anchor tag (<a>).
<a href="http://npr.org/">news</a> creates a link for the anchor text “news,” which will cause the browser to fetch the HTML for that page when the link is clicked
The server is the computer where the content “lives”, on some hard-drive
The computer requesting the content is the client.
Source: https://developer.mozilla.org/en-US/docs/Learn/JavaScript/First_steps/What_is_JavaScript
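To make the anchor tag concrete, here is a short Python sketch that reads the link from the slide and pulls out the address and the anchor text, using the standard html.parser module. The class name LinkCollector is made up for this illustration; a real browser does far more than this.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # address of the link currently being read
        self._text = []    # pieces of anchor text seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <a> tag with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None

parser = LinkCollector()
parser.feed('<a href="http://npr.org/">news</a>')
parser.close()
print(parser.links)  # [('http://npr.org/', 'news')]
```

The href value is the address the client asks the server for; the anchor text is only what the reader sees.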
With JavaScript, html can be dynamically and programmatically updated
We won’t cover much JavaScript in the DSAN program, but will discuss it more in DSAN-5200 in the context of interactive data visualization
Document object model (DOM)
Boolean values (True or False)
None
Let’s look at what happens, in the computer’s memory, when we run the following code:
| | name | pop |
|---|---|---|
| 0 | Albania | 2.8 |
| 1 | Algeria | 44.2 |
| 2 | Angola | 34.5 |
Problem: given a list l, and a value v, return the index of l which contains v
Edge cases: What if l contains v more than once? What if it doesn’t contain v at all? What if l is None? What if v is None? What if l isn’t a list at all? What if v is itself a list?
Python code is saved in .py files!
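As a concrete case, a hypothetical mycode.py could hold one possible answer to the list-search question above. The choices made here (return the first match, return None when v is absent, raise TypeError when l is not a list) are assumptions for illustration, not the only reasonable answers.

```python
def find_index(l, v):
    """Return the index of the first element of l equal to v, or None if absent."""
    # l must really be a list; this check also rejects l = None.
    if not isinstance(l, list):
        raise TypeError("l must be a list")
    for i, item in enumerate(l):
        # == also works when v is None or when v is itself a list.
        if item == v:
            return i  # the first match wins if v appears more than once
    return None  # v is not in l

print(find_index([3, 5, 5], 5))        # 1
print(find_index([3, 5, 5], 7))        # None
print(find_index([1, None], None))     # 1
print(find_index([[1, 2], [3]], [3]))  # 1
```

Each edge case from the question maps to one explicit decision in the code, which is the main point of the exercise.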
Run the code with python mycode.py in Terminal/PowerShell
Everything else, pip, Jupyter, Pandas, etc., is an add-on to this basic functionality!
0
1
2
3
4
Cell In[3], line 2
    print(i)
    ^
IndentationError: expected an indented block after 'for' statement on line 1
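The output and the error above are consistent with a two-line for loop run once with and once without indentation. The sketch below reproduces both cases; the loop header for i in range(5) is an assumption, since only print(i) and the printed numbers 0 to 4 are visible above.

```python
good_source = "for i in range(5):\n    print(i)\n"
bad_source = "for i in range(5):\nprint(i)\n"

# The indented version compiles and runs, printing 0 through 4.
exec(compile(good_source, "<good>", "exec"))

# The unindented version is rejected before any of it runs.
try:
    compile(bad_source, "<bad>", "exec")
    error_name = None
except IndentationError as err:
    error_name = type(err).__name__
    print(error_name, "on line", err.lineno)  # IndentationError on line 2
```

Python uses indentation to decide which lines belong to the loop body, so the missing spaces are a syntax problem and not a matter of style.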
DSAN 5000 W02: Nuts and Bolts for Data Science