R/Python Coding Cheatsheet

Author: Jeff Jacobs

Published: October 9, 2023

This cheatsheet is intended as a quick reference you can use to “translate” between concepts from the course and the functions/libraries which implement these concepts in both R and Python.

Concept	R Code	Python Code
Bernoulli Distribution	Using Base R: `rbinom(num_to_generate, 1, p)`	Using SciPy: `from scipy.stats import bernoulli; bernoulli.rvs(p, size=num_to_generate)`
Continuous Uniform Distribution	Using Base R: `runif(num_to_generate, a, b)`	Using NumPy: `import numpy as np; rng = np.random.default_rng(seed=5000); rng.uniform(a, b, num_to_generate)`
Discrete Uniform Distribution	Using the `sample()` function from Base R: `a <- 1; b <- 10; sample(a:b, num_to_generate, replace=TRUE)`	Using NumPy: `import numpy as np; rng = np.random.default_rng(seed=5000); rng.integers(a, b + 1, size=num_to_generate)` (the upper bound passed to `integers()` is exclusive, hence `b + 1`)
Normal Distribution	Using Base R: `rnorm(num_to_generate, mu, sigma)`	Using NumPy: `import numpy as np; rng = np.random.default_rng(seed=5000); rng.normal(mu, sigma, num_to_generate)` (in both languages, `sigma` is the standard deviation, not the variance)
Add a new column to a data table	Using Tidyverse: `df <- df |> mutate(new_col_name = new_col_value)`	Using Pandas: `df['new_col_name'] = new_col_value`
Download a `.csv` file from a URL	Using the `httr` library: `library(httr); csv_url <- "https://example.com/data.csv"; local_filename <- "downloaded_data.csv"; GET(csv_url, write_disk(local_filename), progress())`, where `csv_url` is the URL of the `.csv` file you want and `local_filename` is the filename you'd like to save it to on your local drive	Using the `requests` library: `import requests; csv_url = "https://example.com/data.csv"; local_filename = "downloaded_data.csv"; response = requests.get(csv_url); response.raise_for_status()` followed by `with open(local_filename, 'wb') as outfile: outfile.write(response.content)` (fetching the response and calling `raise_for_status()` before opening the file means a failed download raises an error rather than silently saving an error page, or an empty file, to disk)
Generate a data table from a set of columns	Using Tidyverse: `df <- tibble(col1_name=col1_vals, col2_name=col2_vals)`	Using Pandas: `df = pd.DataFrame({'col1_name': col1_vals, 'col2_name': col2_vals})`
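Generate a data table quickly by entering values in your code	Using Tidyverse: `df <- tribble(~col1_name, ~col2_name, col1_val1, col2_val1, col1_val2, col2_val2)` (values are entered row by row, after the `~`-prefixed column names)	Using Pandas: `df = pd.DataFrame([(col1_val1, col2_val1), (col1_val2, col2_val2)], columns=['col1_name', 'col2_name'])` (one tuple per row, with the column names passed separately)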
Generate a sequence from `1` to `N`	In steps of size 1: `seq(from = 1, to = N)`. In steps of size `k`: `seq(from = 1, to = N, by = k)`	In steps of size 1: `range(1, N + 1)`. The second argument to `range()` is exclusive, which is why we use `N + 1` to get the numbers from 1 to `N` inclusive. In steps of size `k`: `range(1, N + 1, k)`. Also note that `range()` produces a `range` object rather than a list: a lazy sequence that computes its numbers on demand (for example, when used in a loop). If you need an actual list of the numbers from 1 to `N`, convert it explicitly: `list(range(1, N + 1))`
Sample `N` rows from a data table (chosen uniformly at random, without replacement)	Using Tidyverse: `df |> slice_sample(n = N)`	Using Pandas: `df.sample(n=N, random_state=5000)`
Scatterplot	Using `ggplot2`: `ggplot(df, aes(x=x_var_name, y=y_var_name)) + geom_point()` Where `x_var_name` and `y_var_name` are the names of columns within `df`.	Using Seaborn: `sns.scatterplot(data=df, x='x_var_name', y='y_var_name')` Where `x_var_name` and `y_var_name` are the names of columns within `df`.
Subset columns of a data table by column name	Using Tidyverse: `df |> select(col1_name, col2_name)` will extract just the columns named `col1_name` and `col2_name` within `df`	Using Pandas: `df[['col1_name', 'col2_name']].copy()` will extract just the columns named `col1_name` and `col2_name` within `df`. (We use `.copy()` at the end so that the result is an explicitly independent `DataFrame`. Without it, pandas marks the result as derived from `df`, so modifying the subset later can trigger a `SettingWithCopyWarning`, since pandas cannot always tell whether such a subset is a view of the original data or a copy of it.)
Subset rows of a data table by value	Using Tidyverse: `df |> filter(col_name == col_value)` will select just the rows in `df` for which the value in the column called `col_name` is `col_value`	Using Pandas: `df.loc[df['col_name'] == col_value]` will select just the rows in `df` for which the value in the column called `col_name` is `col_value`
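The four distribution rows can be tried side by side in Python. The sketch below assumes NumPy is installed and draws every sample from one seeded `Generator`; for the Bernoulli case it uses `rng.binomial(1, p, size=...)`, NumPy's analogue of R's `rbinom(n, 1, p)`, in place of SciPy's `bernoulli.rvs`. The function name `draw_samples` and its default parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np


def draw_samples(num_to_generate, p=0.3, a=1, b=10, mu=0.0, sigma=1.0, seed=5000):
    """Draw samples from four distributions using a single seeded generator.

    All default parameter values are illustrative placeholders.
    """
    rng = np.random.default_rng(seed=seed)
    return {
        # Bernoulli(p) is a binomial with one trial, like R's rbinom(n, 1, p)
        "bernoulli": rng.binomial(1, p, size=num_to_generate),
        # Continuous uniform on [a, b), like R's runif(n, a, b)
        "continuous_uniform": rng.uniform(a, b, size=num_to_generate),
        # Discrete uniform on {a, ..., b}; integers() excludes its upper bound
        "discrete_uniform": rng.integers(a, b + 1, size=num_to_generate),
        # Normal with mean mu and standard deviation sigma, like R's rnorm(n, mu, sigma)
        "normal": rng.normal(mu, sigma, size=num_to_generate),
    }


samples = draw_samples(1000)
for name, values in samples.items():
    # Show the first five draws from each distribution
    print(name, values[:5])
```

Because the generator is created from a fixed seed, calling `draw_samples(1000)` twice returns identical arrays, which is the NumPy counterpart of calling `set.seed()` before sampling in R.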
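The two table-construction rows differ in orientation: `tibble()` and a dictionary passed to `pd.DataFrame()` take one sequence per column, while `tribble()` takes values row by row. The pandas sketch below, which assumes pandas is installed and uses made-up column names and values, builds the same small table both ways and then adds a column as in the `mutate()` row.

```python
import pandas as pd

# Column-wise construction, like tibble(): one list of values per column
df_cols = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [1.5, 2.0, 3.5],
})

# Row-wise construction, like tribble(): one tuple per row,
# with the column names passed separately
df_rows = pd.DataFrame(
    [
        ("a", 1.5),
        ("b", 2.0),
        ("c", 3.5),
    ],
    columns=["name", "score"],
)

# Both constructions hold the same values in the same columns
print(df_cols.to_dict("list") == df_rows.to_dict("list"))  # True

# Add a new column, like mutate(); here it is computed from an existing column
df_cols["score_doubled"] = df_cols["score"] * 2
print(df_cols)
```

The row-wise form tends to be easier to read when typing in a handful of values by hand, while the column-wise form fits data that already arrives as one array per variable.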
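The difference between a `range` object and a list in the sequence row can be checked directly with the standard library. In this minimal sketch, `N = 10` and `k = 3` are arbitrary illustrative values.

```python
N = 10  # illustrative upper end of the sequence
k = 3   # illustrative step size

# Like R's seq(from = 1, to = N): the stop argument of range() is exclusive
one_to_n = range(1, N + 1)

# Like R's seq(from = 1, to = N, by = k): stops at the last value that is <= N
one_to_n_by_k = range(1, N + 1, k)

# A range object is a lazy sequence: it supports len(), indexing, and
# membership tests without building a list of its values
print(len(one_to_n), one_to_n[0], one_to_n[-1], 7 in one_to_n)  # 10 1 10 True

# Unlike a generator, it can be iterated more than once
print(sum(one_to_n), sum(one_to_n))  # 55 55

# Convert explicitly when an actual list is needed
print(list(one_to_n_by_k))  # [1, 4, 7, 10]
```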
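The sampling and subsetting rows can be combined in one short pandas sketch. It assumes pandas is installed; the table and its columns `city`, `year`, and `population` are invented for illustration, and `N = 3` rows are sampled with the same `random_state=5000` used in the sampling row.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "C", "D", "E"],
    "year": [2020, 2021, 2020, 2022, 2020],
    "population": [100, 250, 175, 300, 50],
})

# Subset columns by name, like select(city, population).
# .copy() makes the result an independent DataFrame, so later edits to it
# never touch df and do not raise SettingWithCopyWarning.
cols = df[["city", "population"]].copy()
cols["population"] = cols["population"] * 1000

# Subset rows by value, like filter(year == 2020)
rows_2020 = df.loc[df["year"] == 2020]

# Sample N = 3 rows uniformly at random without replacement, like
# slice_sample(n = 3); random_state fixes the seed so the sample is reproducible
sampled = df.sample(n=3, random_state=5000)

print(cols)
print(rows_2020)
print(sampled)
```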