R/Python Coding Cheatsheet

Author: Jeff Jacobs

Published: October 9, 2023

This cheatsheet is intended as a quick reference you can use to “translate” between concepts from the course and the functions/libraries which implement these concepts in both R and Python. Use the search box below (not the one in the page sidebar) to filter all of the rows down to the particular concepts, or the particular functions, that you are looking for.

Mobile Browsing

I strongly recommend viewing this on your laptop (that is, in a full-screen-width browser) rather than on mobile, since I’ve made the page width wider here to make room for the blocks of code within each column!

Concept | R Code | Python Code
Bernoulli Distribution

Using Base R:

rbinom(num_to_generate, 1, p)

Using SciPy:

from scipy.stats import bernoulli
bernoulli.rvs(p, size=num_to_generate)
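The R call above works because a Bernoulli(p) draw is the same as a Binomial draw with a single trial. As a rough sketch, the same idea carries over to NumPy's binomial() generator method. The values of p and num_to_generate and the seed below are illustrative assumptions, not values from the course:

```python
import numpy as np

p = 0.3                   # assumed success probability (illustrative)
num_to_generate = 10_000  # assumed number of draws (illustrative)

rng = np.random.default_rng(seed=5000)
# A Binomial with n=1 trial is a Bernoulli(p), mirroring rbinom(num_to_generate, 1, p)
draws = rng.binomial(1, p, size=num_to_generate)

# Every draw is 0 or 1, so the sample mean is an estimate of p
sample_mean = draws.mean()
print(sample_mean)
```

With this many draws, the printed mean should land close to p.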
Continuous Uniform Distribution

Using Base R:

runif(num_to_generate, a, b)

Using NumPy:

import numpy as np
rng = np.random.default_rng(seed=5000)
rng.uniform(a, b, num_to_generate)
Discrete Uniform Distribution

Using the sample() function from Base R:

a <- 1; b <- 10;
sample(a:b, num_to_generate, replace=TRUE)

Using NumPy:

import numpy as np
rng = np.random.default_rng(seed=5000)
rng.integers(a, b + 1, size=num_to_generate)
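The b + 1 in the NumPy call is needed because the upper bound of integers() is exclusive by default. The sketch below, using the same a = 1 and b = 10 as the R example and an assumed num_to_generate of 1000, shows that default alongside the endpoint=True option, which makes the upper bound inclusive instead:

```python
import numpy as np

a, b = 1, 10            # same endpoints as the R example above
num_to_generate = 1000  # assumed number of draws (illustrative)

rng = np.random.default_rng(seed=5000)

# Default behavior: the upper bound is exclusive, so pass b + 1 to include b
draws_exclusive = rng.integers(a, b + 1, size=num_to_generate)

# Alternative: endpoint=True makes the upper bound inclusive, so pass b directly
draws_inclusive = rng.integers(a, b, size=num_to_generate, endpoint=True)

print(draws_exclusive.min(), draws_exclusive.max())
print(draws_inclusive.min(), draws_inclusive.max())
```

Both calls draw from the integers a through b, so either style works as long as it is used consistently.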
Normal Distribution

Using Base R:

rnorm(num_to_generate, mu, sigma)

Using NumPy:

import numpy as np
rng = np.random.default_rng(seed=5000)
rng.normal(mu, sigma, num_to_generate)
Add a new column to a data table

Using Tidyverse:

df <- df |> mutate(
  new_col_name = new_col_value
)

Using Pandas:

df['new_col_name'] = new_col_value
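In the Pandas version, new_col_value can be a single value (which is repeated for every row) or a computation on existing columns. A minimal sketch, assuming a hypothetical table with a height_cm column (the column names and values here are made up for illustration):

```python
import pandas as pd

# Hypothetical example table (not from the course materials)
df = pd.DataFrame({'height_cm': [150, 160, 170]})

# A single value is broadcast to every row
df['source'] = 'survey'

# A derived column is computed element by element from an existing column
df['height_m'] = df['height_cm'] / 100

print(df)
```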
Download a .csv file from a URL

Using the httr library:

library(httr)
# The URL for the .csv file you want
csv_url <- "https://example.com/data.csv"
# The filename you'd like to save it to, on your local drive
local_filename <- "downloaded_data.csv"
GET(csv_url, write_disk(local_filename), progress())

Using the requests library:

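import requests
csv_url = "https://example.com/data.csv"
local_filename = "downloaded_data.csv"
response = requests.get(csv_url)
# Stop with an error if the download failed (for example, a 404 response)
response.raise_for_status()
# Write the raw bytes of the response to the local file
with open(local_filename, 'wb') as outfile:
  outfile.write(response.content)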
Generate a data table from a set of columns

Using Tidyverse:

df <- tibble(col1_name=col1_vals, col2_name=col2_vals)

Using Pandas:

import pandas as pd
df = pd.DataFrame({'col1_name': col1_vals, 'col2_name': col2_vals})
Generate a data table quickly by entering values in your code

Using Tidyverse:

df <- tribble(
  ~col1_name, ~col2_name,
  col1_val1, col2_val1,
  col1_val2, col2_val2
)

Using Pandas:

import pandas as pd
df = pd.DataFrame([[col1_val1, col2_val1], [col1_val2, col2_val2]], columns=['col1_name', 'col2_name'])
Generate sequence from 1 to N

Using Base R, in steps of size 1:

seq(from = 1, to = N)

In steps of size k:

seq(from = 1, to = N, by = k)

Using Python, in steps of size 1:

range(1, N + 1)
  • The second argument to range() is exclusive, meaning that we use N + 1 here to indicate that we want the numbers from 1 to N inclusive
  • Also note that range() produces a lazy range object (an object which generates the sequence “on the fly” when used in a loop) rather than a list, meaning that if you need an actual list of the numbers from 1 to N, you need to explicitly convert the range object to a list:
list(range(1, N + 1))
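For parity with the R call seq(from = 1, to = N, by = k), range() also accepts a step size as an optional third argument. A short sketch, with N = 10 and k = 3 chosen purely for illustration:

```python
N = 10  # illustrative upper limit
k = 3   # illustrative step size

# Steps of size 1: the numbers 1 through N, inclusive
steps_of_one = list(range(1, N + 1))

# Steps of size k: the third argument to range() is the step
steps_of_k = list(range(1, N + 1, k))

print(steps_of_one)
print(steps_of_k)  # [1, 4, 7, 10]
```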
Sample N rows from a data table (chosen uniformly at random, without replacement)

Using Tidyverse:

df |> slice_sample(n = N)

Using Pandas:

df.sample(n=N, random_state=5000)
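By default, sample() draws without replacement, so no row can appear twice in the result. The sketch below checks this on a hypothetical ten-row table (the id column and N = 4 are assumptions made for illustration):

```python
import pandas as pd

# Hypothetical table with ten rows (not from the course materials)
df = pd.DataFrame({'id': range(1, 11)})
N = 4  # illustrative sample size

# replace=False is the default, so each row is chosen at most once;
# random_state fixes the seed so the same rows come back on every run
sampled = df.sample(n=N, random_state=5000)

print(sampled)
```

Passing replace=True instead would allow the same row to be drawn more than once.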
Scatterplot

Using ggplot2:

ggplot(df, aes(x=x_var_name, y=y_var_name)) +
  geom_point()
Here, x_var_name and y_var_name are the names of columns within df.

Using Seaborn:

import seaborn as sns
sns.scatterplot(data=df, x='x_var_name', y='y_var_name')
Here, x_var_name and y_var_name are the names of columns within df.
Subset columns of a data table by column name

Using Tidyverse:

df |> select(col1_name, col2_name)
This extracts just the columns named col1_name and col2_name within df.

Using Pandas:

df[['col1_name', 'col2_name']].copy()
This extracts just the columns named col1_name and col2_name within df. (Selecting with a list of column names already returns a new DataFrame rather than a pointer into the original, but adding .copy() at the end makes that independence explicit and avoids the SettingWithCopyWarning that some pandas versions raise if you later modify the result.)
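To see that the subset is independent of the original table, the sketch below modifies the subset and then checks that df is unchanged. The column values and the extra col3_name column are made up for illustration:

```python
import pandas as pd

# Hypothetical table with three columns (values are illustrative)
df = pd.DataFrame({
    'col1_name': [1, 2],
    'col2_name': [3, 4],
    'col3_name': [5, 6],
})

# Keep only two of the columns, as a separate DataFrame
subset = df[['col1_name', 'col2_name']].copy()

# Overwrite a column in the subset...
subset['col1_name'] = 0

# ...and confirm that the original table still holds its old values
print(df['col1_name'].tolist())  # [1, 2]
```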
Subset rows of a data table by value

Using Tidyverse:

df |> filter(col_name == col_value)
This selects just the rows in df for which the value in the column called col_name is col_value.

Using Pandas:

df.loc[df['col_name'] == col_value]
This selects just the rows in df for which the value in the column called col_name is col_value.
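As a small worked example of the Pandas version, the sketch below filters a hypothetical table on one condition and then on two. The other_col column and all values are assumptions for illustration. When combining conditions, each one needs its own parentheses and they are joined with & (and) or | (or):

```python
import pandas as pd

# Hypothetical table (column names and values are illustrative)
df = pd.DataFrame({
    'col_name': ['a', 'b', 'a'],
    'other_col': [1, 2, 3],
})
col_value = 'a'

# Rows where col_name equals col_value
matching = df.loc[df['col_name'] == col_value]

# Rows that satisfy two conditions at once
matching_both = df.loc[(df['col_name'] == col_value) & (df['other_col'] > 1)]

print(matching)
print(matching_both)
```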