Getting Started with HW 3

DSAN 5300: Statistical Learning

Extra Writeups
Published

Tuesday, April 1, 2025

Update Log
  • Original version posted 18 Mar 2025, 7:00pm

The following clarifications on different pieces of HW3 helped me when going through it with students in office hours.

HW-3.2: Advertising Budgets

In the first code cell within this question, we do provide starter code, but you will still need to carry out the last part of the instructions, “Split the data into a training and test set (70% / 30%)”. The starter code does much of the work for you, since you should be able to use the created train object to construct a tibble like train_df, which can then be used with gam() in the next part.

HW-3.2a

Here, as a more specific clarification: there are multiple ways to specify a regression spline in R, but the two ways that I have encountered the most are as follows.

First, using just the gam library in the “default” way it is set up, you should be able to use the s() function to specify spline terms, and then include the optional df argument to specify the degree of each term:

library(gam)
gam_model <- gam(
    dependent ~ s(indep1, df = degree1) + s(indep2, df = degree2) + s(indep3, df = degree3),
    data=train_df
)

However, as a second approach which is more or less popular depending on the field (and the time the code was written), you may also see example code that looks like:

library(gam)
gam_model <- gam(
    dependent ~ s(indep1, k = degree1) + s(indep2, k = degree2) + s(indep3, k = degree3),
    data=train_df
)

At first this looks suspiciously close to the previous code block, but, the difference is that the k argument used in the second block is not one of the arguments included “by default” in the s() function provided by gam. From what I understand, there is a second library called mgcv which modifies gam’s s() function to allow this k argument.

This means, to get this second approach to work, in terms of making sure that the s() function allows the degree argument k, you’ll need to load the mgcv library in addition to gam** (which is already loaded for you in a cell near the top of the distributed homework files). In other words, if you just copy-and-paste the previous code, you’ll get an error (at least in my version of R) that looks like

Error in s(indep1, k = degree1) : unused argument (k = degree1)

So, here, the mgcv library essentially adds in this new k argument that can be passed to s() (which is from gam itself) to specify the degree parameter.