S3

Mini-Lecture 5

Ben Baumer

Smith College

2024-09-17

Slack review

Errors

Subsetting Lab Q4: I don’t understand why starwars[c(lgl, lgl), ] gives us an error message.

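  • This is a choice made in the tibble package: a logical row index must have length 1 or match the number of rows, and c(lgl, lgl) is twice that long
  • It “works” if you coerce to a data.frame, but every out-of-range TRUE comes back as NA: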
library(tidyverse)
lgl <- starwars$eye_color == "blue"
as.data.frame(starwars)[c(lgl, lgl), ] |>
  pull(eye_color)
 [1] "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue"
[11] "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" NA    
[21] NA     NA     NA     NA     NA     NA     NA     NA     NA     NA    
[31] NA     NA     NA     NA     NA     NA     NA     NA    
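
The same behavior can be reproduced without starwars. This is a minimal sketch in base R; the toy data frame df and its columns are made up for illustration.

```r
# Toy data: three rows, so a valid logical row index has length 3 (or 1)
df <- data.frame(name = c("a", "b", "c"), score = c(10, 20, 30))
lgl <- df$score > 15          # FALSE TRUE TRUE

# Doubling the index makes it longer than the data frame has rows
too_long <- c(lgl, lgl)       # FALSE TRUE TRUE FALSE TRUE TRUE

# Base R does not complain: positions 5 and 6 are TRUE but do not exist,
# so they come back as rows filled with NA
res <- df[too_long, ]
res$score
# 20 30 NA NA
```

The silent NA rows are arguably worse than an error, which is one way to read tibble's stricter choice.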

Recycling

When are recycling rules helpful? For example, when taking x[y] where x and y have different lengths, when would you want the shorter of the two recycled to the length of the longer?

students <- tibble(
  # generate 24 random student ID numbers that begin with 990
  student_id = runif(24) * 1000000 + 990000000,
  # assign them to 8 groups of 3
  group_id = 1:8
)
Error in `tibble()`:
! Tibble columns must have compatible sizes.
• Size 24: Existing data.
• Size 8: Column `group_id`.
ℹ Only values of size one are recycled.
students <- data.frame(
  student_id = runif(24) * 1000000 + 990000000,
  group_id = 1:8
)
students
   student_id group_id
1   990235000        1
2   990900764        2
3   990295144        3
4   990790880        4
5   990812893        5
6   990711373        6
7   990207469        7
8   990253795        8
9   990321713        1
10  990143969        2
11  990030544        3
12  990239154        4
13  990063628        5
14  990349172        6
15  990152259        7
16  990569302        8
17  990970597        1
18  990984642        2
19  990391556        3
20  990520353        4
21  990245186        5
22  990679165        6
23  990313915        7
24  990972126        8
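
To answer the x[y] question directly, here is a small sketch in base R of two cases where recycling is convenient, plus an explicit alternative for when it is not allowed. The vector x and the name group_id are only for illustration.

```r
x <- 1:10

# A length-2 logical index is recycled to length 10,
# which keeps every other element
odds <- x[c(TRUE, FALSE)]
odds
# 1 3 5 7 9

# A length-1 value is recycled against the whole vector
doubled <- x * 2

# When you want a repeating pattern in a tibble column,
# spelling it out with rep() makes the intent explicit
group_id <- rep(1:8, times = 3)
length(group_id)
# 24
```

Because rep() produces a vector of the full length, the result can be passed to tibble() without relying on recycling.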

Pepper vs. pepper packets

I’m not sure I get why starwars[[vars]] throws an error.

  • vars is a vector of length 2, but [[ returns a single element with no container around it
  • you can’t hold two pepper packets without a pepper shaker!
  • you can’t have two train cars without a train!
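
The metaphor can be sketched with a plain list in base R. The object shaker is hypothetical, and base R fails here for a slightly different reason than tibble does (it tries to index into the first element), but the lesson is the same.

```r
# A hypothetical "shaker" (list) holding three "packets" (elements)
shaker <- list(a = 1, b = "two", c = TRUE)
vars <- c("a", "b")

# Single brackets keep the container: two packets, still inside a shaker
two_packets <- shaker[vars]
class(two_packets)
# "list"

# Double brackets return one packet with no shaker around it
one_packet <- shaker[["a"]]
class(one_packet)
# "numeric"

# Asking [[ for two packets at once fails, so catch the error
result <- tryCatch(shaker[[vars]], error = function(e) "error")
result
# "error"
```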

An object is just an object

Something I was a little unsure about was how starwars[lgl, ] was able to specifically pick out the info about eye color, since it seems like lgl is just a logical vector that doesn’t store any additional info about how it was constructed.

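  • This is correct: lgl is just a vector of TRUE/FALSE values, and starwars[lgl, ] keeps the rows where it is TRUE
  • In general, objects have no memory of how they were created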
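
A quick way to convince yourself, sketched on a made-up three-row data frame: a logical vector computed from the data and one typed by hand are indistinguishable.

```r
df <- data.frame(id = 1:3, eye_color = c("brown", "blue", "blue"))

lgl_from_data <- df$eye_color == "blue"   # built from a comparison
lgl_by_hand   <- c(FALSE, TRUE, TRUE)     # typed in directly

# The two vectors are the same object as far as R is concerned
identical(lgl_from_data, lgl_by_hand)
# TRUE

# No hidden attributes record where the vector came from
attributes(lgl_from_data)
# NULL

# So both select exactly the same rows
same_rows <- identical(df[lgl_from_data, ], df[lgl_by_hand, ])
same_rows
# TRUE
```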

S3

Object-oriented programming

We need three concepts from OOP:

  • Class: a kind of object
    • Use class() or is.*() to find out
  • Method: a function that operates on objects of a specific class
    • Use methods() to see them
  • (Multiple) Inheritance: objects can belong to one or more classes
    • Use inherits() to find out
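
These tools can be tried on models fit to mtcars. The following is a small sketch using only base R; the names lm_mod and glm_mod are chosen here for illustration.

```r
lm_mod  <- lm(mpg ~ disp, data = mtcars)
glm_mod <- glm(am ~ wt, data = mtcars, family = binomial)

# class() returns the full class vector
class(lm_mod)    # "lm"
class(glm_mod)   # "glm" "lm"

# inherits() asks whether a class appears anywhere in that vector
inherits(glm_mod, "lm")   # TRUE
inherits(lm_mod, "glm")   # FALSE

# is.*() functions test for one specific kind of object
is.data.frame(mtcars)     # TRUE

# methods() lists the methods available for a class
lm_methods <- methods(class = "lm")
```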

Objects are instances of a class

Multiple inheritance

Generic functions and method dispatch

  • Generic functions have methods written for specific classes
  • When you call a generic function:
    • R uses class() to get the class vector of your object
    • R looks for a method named generic.class for the first class in that vector
    • If it can’t find one, it will look for a method for the second class…
    • If it still can’t find one, it will call the default method (generic.default)
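
The dispatch steps above can be sketched with a toy generic. Everything here (speak, the dog, cat, and animal classes) is hypothetical and exists only to show the order in which methods are tried.

```r
# A hypothetical generic: UseMethod() performs the dispatch
speak <- function(x, ...) UseMethod("speak")

speak.dog     <- function(x, ...) "Woof"
speak.animal  <- function(x, ...) "Some animal noise"
speak.default <- function(x, ...) "..."

rex <- structure(list(), class = c("dog", "animal"))
tom <- structure(list(), class = c("cat", "animal"))

speak(rex)  # "Woof": speak.dog() exists for the first class
speak(tom)  # "Some animal noise": no speak.cat(), so speak.animal() is used
speak(42)   # "...": no method for a plain number, so speak.default() is used
```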

Defining a method

sloop::is_s3_generic("summary")
[1] TRUE
args("summary")
function (object, ...) 
NULL
summary.my_lm <- function(object, ...) {
  message("R^2 is overrated!")
  object$coefficients
}
mod <- lm(mpg ~ disp, data = mtcars)
class(mod) <- c("my_lm", class(mod))
summary(mod)
R^2 is overrated!
(Intercept)        disp 
29.59985476 -0.04121512 

NextMethod()

summary.my_lm <- function(object, ...) {
  message("R^2 is overrated!")
  NextMethod()
}
summary(mod)
R^2 is overrated!

Call:
lm(formula = mpg ~ disp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.8922 -2.2022 -0.9631  1.6272  7.2305 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
disp        -0.041215   0.004712  -8.747 9.38e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.251 on 30 degrees of freedom
Multiple R-squared:  0.7183,    Adjusted R-squared:  0.709 
F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10
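
NextMethod() can be called repeatedly to walk down the whole class vector. Here is a minimal sketch with a hypothetical generic describe() and made-up classes, showing each method doing its own work and then delegating.

```r
# Hypothetical generic and methods, for illustration only
describe <- function(x, ...) UseMethod("describe")

describe.default <- function(x, ...) "an object"
describe.animal  <- function(x, ...) paste("an animal, which is", NextMethod())
describe.dog     <- function(x, ...) paste("a dog, which is", NextMethod())

rex <- structure(list(), class = c("dog", "animal"))

# describe.dog() -> describe.animal() -> describe.default()
describe(rex)
# "a dog, which is an animal, which is an object"
```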

Defining a new generic

rmse <- function(x, ...) {
  UseMethod("rmse")
}
rmse.lm <- function(x, ...) {
  sqrt(mean(x$residuals^2))
}
rmse(mod)
[1] 3.148207
  • Now define methods for other model classes (e.g., rmse.glm())
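
One possible rmse.glm() is sketched below. It rests on two facts about glm objects: they inherit from "lm", so without their own method rmse.lm() would be used, and their $residuals component holds working residuals rather than residuals on the response scale. Measuring error on the response scale is an assumption about what you want; the name logit_mod is made up.

```r
rmse <- function(x, ...) {
  UseMethod("rmse")
}

rmse.lm <- function(x, ...) {
  sqrt(mean(residuals(x)^2))
}

# Assumption: for a GLM we want observed minus fitted on the response scale
rmse.glm <- function(x, ...) {
  sqrt(mean(residuals(x, type = "response")^2))
}

logit_mod <- glm(am ~ wt, data = mtcars, family = binomial)

# Dispatches to rmse.glm() because "glm" comes first in class(logit_mod)
rmse(logit_mod)
```

For this model the response residuals are mtcars$am - fitted(logit_mod), so the result can be checked by hand.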

OOP in R vs. Java

  • In Java:
// Class object = new Constructor(args);
Rectangle r = new Rectangle(0,0,5,5);

// object.method(arg1, arg2);
r.setSize(10, 15);
  • In R using S3:
# object <- constructor(args)
r <- rectangle(0,0,5,5)

# generic(object, arg1, arg2)
set_size(r, 10, 15)
# what really happens: R dispatches on class(r) and calls
# generic.class(object, arg1, arg2)
set_size.rectangle(r, 10, 15)
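
Neither rectangle() nor set_size() exists in base R, so here is one way they could be written. This is a hypothetical sketch, and it also exposes a difference from Java: the method returns a modified copy instead of changing r in place.

```r
# Hypothetical constructor: builds a list and tags it with a class
rectangle <- function(x, y, width, height) {
  structure(
    list(x = x, y = y, width = width, height = height),
    class = "rectangle"
  )
}

# Hypothetical generic and its method for the "rectangle" class
set_size <- function(shape, ...) UseMethod("set_size")

set_size.rectangle <- function(shape, width, height, ...) {
  shape$width <- width
  shape$height <- height
  shape  # return the modified copy
}

r <- rectangle(0, 0, 5, 5)

# Unlike r.setSize() in Java, this call alone does not change r
resized <- set_size(r, 10, 15)
r$width   # still 5

# Reassign to keep the change
r <- set_size(r, 10, 15)
r$width   # 10
```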

Homework

  • Lab #11: S3
  • Reading quiz on Moodle by Tuesday night at 11:59 pm