Vectors

Mini-Lecture 2

Ben Baumer

Smith College

2024-09-05

Slack review

Garbage collection

When does R determine that a file is no longer needed to be considered for garbage collection?

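Garbage collection is lazy
An object becomes eligible for collection once no name (and no other object) refers to it
R runs the garbage collector automatically whenever it needs more memory for a new object
Your OS reclaims memory from R only when it needs it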
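A minimal sketch of this laziness, using lobstr's mem_used() to watch R's memory use. The exact numbers depend on your session, and R may or may not have run a collection on its own before the explicit gc() call, so no expected values are shown.

```r
library(lobstr)

x <- runif(1e6)   # about 8 MB of doubles, bound to the name x
mem_used()

rm(x)             # no name refers to the vector any more, so it is now collectable
mem_used()        # may not drop yet: nothing has necessarily triggered a collection

invisible(gc())   # force a collection now (rarely needed in practice)
mem_used()
```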

Tibbles

I can’t seem to figure out why there is such a difference in the memory usage of a tibble vs. a data frame. Is it because a data frame is more like a proper file, whereas a tibble is like a preview?

library(tidyverse)
library(lobstr)
obj_size(as_tibble(iris)) - obj_size(iris)

136 B

obj_size(attr(iris, "class"))

120 B

obj_size(attr(as_tibble(iris), "class"))

256 B

The entire 136 B difference is the longer class attribute: 256 B - 120 B = 136 B
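A small check, as a sketch, that a tibble is not a "preview": the columns of as_tibble(iris) are identical to those of iris, and only the class vector grows. Here as.list() is used just to compare the columns while ignoring the class and row-name attributes.

```r
library(tibble)

iris_tbl <- as_tibble(iris)

# the extra entries in the class vector
setdiff(class(iris_tbl), class(iris))   # "tbl_df" "tbl"

# the data columns themselves are identical
identical(as.list(iris), as.list(iris_tbl))   # TRUE
```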

Wide vs. long

I understand why “long” and “wide” data would take up different amounts of memory in Exercise 7 but am curious why long format data takes up more? Intuitively I would have thought that fewer vectors would take up less space than more vectors, even if they’re longer

dim(iris)

[1] 150   5

iris_long <- iris |>
  pivot_longer(-Species, names_to = "type", values_to = "measurement")

dim(iris_long)

[1] 600   3

obj_size(iris_long) / obj_size(iris)

1.93 B

prod(dim(iris_long)) / prod(dim(iris))

[1] 2.4

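The long format has 2.4 times as many cells but takes only about 1.93 times as much memory (the ratio is unitless, despite the printed B). The extra cells come from repeating each Species value four times and from the new variable, type, that pivot_longer() adds to hold the former column names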
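The cell counts behind the 2.4 above can be worked out directly from the shape of iris; this base-R sketch just spells out that arithmetic.

```r
n_wide <- nrow(iris) * ncol(iris)          # 150 x 5 = 750 cells
n_long <- nrow(iris) * (ncol(iris) - 1)    # 150 x 4 = 600 rows after stacking the 4 numeric columns
n_long_cells <- n_long * 3                 # 600 x 3 (Species, type, measurement) = 1800 cells
n_long_cells / n_wide                      # 2.4
```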

Wide vs. long (cont’d)

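But you’re right otherwise (about factors, at least)! Species is four times as long in iris_long but only about 2.4 times as large (3048 B vs. 1248 B), because a factor stores its levels only once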

iris |>
  map_dbl(obj_size)

Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
        1248         1248         1248         1248         1248

iris_long |> 
  map_dbl(obj_size)

    Species        type measurement 
       3048        5104        4848

class(iris_long$Species)

[1] "factor"
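A sketch comparing three ways to store the 600 Species values of iris_long. It rebuilds that column with rep() rather than pivoting, and omits byte counts because they depend on the R build; on 64-bit R, integer codes take 4 bytes each and character elements take an 8-byte pointer each, so the ordering below should hold.

```r
library(lobstr)

sp_factor <- rep(iris$Species, 4)      # 600 elements, a factor like iris_long$Species
sp_char   <- as.character(sp_factor)   # same values, stored as strings
sp_int    <- as.integer(sp_factor)     # just the integer codes

obj_size(sp_int)      # 4 bytes per element plus a fixed header
obj_size(sp_factor)   # the integer codes plus the levels and class, stored once
obj_size(sp_char)     # an 8-byte pointer per element plus the 3 unique strings
```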

When does it matter?

When does object size start to make a noticeable difference in the efficiency/speed of the code? For example, if you had a long data frame vs. a wide one, is there a # of rows/columns that would make long tables slow your code down significantly with a long frame instead of a wide one, or is it just completely dependent on what kind of program you’re running?

Sounds like a great project!
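One possible starting point for such a project, as a sketch: time the same question asked of each shape. This assumes the bench package is installed; the summary chosen here (mean sepal length per species) is only an illustration, and iris is far too small to show a meaningful difference, so you would need to scale the data up.

```r
library(tidyverse)
library(bench)   # assumption: installed separately

iris_long <- iris |>
  pivot_longer(-Species, names_to = "type", values_to = "measurement")

# the same summary computed from the wide and the long shape
res <- bench::mark(
  wide = iris |>
    summarize(m = mean(Sepal.Length), .by = Species),
  long = iris_long |>
    filter(type == "Sepal.Length") |>
    summarize(m = mean(measurement), .by = Species),
  check = FALSE   # only timings and allocations are of interest here
)

res[, c("expression", "median", "mem_alloc")]
```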

Vectors

Clarification

Lists always store references to other objects
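A quick illustration with lobstr: a list holding the same vector three times stores three references, not three copies, so its size barely exceeds that of the vector.

```r
library(lobstr)

x <- runif(1e6)      # one vector of about 8 MB
y <- list(x, x, x)   # three references to the same vector

obj_size(x)
obj_size(y)          # only slightly larger than x
ref(y)               # all three elements point to the same address
```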

Coercion

character → double → integer → logical

Makes more sense to me that the arrows go the other way: logical → integer → double → character!
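Combining two types with c() shows each step of the coercion order:

```r
typeof(c(TRUE, 1L))   # "integer":   logical -> integer
typeof(c(1L, 1.5))    # "double":    integer -> double
typeof(c(1.5, "a"))   # "character": double -> character
c(TRUE, "a")          # "TRUE" "a"
```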

`data.frame`s and `tibble`s

Differences

tibble() never coerces its inputs (e.g., character vectors stay character)
tibble() won’t transform non-syntactic names
tibble() only recycles vectors of length 1
tibble() allows references to created variables
[ always returns a tibble
$ doesn’t do partial matching
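A sketch exercising the differences listed above side by side; the column names xyz and x are arbitrary choices for illustration.

```r
library(tibble)

# non-syntactic names: kept by tibble(), repaired by data.frame()
names(tibble(`1` = 1))       # "1"
names(data.frame(`1` = 1))   # "X1"

# recycling: data.frame() recycles 1:2 to length 4; tibble() refuses
nrow(data.frame(x = 1:4, y = 1:2))   # 4
try(tibble(x = 1:4, y = 1:2))        # error: incompatible sizes

# tibble() can refer to a column created earlier in the same call
tibble(x = 1:3, y = x * 2)

# [ and $ on a one-column table
df <- data.frame(xyz = 1:3)
tb <- tibble(xyz = 1:3)
class(df[, "xyz"])   # "integer": data.frame drops to a vector
class(tb[, "xyz"])   # still a tibble
df$x                 # partial match: returns the xyz column
tb$x                 # NULL, with a warning about an unknown column
```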

Method-oriented programming?

Suppose we have an instrument object called violin and a method called play()

in Java:

Instrument myViolin = new Violin();
myViolin.play();

but in R:

# constructor sets class attribute to "instrument"
my_violin <- violin()

# generic function dispatches on class attribute
play(my_violin)

# what actually happens!!!
play.instrument(my_violin)
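A runnable S3 version of the R pseudocode above. The functions violin(), play(), and play.instrument() are hypothetical, defined here only for illustration, and the name field stored in the object is an assumption.

```r
# hypothetical constructor: stores data in a list and sets the class attribute
violin <- function() {
  structure(list(name = "violin"), class = "instrument")
}

# generic: UseMethod() looks at class(x) and calls the matching method
play <- function(x, ...) {
  UseMethod("play")
}

# method for objects of class "instrument"
play.instrument <- function(x, ...) {
  paste("Now playing:", x$name)
}

my_violin <- violin()
play(my_violin)              # "Now playing: violin", via dispatch
play.instrument(my_violin)   # the same call, made directly
```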

List-columns

Where have you seen this before?
sf objects have geometry list-column
fitting many models
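A sketch of the "many models" pattern: nest one tibble per species into a list-column, then fit one model per row. The model formula is an arbitrary choice for illustration.

```r
library(tidyverse)

models <- iris |>
  nest(data = -Species) |>   # one row per species; data is a list-column of tibbles
  mutate(
    fit   = map(data, function(d) lm(Sepal.Length ~ Sepal.Width, data = d)),  # list-column of lm objects
    slope = map_dbl(fit, function(m) coef(m)[["Sepal.Width"]])                # ordinary double column
  )

models
```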

`sf` list-columns

library(sf)
library(macleish)
boundary <- macleish_layers[["boundary"]]

boundary |>
  as_tibble()

# A tibble: 1 × 2
    area                                                                geometry
  [acre]                                                           <POLYGON [°]>
1   255. ((-72.68133 42.45536, -72.68108 42.45539, -72.68111 42.45549, -72.6811…

boundary$geometry |> class()

[1] "sfc_POLYGON" "sfc"

boundary$geometry |> typeof()

[1] "list"
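The same structure can be built from scratch without the macleish data; this sketch uses two made-up points.

```r
library(sf)

# two point geometries collected into a simple feature geometry column (sfc)
pts <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))

class(pts)        # "sfc_POINT" "sfc"
typeof(pts)       # "list": an sfc is a list with extra attributes
class(pts[[1]])   # each element is one geometry: "XY" "POINT" "sfg"

# as a column, it becomes the geometry list-column of an sf data frame
st_sf(id = 1:2, geometry = pts)
```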

`nest()` and `unnest()`

library(dplyr)  # provides the starwars data
library(tidyr)
nrow(starwars)

[1] 87

starwars_person_film <- starwars |>
  unnest(films)

nrow(starwars_person_film)

[1] 173

starwars_person_film |>
  nest(films = films) |>
  nrow()

[1] 87
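The same round trip on a toy two-row tibble (made-up names and films, not from starwars) makes the row counts easy to check by hand.

```r
library(tidyr)
library(tibble)

# toy data: films is a list-column of character vectors
people <- tibble(
  name  = c("a", "b"),
  films = list(c("film 1", "film 2"), "film 1")
)

# unnest(): one row per (name, film) pair, so 2 + 1 = 3 rows
people_film <- unnest(people, films)
nrow(people_film)   # 3

# nest(): back to one row per name; films is now a list-column of tibbles
people_film |>
  nest(films = films) |>
  nrow()            # 2
```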

Now

Work on

Lab #2: Vectors
Reading quiz on Moodle by Sunday night at 11:59 pm
