In this lab, we will learn how to use ifelse() for vectorized control flow and how to avoid writing for loops.
Goal: by the end of this lab, you will be able to assign values conditionally and re-write a for loop using map().
ifelse()
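The if () ... else syntax is for control flow: it evaluates a single logical condition and decides which block of code runs. By contrast, ifelse() is a function that takes a logical vector and returns a vector of the same length, choosing each element from one of two alternatives according to the corresponding condition. It is often useful inside mutate().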
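To make the distinction concrete, here is a minimal base R sketch. The vector height and the labels "tall" and "not tall" are made up for illustration and do not come from the starwars data.

```r
# Hypothetical heights in cm, including one missing value
height <- c(172, 96, NA, 202)

# ifelse() checks the condition element by element and
# returns a vector of the same length as the input
size <- ifelse(height > 180, "tall", "not tall")
size
# [1] "not tall" "not tall" NA         "tall"

# if () ... else expects a single TRUE or FALSE, so it suits
# one decision about the whole object, not one per element
if (anyNA(height)) {
  message("height contains missing values")
} else {
  message("height is complete")
}
```

Note that a missing condition yields a missing result: the third element stays NA instead of falling into either branch.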
Note
Note that there is also a function called if_else(), from the dplyr package, that does the same thing but is stricter about data types. You can use either function.
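A short sketch of what "stricter" means in practice, assuming dplyr is installed. The vector x and the labels are invented for illustration.

```r
library(dplyr)

# Hypothetical input with a missing value
x <- c(5, NA, -2)

# Base ifelse() silently coerces mixed types to character
loose <- ifelse(x > 0, "positive", 0)
loose
# [1] "positive" NA         "0"

# if_else() refuses to mix character and numeric results,
# so the same call raises an error (captured here, not thrown)
strict <- tryCatch(
  if_else(x > 0, "positive", 0),
  error = function(e) "type error"
)
strict
# [1] "type error"

# if_else() also has a `missing` argument for NA conditions
labelled <- if_else(x > 0, "positive", "not positive", missing = "unknown")
labelled
# [1] "positive"     "unknown"      "not positive"
```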
In the starwars data set, most characters have a species. However, there are many different species.
Note the behavior around NA. Some characters have unknown species.
starwars |> filter(is.na(species))
# A tibble: 4 × 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Jek Tono… 180 110 brown fair blue NA <NA> <NA>
2 Gregar T… 185 85 black dark brown NA <NA> <NA>
3 Cordé 157 NA brown light brown NA <NA> <NA>
4 Sly Moore 178 48 none pale white NA <NA> <NA>
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
# vehicles <list>, starships <list>
Our previous construction classified everyone who is neither a human nor a droid as Other, including characters whose species is unknown. Those characters should arguably be left as NA.
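The code behind the table that follows is not shown here. One way to get that result is sketched below. It assumes the earlier construction tested membership with %in%, which treats NA as "no match" and therefore sends unknown species to Other. Comparing with == propagates NA instead, so ifelse() returns NA for those rows. The column name species_update is taken from the output; the rest is an assumption.

```r
library(dplyr)

unknown_species <- starwars |>
  mutate(
    # NA == "Human" is NA, and NA | NA is NA, so ifelse()
    # returns NA whenever species is missing
    species_update = ifelse(
      species == "Human" | species == "Droid",
      species,
      "Other"
    )
  ) |>
  filter(is.na(species)) |>
  select(name, species, species_update)

unknown_species
```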
# A tibble: 4 × 3
name species species_update
<chr> <chr> <chr>
1 Jek Tono Porkins <NA> <NA>
2 Gregar Typho <NA> <NA>
3 Cordé <NA> <NA>
4 Sly Moore <NA> <NA>
Create a new variable called is_bald and set it to FALSE if the character has hair of any color, TRUE if the character has no hair, and NA if the character is a droid.
As noted in the book, there are many reasons to avoid writing loops in R. I have never written a repeat loop, and a while loop is necessary only on rare occasions. Unless you need to access indices explicitly, you can and should rewrite a for loop as a map() call.
Warning
I will consistently and strongly encourage you to eliminate for loops in your R code.
Vectorized operations
Many operations in R are vectorized already, so you often don’t need a loop at all.
Consider generating the first 10 numbers in some integer sequences. For the perfect squares, you don’t need a loop at all, because the square operator is vectorized. Recall that vectors are built into the fundamental design of R, so things are supposed to work this way!
Note
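Built-in vectorization is one of the key ideas that separates R from many general-purpose programming languages, where applying an operation to every element usually requires an explicit loop or an add-on library.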
x <- 1:10
x^2
[1] 1 4 9 16 25 36 49 64 81 100
However, consider generating the Fibonacci sequence. This can’t be vectorized in the same direct way, because each entry depends on the previous two entries. You could write a for loop.
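A minimal sketch of such a loop in base R. The variable names n and fib are illustrative, and the sequence is assumed to start 1, 1.

```r
n <- 10

# Pre-allocate the result, then seed the first two entries
fib <- numeric(n)
fib[1:2] <- 1

# Each entry needs the two before it, so the loop must
# access indices explicitly
for (i in 3:n) {
  fib[i] <- fib[i - 1] + fib[i - 2]
}

fib
# [1]  1  1  2  3  5  8 13 21 34 55
```

This is a case where the loop genuinely needs the index i, because entry i is defined in terms of entries i - 1 and i - 2.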
Generally, when you have a vector x as input, and you want to produce a vector y of the same length as output, you can use one of two paradigms:
If the operation can be vectorized, write a function that will take the whole input vector x and compute the whole y vector at once. I suspect that this will be the most efficient method in nearly every case.
If the operation can’t be vectorized, write a function that will compute a single value of y for a single value of x, and then map() that function over x.
Only if NEITHER of these is possible should you write a for loop!
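The two paradigms can be sketched side by side. The helper names to_squares() and count_divisors() are hypothetical, and map_int() is used here as the variant of map() that returns an integer vector instead of a list.

```r
library(purrr)

x <- 1:10

# Paradigm 1: the operation is vectorized, so one call
# handles the whole input vector
to_squares <- function(x) x^2
y1 <- to_squares(x)
y1
# [1]   1   4   9  16  25  36  49  64  81 100

# Paradigm 2: the function works on a single value only,
# because seq_len() expects a single number
count_divisors <- function(n) sum(n %% seq_len(n) == 0)

# map_int() applies it to each element of x in turn
y2 <- map_int(x, count_divisors)
y2
# [1] 1 2 2 3 2 4 2 4 3 4
```

In both cases the output has the same length as x, and no explicit for loop appears in the code.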
Recall that we saw map() previously in the context of list-columns.
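As a brief reminder of that pattern, the sketch below counts the films in which each character appears. The column name n_films is made up for illustration, and map_int() is used so that the result is an integer column instead of a list.

```r
library(dplyr)
library(purrr)

film_counts <- starwars |>
  # films is a list-column: each entry is a character vector,
  # so length() is applied to one character's films at a time
  mutate(n_films = map_int(films, length)) |>
  select(name, n_films)

film_counts
```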
Use the vectorized nchar() function to compute the number of characters in each character’s name, without writing any kind of loop.
Use map() and nchar() to compute the total number of characters in the names of the starships associated with each character. For example, Luke Skywalker primarily flew an X-wing fighter, but also briefly piloted an Imperial shuttle in Return of the Jedi. His starships list contains "X-wing" and "Imperial shuttle", so the total is 6 + 16 = 22.