class: center, middle, inverse, title-slide

# Programming with data
## Iteration
### Ben Baumer
### SDS 192, Oct 21, 2020 (http://beanumber.github.io/sds192/lectures/mdsr_scale_02-iteration.html)

---

## Why iteration?

- You have:
    - a function `f(val)` that returns a value
    - a list of `val`s you want to apply `f()` to
- Problem:
    - the list might be looooong -- no copy-and-pasting
    - want to collect all of the results

--

- Solution:
    - run `f()` on each of the `val`s
    - collect the results

--

- Trick:
    - do it in one line of code

---

## Iteration

.center[![](http://www.braveclojure.com/assets/images/cftbat/core-functions-in-depth/mapping.png)]

---

## `purrr`

.footnote[https://github.com/tidyverse/purrr]

.pull-left[
.center[![](https://github.com/tidyverse/purrr/raw/master/man/figures/logo.png)]
]

.pull-right[
- [Functional programming](https://en.wikipedia.org/wiki/Functional_programming)
- `map()` + `reduce()`
- [MapReduce](https://en.wikipedia.org/wiki/MapReduce)
- [Cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/purrr.pdf)
]

---

## The `map()` function

.center[![](https://d33wubrfki0l68.cloudfront.net/f0494d020aa517ae7b1011cea4c4a9f21702df8b/2577b/diagrams/functionals/map.png)]

.footnote[basically the same as `lapply()`]

---

## Ex 1: setup

```r
# a function
show_cars <- function(mod = "civic", n = 6) {
  mpg %>%
    filter(model == mod) %>%
    head(n)
}
```

--

```r
show_cars("corolla")
```

```
## # A tibble: 5 x 11
##   manufacturer model   displ  year   cyl trans    drv     cty   hwy fl    class
##   <chr>        <chr>   <dbl> <int> <int> <chr>    <chr> <int> <int> <chr> <chr>
## 1 toyota       corolla   1.8  1999     4 auto(l3) f        24    30 r     compa…
## 2 toyota       corolla   1.8  1999     4 auto(l4) f        24    33 r     compa…
## 3 toyota       corolla   1.8  1999     4 manual(… f        26    35 r     compa…
## 4 toyota       corolla   1.8  2008     4 manual(… f        28    37 r     compa…
## 5 toyota       corolla   1.8  2008     4 auto(l4) f        26    35 r     compa…
```

--

```r
# a vector of values
cars_i_own <- c("corolla", "civic")
```

---

## Ex 1: iterate the function over those values

```r
# return a list
map(cars_i_own, show_cars)
```

```
## [[1]]
## # A tibble: 5 x 11
##   manufacturer model   displ  year   cyl trans    drv     cty   hwy fl    class
##   <chr>        <chr>   <dbl> <int> <int> <chr>    <chr> <int> <int> <chr> <chr>
## 1 toyota       corolla   1.8  1999     4 auto(l3) f        24    30 r     compa…
## 2 toyota       corolla   1.8  1999     4 auto(l4) f        24    33 r     compa…
## 3 toyota       corolla   1.8  1999     4 manual(… f        26    35 r     compa…
## 4 toyota       corolla   1.8  2008     4 manual(… f        28    37 r     compa…
## 5 toyota       corolla   1.8  2008     4 auto(l4) f        26    35 r     compa…
##
## [[2]]
## # A tibble: 6 x 11
##   manufacturer model   displ  year   cyl trans    drv     cty   hwy fl    class
##   <chr>        <chr>   <dbl> <int> <int> <chr>    <chr> <int> <int> <chr> <chr>
## 1 honda        civic     1.6  1999     4 manual(… f        28    33 r     subcomp…
## 2 honda        civic     1.6  1999     4 auto(l4) f        24    32 r     subcomp…
## 3 honda        civic     1.6  1999     4 manual(… f        25    32 r     subcomp…
## 4 honda        civic     1.6  1999     4 manual(… f        23    29 p     subcomp…
## 5 honda        civic     1.6  1999     4 auto(l4) f        24    32 r     subcomp…
## 6 honda        civic     1.8  2008     4 manual(… f        26    34 r     subcomp…
```

---

## Ex 1: iterate the function over those values

.footnote[See also `map_chr()`, `map_dbl()`, `map_int()`, `map_lgl()`, etc.]

```r
# return a tibble
*map_dfr(cars_i_own, show_cars)
```

```
## # A tibble: 11 x 11
##    manufacturer model  displ  year   cyl trans   drv     cty   hwy fl    class
##    <chr>        <chr>  <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>
##  1 toyota       corol…   1.8  1999     4 auto(l… f        24    30 r     compact
##  2 toyota       corol…   1.8  1999     4 auto(l… f        24    33 r     compact
##  3 toyota       corol…   1.8  1999     4 manual… f        26    35 r     compact
##  4 toyota       corol…   1.8  2008     4 manual… f        28    37 r     compact
##  5 toyota       corol…   1.8  2008     4 auto(l… f        26    35 r     compact
##  6 honda        civic    1.6  1999     4 manual… f        28    33 r     subcom…
##  7 honda        civic    1.6  1999     4 auto(l… f        24    32 r     subcom…
##  8 honda        civic    1.6  1999     4 manual… f        25    32 r     subcom…
##  9 honda        civic    1.6  1999     4 manual… f        23    29 p     subcom…
## 10 honda        civic    1.6  1999     4 auto(l… f        24    32 r     subcom…
## 11 honda        civic    1.8  2008     4 manual… f        26    34 r     subcom…
```

---

## `group_by()` + `summarize()`

.center[![](../gfx/group_by.png)]

---

## `group_by()` + `group_modify()`

.center[![](../gfx/do.png)]

Goal:

- start with a tibble
- group it
- apply a function that takes a tibble and returns a tibble
- return a grouped tibble with the result of the function **applied to each group**

---

## Ex 2: iterate over groups of a data frame

```r
mtcars %>%
  # creates a grouped data frame
  group_by(cyl) %>%
  # returns a list of data frames
* group_map(head, n = 2)
```

```
## [[1]]
## # A tibble: 2 x 10
##     mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  22.8  108     93  3.85  2.32  18.6     1     1     4     1
## 2  24.4  147.    62  3.69  3.19  20       1     0     4     2
##
## [[2]]
## # A tibble: 2 x 10
##     mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1    21   160   110   3.9  2.62  16.5     0     1     4     4
## 2    21   160   110   3.9  2.88  17.0     0     1     4     4
##
## [[3]]
## # A tibble: 2 x 10
##     mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  18.7   360   175  3.15  3.44  17.0     0     0     3     2
## 2  14.3   360   245  3.21  3.57  15.8     0     0     3     4
```

---

## Ex 2: iterate over groups of a data frame

```r
mtcars %>%
  # creates a grouped data frame
  group_by(cyl) %>%
  # returns a grouped data frame
* group_modify(head, n = 2)
```

```
## # A tibble: 6 x 11
## # Groups:   cyl [3]
##     cyl   mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     4  22.8  108     93  3.85  2.32  18.6     1     1     4     1
## 2     4  24.4  147.    62  3.69  3.19  20       1     0     4     2
## 3     6  21    160    110  3.9   2.62  16.5     0     1     4     4
## 4     6  21    160    110  3.9   2.88  17.0     0     1     4     4
## 5     8  18.7  360    175  3.15  3.44  17.0     0     0     3     2
## 6     8  14.3  360    245  3.21  3.57  15.8     0     0     3     4
```

---

## Ex 3: regression over groups of a data frame

```r
mtcars %>%
  # creates a grouped data frame
  group_by(cyl) %>%
  # returns a grouped data frame
* group_modify(~broom::tidy(lm(mpg ~ disp, data = .x)))
```

```
## # A tibble: 6 x 6
## # Groups:   cyl [3]
##     cyl term        estimate std.error statistic    p.value
##   <dbl> <chr>          <dbl>     <dbl>     <dbl>      <dbl>
## 1     4 (Intercept) 40.9       3.59       11.4   0.00000120
## 2     4 disp        -0.135     0.0332     -4.07  0.00278
## 3     6 (Intercept) 19.1       2.91        6.55  0.00124
## 4     6 disp         0.00361   0.0156      0.232 0.826
## 5     8 (Intercept) 22.0       3.35        6.59  0.0000259
## 6     8 disp        -0.0196    0.00932    -2.11  0.0568
```
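
---

## Sketch: `map()`, `map_dbl()`, and `reduce()`

This sketch is not part of the original examples. It illustrates three points from the earlier slides: `map()` behaves much like `lapply()`, the typed variants such as `map_dbl()` return an atomic vector instead of a list, and `reduce()` collapses the results into a single value. The helper `mean_mpg()` and the vector `cyls` are hypothetical names introduced only for illustration. The code assumes `purrr` is installed and uses the built-in `mtcars` data.

```r
library(purrr)

# hypothetical helper: mean mpg of the cars in mtcars
# that have a given number of cylinders
mean_mpg <- function(n_cyl) {
  mean(mtcars$mpg[mtcars$cyl == n_cyl])
}

# hypothetical vector of values to iterate over
cyls <- c(4, 6, 8)

# map() returns a list with one element per value, like lapply()
as_list <- map(cyls, mean_mpg)
as_base <- lapply(cyls, mean_mpg)

# map_dbl() runs the same iteration but returns a double vector
as_dbl <- map_dbl(cyls, mean_mpg)

# reduce() combines the elements pairwise into a single value
total <- reduce(as_dbl, `+`)
```

Choosing `map_dbl()` over `map()` is mostly a matter of what you want back: a list is the safe default, while a typed variant is convenient when every call to the function returns a single number.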