class: center, middle, inverse, title-slide

# Data wrangling
## Aggregation
### Ben Baumer
### SDS 192, Sep 30, 2020 (http://beanumber.github.io/sds192/lectures/mdsr_wrangling_04-group_by.html)

---

class: center, middle, inverse

![](https://raw.githubusercontent.com/tidyverse/dplyr/master/man/figures/logo.png)

---

## The Five Verbs

.pull-left[
The Five Verbs

- `select()`
- `filter()`
- `mutate()`
- `arrange()`
- `summarize()`
]

.pull-right[
Plus:

- **`group_by()`**
- `rename()`
- `inner_join()`
- `left_join()`
]

---

## `summarize()`: collapse into **a single row**

.center[![](../gfx/summarize.png)]

---

## `group_by()`: group by a variable

.center[![](../gfx/group_by.png)]

---

background-image: url("../gfx/summarize_funs.png")
background-size: contain
background-position: 100% 0%

## Summary funs

.pull-left[
- take a **vector**
- output a single value
]

---

## Example: average fuel economy

```r
mtcars %>%
  group_by(cyl) %>%
  summarize(
    num_cars = n(),
    avg_mpg = mean(mpg)
  )
```

```
## `summarise()` ungrouping output (override with `.groups` argument)
```

```
## # A tibble: 3 x 3
##     cyl num_cars avg_mpg
##   <dbl>    <int>   <dbl>
## 1     4       11    26.7
## 2     6        7    19.7
## 3     8       14    15.1
```
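
---

## Sketch: checking one group by hand

A minimal sketch, assuming dplyr 1.0.0 or later is installed. It repeats the fuel economy example with `.groups = "drop"`, which is the argument the message refers to, and then recomputes the 4-cylinder average directly from the `mpg` vector to illustrate that a summary function takes a vector and returns a single value. The names `by_cyl` and `four_cyl` are illustrative only.

```r
library(dplyr)

# Same aggregation as the fuel economy example; .groups = "drop"
# returns an ungrouped tibble and silences the message
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(
    num_cars = n(),
    avg_mpg = mean(mpg),
    .groups = "drop"
  )

# A summary function takes a vector and returns a single value:
# pull out the mpg values for the 4-cylinder cars and average them
four_cyl <- mtcars$mpg[mtcars$cyl == 4]
mean(four_cyl)
```

The value returned by `mean(four_cyl)` should match the `avg_mpg` entry for `cyl == 4` in `by_cyl`.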