class: center, middle, inverse, title-slide

# Data wrangling
## Data transformation verbs
### Ben Baumer
### SDS 192
### Sep 28, 2020 (http://beanumber.github.io/sds192/lectures/mdsr_wrangling_01-dplyr.html)

---

class: center, middle, inverse

# ![](https://raw.githubusercontent.com/tidyverse/dplyr/master/man/figures/logo.png)

---

## `dplyr` highlights

.footnote[https://r4ds.had.co.nz/transform.html]

.pull-left[
The Five Verbs:

- `select()`
- `filter()`
- `mutate()`
- `arrange()`
- `summarize()`
]

--

.pull-right[
Plus:

- `group_by()`
- `rename()`
- `inner_join()`
- `left_join()`
]

---

## Philosophy

- Each *verb* takes a data frame and returns a data frame
    - actually a `tbl_df` (more on that later)
    - allows chaining with `%>%` (more on that later)
- Idea:
    - master a few simple commands
    - use your creativity to combine them
- Cheat Sheet:
    - https://www.rstudio.com/resources/cheatsheets/

---

background-image: url("../gfx/dplyr_cheatsheet.png")
background-size: contain

---

## What is a tibble?

.footnote[https://r4ds.had.co.nz/tibbles.html]

.pull-left[
.center[![](http://hexb.in/hexagons/tibble.png)]
]

.pull-right[
- object of class `tbl`
- a re-imagining of a `data.frame`
- it looks and acts like a `data.frame`
- but it's even better...
- `tidyverse` works with tibbles
]

---

## `select()`: take a subset of the **columns**

![](../gfx/select.png)

---

## `filter()`: take a subset of the **rows**

![](../gfx/filter.png)

---

## `mutate()`: add or modify a **column**

![](../gfx/mutate.png)

---

## `arrange()`: sort the **rows**

![](../gfx/arrange.png)

---

## `summarize()`: collapse to **a single row**

![](../gfx/summarise.png)
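
---

## The five verbs in code: a sketch

A minimal sketch of the five verbs applied one after another, each taking a tibble and returning a tibble. It assumes the built-in `mtcars` data as a stand-in (not the data used in this lecture). The object names `cars`, `cars_small`, `four_cyl`, and `overall` are hypothetical.

```r
library(dplyr)

# Assumption: mtcars (ships with base R) stands in for the lecture's data.
# as_tibble() turns the row names into a regular column called `model`.
cars <- as_tibble(mtcars, rownames = "model")

# select(): take a subset of the columns
cars_small <- cars %>%
  select(model, mpg, cyl, wt)

# filter(): take a subset of the rows
four_cyl <- cars_small %>%
  filter(cyl == 4)

# mutate(): add a column (wt is recorded in units of 1000 lbs)
four_cyl <- four_cyl %>%
  mutate(wt_lbs = wt * 1000)

# arrange(): sort the rows, here by descending mpg
four_cyl <- four_cyl %>%
  arrange(desc(mpg))

# summarize(): collapse to a single row
overall <- four_cyl %>%
  summarize(n = n(), mean_mpg = mean(mpg))
```

Because every step returns a tibble, the same steps could also be written as one `%>%` chain without the intermediate objects.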