class: center, middle, inverse, title-slide

# Data wrangling
## Data transformation verbs
### Ben Baumer
### SDS 192, Sep 28, 2020 (http://beanumber.github.io/sds192/lectures/mdsr_wrangling_01-dplyr.html)

---

## `dplyr` highlights

.footnote[https://r4ds.had.co.nz/transform.html]

.pull-left[
The Five Verbs:

- `select()`
- `filter()`
- `mutate()`
- `arrange()`
- `summarize()`
]

--

.pull-right[
Plus:

- `group_by()`
- `rename()`
- `inner_join()`
- `left_join()`
]

---

## Philosophy

- Each *verb* takes a data frame and returns a data frame
    - actually a `tbl_df` (more on that later)
    - allows chaining with `%>%` (more on that later)
- Idea:
    - master a few simple commands
    - use your creativity to combine them
- Cheat Sheet:
    - https://www.rstudio.com/resources/cheatsheets/

---

background-image: url("../gfx/dplyr_cheatsheet.png")
background-size: contain

---

## What is a tibble?

.footnote[https://r4ds.had.co.nz/tibbles.html]

- object of class `tbl`
- a re-imagining of a `data.frame`
- it looks and acts like a `data.frame`
- but it's even better...
- `tidyverse` works with tibbles

---

## `select()`: take a subset of the **columns**

---

## `filter()`: take a subset of the **rows**

---

## `mutate()`: add or modify a **column**

---

## `arrange()`: sort the **rows**

---

## `summarize()`: collapse to **a single row**
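
---

## Example: chaining the five verbs

A minimal sketch of how the five verbs might be combined with `%>%`. The `pets` tibble, its column names, and the conversion factor are hypothetical, made up for illustration; they are not taken from the lecture data.

```r
library(dplyr)

# Hypothetical toy data (an assumption for this sketch)
pets <- tibble(
  name      = c("Ada", "Bo", "Cy", "Di"),
  species   = c("cat", "dog", "cat", "dog"),
  weight_kg = c(4, 20, 5, 30)
)

result <- pets %>%
  select(name, weight_kg) %>%            # keep a subset of the columns
  filter(weight_kg > 4) %>%              # keep a subset of the rows
  mutate(weight_lb = weight_kg * 2.2) %>% # add a column (approximate conversion)
  arrange(desc(weight_kg))               # sort the rows, heaviest first

# Collapse the whole table to a single row
summary_tbl <- pets %>%
  summarize(n = n(), mean_weight = mean(weight_kg))
```

Every step receives a data frame and hands one on, which is what lets the calls chain.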