class: center, middle, inverse, title-slide

# Mini-Lecture 11
## Data transformation verbs
### Ben Baumer
### SDS 192, Feb 19, 2020 (http://beanumber.github.io/sds192/lectures/11-dplyr.html)

---
background-image: url(https://static01.nyt.com/images/2019/02/17/magazine/17mag-coders-pics-slide-ELU0/17mag-coders-pics-slide-ELU0-superJumbo.png?quality=90&auto=webp)
background-size: contain
background-position: 100% 0%

## An article

.pull-left[

- The Secret History of Women in Coding

> Computer programming once had much better gender balance than it does today. What went wrong?

.footnote[https://www.nytimes.com/2019/02/13/magazine/women-coding-computer-programming.html]

]

---

## Announcements

- Lab grades are updated
- HW #1 mostly graded
- Still working through HW #2 grades
- Working through MP1 submissions

--

- **Check your grades on Moodle** and DM me if you have an issue!

---
class: center, middle, inverse

# ![](https://raw.githubusercontent.com/tidyverse/dplyr/master/man/figures/logo.png)

---

## `dplyr` highlights

.pull-left[

The Five Verbs

- `select()`
- `filter()`
- `mutate()`
- `arrange()`
- `summarize()`

]

.pull-right[

Plus:

- `group_by()`
- `rename()`
- `inner_join()`
- `left_join()`

]

---

## Philosophy

- Each *verb* takes a data frame and returns a data frame
    - actually a `tbl_df` (more on that later)
    - allows chaining with `%>%` (more on that later)
- Idea:
    - master a few simple commands
    - use your creativity to combine them
- Cheat Sheet:
    - (https://www.rstudio.com/resources/cheatsheets/)
    - vote for hard copies on Slack

---
background-image: url("../gfx/dplyr_cheatsheet.png")
background-size: contain

---

## What is a tibble?

.pull-left[

.center[![](http://hexb.in/hexagons/tibble.png)]

]

.pull-right[

- object of class `tbl`
- a re-imagining of a `data.frame`
- it looks and acts like a `data.frame`
- but it's even better...
- `tidyverse` works with tibbles

]

---

## `select()`: take a subset of the **columns**

![](../gfx/select.png)

---

## `filter()`: take a subset of the **rows**

![](../gfx/filter.png)

---

## `mutate()`: add or modify a **column**

![](../gfx/mutate.png)

---

## `arrange()`: sort the **rows**

![](../gfx/arrange.png)

---

## `summarize()`: collapse to **a single row**

![](../gfx/summarise.png)

---
class: center, middle, inverse

# The pipe

---

## The pipe operator

.pull-left[

.center[![](http://hexb.in/hexagons/magrittr.png)]

- Inspired by the pipe (`|`) in UNIX
- Provided by the `magrittr` package

]

--

.pull-right[

![](https://upload.wikimedia.org/wikipedia/en/b/b9/MagrittePipe.jpg)

- [The Treachery of Images](https://en.wikipedia.org/wiki/The_Treachery_of_Images)
- René Magritte, 1929

]

---

## How does the pipe work?

![](../gfx/tidy-pipe.png)

---

## Using the pipe

The expression

```r
mydata %>% verb(arguments)
```

is the same as:

```r
verb(mydata, arguments)
```

--

Thus,

```r
# my_function is a placeholder for any function whose first argument is the data
my_function(x, args)
```

--

has the same effect as

```r
x %>% my_function(args)
```

---

## Why the pipe?

Instead of having to read/write:

```r
select(filter(mutate(data, args1), args2), args3)
```

You can write:

```r
data %>%
  mutate(args1) %>%
  filter(args2) %>%
  select(args3)
```

---

## Little Bunny Foo Foo

<iframe width="640" height="360" src="https://www.youtube.com/embed/R6xKM-H2awE?ecver=1" frameborder="0" allowfullscreen></iframe>

---

## Coding Little Bunny Foo Foo

- Nested form:

```r
bop(scoop(hop(foo_foo, through = forest), up = field_mice), on = head)
```

- With pipes:

```r
foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mice) %>%
  bop(on = head)
```

.footnote[https://github.com/hadley/r4ds/blob/master/pipes.Rmd]

---
class: center, middle, inverse

# Coding style

---

## Coding style

![](https://imgs.xkcd.com/comics/code_quality.png)

.footnote[https://xkcd.com/1513/]

---

## Literate Programming

.pull-left[

![](https://upload.wikimedia.org/wikipedia/en/6/62/Literate_Programming_book_cover.jpg)

]

.pull-right[

- [Don Knuth](https://en.wikipedia.org/wiki/Donald_Knuth) (1985)
- code and natural language interspersed
- we implement this in R Markdown

]

---

## Hadley's style guide

[![style](../gfx/style.png)](http://adv-r.had.co.nz/Style.html)

---

## [Who is Hadley?](https://en.wikipedia.org/wiki/Hadley_Wickham)

.pull-left[

![](https://pix-media.priceonomics-media.com/blog/1001/HadleyObama.png)

]

.pull-right[

- from New Zealand (like R)
- Ph.D. in statistics from Iowa St. (2008)
- Chief Scientist at RStudio
- author of many R packages, including...

]

---

## `tidyverse`

.pull-left[

[![](https://pbs.twimg.com/media/CvzEQcfWIAAIs-N.jpg)](http://tidyverse.org)

]

.pull-right[

- collection of packages that all fit together

]

---
class: inverse

# Work on...

1. Lab 6:
    - (http://beanumber.github.io/sds192/lab-single_table.html)
2. HW #3:
    - by Tue, Feb 25th
3. Next up: `group_by()`...
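
---

## Appendix: the verbs and the pipe, sketched

The five verbs and the pipe covered in this lecture can be tried together on a small example. This is only a minimal sketch, not code from the lab: it assumes `dplyr` is installed and borrows the built-in `mtcars` data frame, and the column name `wt_kg` is made up for illustration.

```r
# Assumption: dplyr is installed; mtcars ships with base R.
library(dplyr)

# Nested form: has to be read from the inside out.
nested <- select(filter(mutate(mtcars, wt_kg = wt * 453.592), cyl == 4), mpg, wt_kg)

# Piped form: reads top to bottom, one verb per line.
piped <- mtcars %>%
  mutate(wt_kg = wt * 453.592) %>%  # add a column (wt is in 1000s of lbs; wt_kg is a made-up name)
  filter(cyl == 4) %>%              # keep only the rows for 4-cylinder cars
  select(mpg, wt_kg)                # keep only two of the columns

# Both forms should produce the same data frame.
identical(nested, piped)

# arrange() sorts the rows; summarize() collapses them to a single row.
sorted <- piped %>%
  arrange(desc(mpg))

summary_row <- piped %>%
  summarize(n_cars = n(), mean_mpg = mean(mpg))
```

Every step takes a data frame and returns a data frame, which is what lets the verbs be chained in any order.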