class: center, middle, inverse, title-slide # Data wrangling ## Joining two tables ### Ben Baumer ### SDS 192Oct 2, 2020(
http://beanumber.github.io/sds192/lectures/mdsr_wrangling_05-join.html
) --- # Relational data .footnote[http://r4ds.had.co.nz/relational-data.html] - Data from two or more tables that is *related* - Need principled way to combine -- - Soon: - Tidy data - 3rd normal form -- - Later: - SQL - Relational Database Management Systems (RDBMS) - Keys and indices --- # Example: `nycflights13` ```r library(nycflights13) ``` - `flights` - `airports` - `airlines` - `planes` - `weather` --- # Keep like with like ```r flights %>% select(year, month, day, flight, origin, dest, carrier) %>% head(3) ``` ``` ## # A tibble: 3 x 7 ## year month day flight origin dest carrier ## <int> <int> <int> <int> <chr> <chr> <chr> ## 1 2013 1 1 1545 EWR IAH UA ## 2 2013 1 1 1714 LGA IAH UA ## 3 2013 1 1 1141 JFK MIA AA ``` -- ```r head(airlines, 3) ``` ``` ## # A tibble: 3 x 2 ## carrier name ## <chr> <chr> ## 1 9E Endeavor Air Inc. ## 2 AA American Airlines Inc. ## 3 AS Alaska Airlines Inc. ``` --- # Connections between tables ([ER model](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model)) ![](http://r4ds.had.co.nz/diagrams/relational-nycflights.png) --- # Be specific ```r flights %>% inner_join(airlines) %>% dim() ``` ``` ## Joining, by = "carrier" ``` ``` ## [1] 336776 20 ``` -- ```r flights %>% inner_join(airlines, by = "carrier") %>% dim() ``` ``` ## [1] 336776 20 ``` -- ```r flights %>% inner_join(airlines, by = c("carrier" = "carrier")) %>% dim() ``` ``` ## [1] 336776 20 ``` --- # A toy example .center[![](http://r4ds.had.co.nz/diagrams/join-setup.png)] --- # Inner join .center[![](http://r4ds.had.co.nz/diagrams/join-inner.png)] --- # Left join .center[![](../gfx/join_inner.png)] --- # Right join .center[![](../gfx/join_right.png)] --- # Full join .center[![](../gfx/join_full.png)] --- # Alternatively, but... .center[![](http://r4ds.had.co.nz/diagrams/join-venn.png)] - This doesn't capture the many-to-oneness --- # Many-to-one .center[![](http://r4ds.had.co.nz/diagrams/join-one-to-many.png)]