class: center, middle, inverse, title-slide # Data wrangling ## Joining two tables ### Ben Baumer ### SDS 192Oct 2, 2020(
http://beanumber.github.io/sds192/lectures/mdsr_wrangling_05-join.html
) --- # Relational data .footnote[http://r4ds.had.co.nz/relational-data.html] - Data from two or more tables that is *related* - Need principled way to combine -- - Soon: - Tidy data - 3rd normal form -- - Later: - SQL - Relational Database Management Systems (RDBMS) - Keys and indices --- # Example: `nycflights13` ```r library(nycflights13) ``` - `flights` - `airports` - `airlines` - `planes` - `weather` --- # Keep like with like ```r flights %>% select(year, month, day, flight, origin, dest, carrier) %>% head(3) ``` ``` ## # A tibble: 3 x 7 ## year month day flight origin dest carrier ## <int> <int> <int> <int> <chr> <chr> <chr> ## 1 2013 1 1 1545 EWR IAH UA ## 2 2013 1 1 1714 LGA IAH UA ## 3 2013 1 1 1141 JFK MIA AA ``` -- ```r head(airlines, 3) ``` ``` ## # A tibble: 3 x 2 ## carrier name ## <chr> <chr> ## 1 9E Endeavor Air Inc. ## 2 AA American Airlines Inc. ## 3 AS Alaska Airlines Inc. ``` --- # Connections between tables ([ER model](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model)) data:image/s3,"s3://crabby-images/de811/de811444f2b894de557094ba83b13bafd15b8809" alt="" --- # Be specific ```r flights %>% inner_join(airlines) %>% dim() ``` ``` ## Joining, by = "carrier" ``` ``` ## [1] 336776 20 ``` -- ```r flights %>% inner_join(airlines, by = "carrier") %>% dim() ``` ``` ## [1] 336776 20 ``` -- ```r flights %>% inner_join(airlines, by = c("carrier" = "carrier")) %>% dim() ``` ``` ## [1] 336776 20 ``` --- # A toy example .center[data:image/s3,"s3://crabby-images/07380/07380393100e9007c46b24974fb0a8ee2e1491fa" alt=""] --- # Inner join .center[data:image/s3,"s3://crabby-images/da74a/da74a1f5b3e6e1525f8d3b607a22b4693a9dbdce" alt=""] --- # Left join .center[data:image/s3,"s3://crabby-images/ca2c8/ca2c8b0d8938c7e09b21a63169a81661f200c4e3" alt=""] --- # Right join .center[data:image/s3,"s3://crabby-images/34435/34435d48fe4ff35a7bbe5b83d04f6845b6c75968" alt=""] --- # Full join .center[data:image/s3,"s3://crabby-images/c6195/c6195af94babd700accbee0f257ea893a6a4c8ea" alt=""] --- # Alternatively, but... .center[data:image/s3,"s3://crabby-images/1fd83/1fd83a64509db71f38deafed8211085ace8fffdf" alt=""] - This doesn't capture the many-to-oneness --- # Many-to-one .center[data:image/s3,"s3://crabby-images/96ff3/96ff385587f4f4caaf0af08cce596bb4dfb3a6fa" alt=""]