class: center, middle, inverse, title-slide # Database querying with SQL ## Introduction to SQL ### Ben Baumer ### SDS 192April 17, 2020(
http://beanumber.github.io/sds192/lectures/mdsr_sql_01-intro.html
) --- class: center, middle, inverse # SQL --- ## A brief primer on SQL .footnote[https://en.wikipedia.org/wiki/SQL] .pull-left[  ] .pull-right[ - SQL is not just one thing - MySQL, PostgreSQL, SQLite - Oracle, Big Query, Vertica - Theory developed in 1970s (E.F. Codd) - Ingres implemented mid-1970s - robust, time-tested, well understood ] --- ## We'll be using MySQL .footnote[https://en.wikipedia.org/wiki/MySQL] .pull-left[  ] .pull-right[ - Arguably the most popular RDBMS - open source - See also: [PostgreSQL](https://en.wikipedia.org/wiki/PostgreSQL), [SQLite](https://en.wikipedia.org/wiki/SQLite) - bought by Sun in 2008, now owned by Oracle - [MariaDB](https://en.wikipedia.org/wiki/MariaDB) is GPL-only fork ] --- ## The thing about SQL is... -- .pull-left[ Since you already know .centered[] ] -- .pull-right[ You **already** know SQL!  ] --- ## Same concepts, different syntax .pull-left[ dplyr ```r table %>% filter(field == "value") %>% left_join(lkup, by = c("lkup_id" = "id") %>% group_by(year) %>% summarize(N = sum(1)) %>% filter(N > 100) %>% arrange(desc(N)) %>% head(10) ``` ] .pull-right[ MySQL ```sql SELECT year, sum(1) as N FROM table t LEFT JOIN lkup l ON t.lkup_id = l.id WHERE field = "value" GROUP BY year HAVING N > 100 ORDER BY N desc LIMIT 0, 10; ``` ] -- - You may have learned `dplyr` first, but... - SQL came first - and it's more widely-used --- ## Why `dplyr` vs. SQL? .pull-left[ R + `dplyr` good at: - fitting models - plotting - wrangling data of all kinds - working with small data - being an **interface to SQL** ] .pull-right[ SQL good at: - storage and retrieval - medium-to-big data - multi-user, asynchronous access - serving institutional needs - web/mobile apps ]