class: center, middle, inverse, title-slide # Database Querying with SQL ## IMDB ### Ben Baumer ### SDS 192April 20th, 2020(
http://beanumber.github.io/sds192/lectures/mdsr_sql_03-imdb.html
) --- class: center, middle, inverse # IMDB --- ## Do we need the MySQL Workbench? - Short answer: No -- - Long answer: I find it helpful for: - exploring databases with a GUI - fiddling around with queries - more informative error messages - built-in help - adding keys and indexes (more on that later) - then copy-and-paste **working queries** into R Markdown --- ## Ways to interact with MySQL 1. GUI SQL client: MySQL Workbench (previous video) 2. command line SQL client (next slide) 3. generic SQL access in R: ~~`DBI::dbGetQuery()`~~ 4. `RMySQL` through R Markdown chunks (see labs & hw) 5. Lots of other ways... -- I prefer: - (1) for SQL development - (4) for SQL reporting --- ## Method 2: Using the terminal ```bash mysql -h scidb.smith.edu -u sds192 -pDSismfc@S ``` ```sql SHOW DATABASES; ``` ``` ## +--------------------+ ## | Database | ## +--------------------+ ## | information_schema | ## | airlines | ## | citibike | ## | fec | ## | imdb | ## | lahman | ## | nyctaxi | ## | yelp | ## +--------------------+ ``` --- ## Getting information about databases; ```sql USE imdb; SHOW TABLES; ``` ``` ## +-----------------+ ## | Tables_in_imdb | ## +-----------------+ ## | aka_name | ## | aka_title | ## | cast_info | ## | char_name | ## | comp_cast_type | ## | company_name | ## | company_type | ## | complete_cast | ## | info_type | ## | keyword | ## | kind_type | ## | link_type | ## | movie_companies | ## | movie_info | ## | movie_info_idx | ## | movie_keyword | ## | movie_link | ## | name | ## | person_info | ## | role_type | ## | title | ## +-----------------+ ``` --- ## Getting information about tables ```sql DESCRIBE movie_info; ``` ``` ## +--------------+---------+------+-----+---------+----------------+ ## | Field | Type | Null | Key | Default | Extra | ## +--------------+---------+------+-----+---------+----------------+ ## | id | int(11) | NO | PRI | NULL | auto_increment | ## | movie_id | int(11) | NO | MUL | NULL | | ## | info_type_id | int(11) | NO | MUL | NULL | | ## | info | text | NO | | NULL | | ## | note | text | YES | | NULL | | ## +--------------+---------+------+-----+---------+----------------+ ``` --- ## People in the IMDB .footnote[https://en.wikipedia.org/wiki/Primary_key] - `name`: people and their names - `id` column is **primary key** -- - `person_id` column references `name.id` from: - `person_info` - `cast_info` - `aka_name` --- ## Movies in the IMDB - `title`: movies and their titles - `id` column is primary key -- - `movie_id` column references `title.id` from: - `cast_info` - `aka_title` - `movie_companies` - `movie_info` - `movie_info_idx` - `movie_keyword` - `movie_link` --- background-image: url(https://i.imgur.com/pDq0n.png) background-size: contain .footnote[https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model] --- ## What's the deal with table aliases? Consider: ```sql SELECT title.title, company_name.name, title.production_year FROM movie_companies JOIN company_name ON movie_companies.company_id = company_name.id JOIN title ON title.id = movie_companies.movie_id WHERE company_name.name LIKE '%All Rise Films%' AND title.kind_id = 1 AND movie_companies.company_type_id = 2; ``` --- ## What's the deal with table aliases? Instead: ```sql SELECT t.title, cn.name, t.production_year *FROM movie_companies AS mc * JOIN company_name AS cn ON mc.company_id = cn.id JOIN title AS t ON t.id = mc.movie_id WHERE cn.name LIKE '%All Rise Films%' AND t.kind_id = 1 AND mc.company_type_id = 2; ``` --- ## Example: *Henry & June* Restrict to English-language films only ```sql SELECT t.title, t.production_year, mi.info AS rating, mi2.info AS language FROM title t JOIN movie_info AS mi ON mi.movie_id = t.id *JOIN movie_info AS mi2 * ON mi2.movie_id = t.id WHERE mi.info_type_id IN (67, 97) AND mi.info LIKE '%NC-17%' AND kind_id = 1 * AND mi2.info_type_id = 4 * AND mi2.info = 'English' ORDER BY production_year; ```