class: center, middle, inverse, title-slide

# Mini-Lecture 21
## MP2 presentations
### Ben Baumer
### SDS 192, March 22, 2019 (http://beanumber.github.io/sds192/lectures/21-mp2_presentation.html)

---
background-image: url(../gfx/delcorazon_flyer.png)
background-size: contain

---

## The SDS Major

.pull-left[
![](../gfx/sds_hex.png)
]

.pull-right[
- SDS Presentation of the [Major](https://www.smith.edu/sds/major.php)
  - Tues, Mar 26
  - 12--1, Ford Atrium
  - free food!
- Learn about [courses for next year](https://www.smith.edu/sds/courses_F19S20.php)!
]

---

## GitHub trouble

- If you're having trouble pushing/pulling:
  - resolve modified files by committing or reverting
- If you have a merge conflict:
  - just open the file and manually fix it up
  - then commit, pull, and push

---
class: center, middle

# Exam prep

---

## Sample question

- Why doesn't the following pipeline work?

```r
mpg %>%
  group_by(year) %>%
  summarize(num_models = n(), model = model, avg_hwy = mean(hwy)) %>%
  filter(cty > 20)
```

```
## Error: Column `model` must be length 1 (a summary value), not 117
```

--

[Respond now!](https://pollev.com/benbaumer295)

---

<iframe src="https://embed.polleverywhere.com/multiple_choice_polls/NZiCoBTO88oqrfhGyaPC8?controls=none&short_poll=true" width="800" height="600" frameBorder="0"></iframe>

---

## Plagiarism

> [Plagiarism](https://en.wikipedia.org/wiki/Plagiarism) is the "wrongful appropriation" and "stealing and publication" of another author's "language, thoughts, ideas, or expressions" and the representation of them as one's own original work.

- [Coding](http://femgineer.com/2012/09/software-engineering-another-form-of-self-expression/) is a [form](https://dev.to/ajahso4/programming-as-a-form-of-expression-cg1) of [expression](https://www.huffingtonpost.com/idit-harel-caperton/putting-coding-on-a-par-w_b_5360503.html)
- Please review the Collaboration policy on the [syllabus](https://beanumber.github.io/sds192/syllabus.html#policies)

---

## The passive voice

> [Passive sentences](http://advice.writing.utoronto.ca/revising/passive-voice/) can get you into trouble [in academic writing] because they can be vague about who is responsible for the action

--

- "it wouldn't work"

--

- "it is not working"

--

- "my RStudio does not work anymore"

--

- "R doesn't like me"

--

- New mantra: "I haven't yet figured out how to make it work"

--

- Take ownership of your computer!
  - computer : data scientist :: instrument : musician

---
class: center, middle

# `babynames`

---

## Homework \#4 debrief

- Let `\(n\)` be the number of years (e.g. 83)
- Let `\(k\)` be the number of names (e.g. 35)
- Question: How many numbers do you need?

--

  - to make the lines?

--

  - to make the dots?
--

- Answer: `\(nk + k\)`

---

## Two approaches

.pull-left[
Approach: \#1

- Write function to compute `\(n\)` rows for each name
  - And 1 additional row
- Find `\(k\)` names
- Iterate over `\(k\)` names with `map()`
- Result: `\(nk + k\)` observations
]

.pull-right[
Approach: \#2

- Find `\(k\)` names
- Compute `tbl_df` of `\(nk\)` rows for lines
- Compute `tbl_df` of `\(k\)` rows for points
- Use `facet_wrap()`
- Result: `\(nk + k\)` observations
]

---

## Sketch of my approach

- Find `candidate_names` (about 1600)
- Compute `candidate_data` relevant for those names
- Compute `unisex_names` as top 35 in RMSE among those
- Compute `unisex_data` by filtering `candidate_data` for `unisex_names`
- Compute `most_unisex_year_names` by `map()`-ing over `unisex_data`

```r
ggplot(unisex_data, aes(x = year, y = pct_girls)) +
  geom_line() +
  geom_area(fill = "#eaac9e") +
  geom_point(data = most_unisex_year_names, color = "white") +
  facet_wrap(~name) +
  theme(...)
```

---

## Takeaways from this module

- Four **huge** ideas
  - Five verbs -> pipelines
  - Joining relational data tables
  - Tidy data
  - Iterating (user-defined) functions
- Scalability is important!
  - If you can do an analysis for one thing...
  - ...can you do it for the whole class of things?

---
class: center, middle

# MP2

---

# MP2 Presentations

1. Self-organize into groups of 3 teams
2. Find a spot

---

## Peer-review

- at most 10 minutes per team
- "Tell a story---not necessarily the whole story..."
- Fill out peer-review form:
  - https://docs.google.com/forms/d/1bUcYxnrSs8A2rlhTD6f1_fs_2UIrLyRpu15tI8emYMc/
  - one review per team per presentation
- reconvene around 11:45

---

## Work on...

- [Mini-project \#2](../mod_data.html)
- Write-ups due by 11:55 pm on Sunday night
- **TAKE THE EXAM!!**
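
---

## Sample question: one possible repair

Here is a minimal sketch of one way to repair the pipeline from the exam-prep sample question. It assumes the goal was a per-year summary of cars with city mileage above 20; that intent is a guess, not something the question states. After `summarize()`, only `year` and the new summary columns remain, so the `cty` filter has to run first. Each summary expression must also collapse a group to a single value. The column `num_distinct_models` is a hypothetical stand-in for `model = model`.

```r
library(dplyr)
library(ggplot2)  # supplies the mpg data set

# Assumption: we want one row per year, describing cars with cty > 20.
fixed <- mpg %>%
  filter(cty > 20) %>%               # filter while cty still exists
  group_by(year) %>%
  summarize(
    num_models = n(),                          # rows in each year
    num_distinct_models = n_distinct(model),   # hypothetical replacement for model = model
    avg_hwy = mean(hwy)                        # mean highway mileage per year
  )

fixed
```

The ordering matters: verbs that need the original columns go before `summarize()`, and verbs that use the summary columns go after it.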