class: center, middle, inverse, title-slide

# Programming with data
## Importing data
### Ben Baumer
### SDS 192
### October 23, 2020
(http://beanumber.github.io/sds192/lectures/mdsr_scale_03-import.html)

---

## Data from packages

- Use `data()` (if necessary)
  - Lazy loading
- Watch out for different data types!
  - Check with `class()`

```r
class(Titanic)
```

```
## [1] "table"
```

--

```r
class(as_tibble(Titanic))
```

```
## [1] "tbl_df"     "tbl"        "data.frame"
```

---

## Data from CSVs

.footnote[https://en.wikipedia.org/wiki/Comma-separated_values]

.pull-left[
- Use `read_csv()` (instead of `read.csv()`)
  - faster
  - never converts characters to factors (`read.csv()` did by default before R 4.0)
  - part of the `tidyverse` (via `readr`)
- CSVs
  - simple, easy, common format
  - not space efficient
]

.pull-right[
![](https://media.giphy.com/media/xThtade7NZxGhfzZyo/giphy.gif)
]

---

## Paths

- Know your working directory!

```r
getwd()
```

```
## [1] "/home/bbaumer/Dropbox/git/sds192/www/lectures"
```

```r
normalizePath(".")
```

```
## [1] "/home/bbaumer/Dropbox/git/sds192/www/lectures"
```

```r
normalizePath("..")
```

```
## [1] "/home/bbaumer/Dropbox/git/sds192/www"
```

```r
normalizePath("~")
```

```
## [1] "/home/bbaumer"
```

.footnote[https://en.wikipedia.org/wiki/Path_(computing)]

---

## Find your way around

```r
list.files("..", pattern = "data")
```

```
## [1] "data"         "mod_data.Rmd"
```

```r
list.files("../data")
```

```
## [1] "landdata-states.csv" "presapproval.csv"
```

---

## Relative paths

```r
approval <- read_csv("presapproval.csv")
```

```
## Error: 'presapproval.csv' does not exist in current working directory ('/home/bbaumer/Dropbox/git/sds192/www/lectures').
```

```r
approval <- read_csv("data/presapproval.csv")
```

```
## Error: 'data/presapproval.csv' does not exist in current working directory ('/home/bbaumer/Dropbox/git/sds192/www/lectures').
```

```r
approval <- read_csv("../data/presapproval.csv")
```

```
## 
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## cols(
##   Time = col_character(),
##   Republican = col_double(),
##   Independent = col_double(),
##   Democrat = col_double()
## )
```

---

## Avoid absolute paths

```r
approval <- read_csv("~/Dropbox/git/sds192/www/data/presapproval.csv")
```

```
## 
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## cols(
##   Time = col_character(),
##   Republican = col_double(),
##   Independent = col_double(),
##   Democrat = col_double()
## )
```

```r
approval <- read_csv("/home/bbaumer/Dropbox/git/sds192/www/data/presapproval.csv")
```

```
## 
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## cols(
##   Time = col_character(),
##   Republican = col_double(),
##   Independent = col_double(),
##   Democrat = col_double()
## )
```

---

## Use **here** package

.footnote[https://github.com/r-lib/here]

.pull-left[
```r
approval <- read_csv(
  here::here(
    "www/data/presapproval.csv"
  )
)
```

```
## 
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## cols(
##   Time = col_character(),
##   Republican = col_double(),
##   Independent = col_double(),
##   Democrat = col_double()
## )
```
]

.pull-right[
![](https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/here.png)
]

---

## Data from Excel

.pull-left[
- Use `readxl::read_excel()`
- But [don't](https://blog.hubspot.com/sales/excel-killing-your-business) store your [data](https://footprinter.com/blog/item/c1fac65373f35715ead7ced9aff7ea8c) in [Excel](https://www.workwisellc.com/shouldnt-use-excel-database/)!
]

.pull-right[
![](https://media.giphy.com/media/aJPmOJi5bL0ic/giphy.gif)
]

---

## Data from Google Sheets

- Use the `googlesheets4` package
- A bit more complicated due to permissions

---

## Data from other formats

.pull-left[
- [`haven`](https://github.com/tidyverse/haven): read from SPSS, Stata, SAS, etc.
- [`jsonlite`](https://github.com/jeroen/jsonlite): read/write JSON
- [`xml2`](https://github.com/r-lib/xml2): read/write XML
- Google or ask me for others!
]

.pull-right[
![](http://www.xml-buddy.com/images/convertcsv_xmljson_html_bw.png)
]

---

## Saving and loading data

- Use `saveRDS()` to write a single R object to a file

```r
saveRDS(mtcars, file = "mtcars.rds")
```

- Use `readRDS()` to read that file back into R

```r
readRDS(file = "mtcars.rds")
```

- Convenient for saving compile time
- But be vigilant about reproducibility!
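
The two calls above can be combined into a round trip. This is a minimal sketch, assuming a temporary file rather than `mtcars.rds` in the working directory; the names `rds_path`, `mtcars_copy`, and `round_trip_ok` are illustrative and not part of the original slides.

```r
# Hypothetical path: a temporary file, so the example does not
# clutter the working directory.
rds_path <- tempfile(fileext = ".rds")

# Write the built-in mtcars data frame to disk.
saveRDS(mtcars, file = rds_path)

# readRDS() returns the object rather than restoring its old name,
# so assign the result to whatever name you like.
mtcars_copy <- readRDS(file = rds_path)

# Compare the restored object with the original.
round_trip_ok <- identical(mtcars, mtcars_copy)
round_trip_ok
```

Because `readRDS()` hands back the object itself, the file name and the variable name are independent. An `.rds` file is only as reproducible as the code that produced it, so keep that code alongside the file.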