R Package Data
In this lab, we will learn how to put data in an R package.
Goal: by the end of this lab, you should be able to put data in an R package.
Creating a package
Please see the R Package lab for help setting up an R package. In this lab we will add data to that R package.
Adding data
In this lab we will add the NYC Italian restaurants data set to our package. A CSV of the data is located at:
http://gattonweb.uky.edu/sheather/book/docs/datasets/nyc.csv
Our goal here is to preserve the origin story of the data in our package.
- Use the use_data_raw() function from usethis to add a new script for processing the raw data. Note that the first argument to this function is the name of the data set. Call it italian.
- Write the data-raw/italian.R script. Use readr::read_csv() to read the data directly from the URL. Confirm that the data looks good, and perform any additional data cleaning you want. A good idea would be to convert the variable names to snake_case. [You can use the janitor::clean_names() function for this.]
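The two steps above might look roughly like the following. This is a sketch, not the only way to write the script: it assumes the readr and janitor packages are installed, that the URL is reachable, and that the working directory is the package root when use_data_raw() is called.

```r
# Run once in the console from the package root (creates data-raw/italian.R):
# usethis::use_data_raw("italian")

# Possible contents of data-raw/italian.R.
url <- "http://gattonweb.uky.edu/sheather/book/docs/datasets/nyc.csv"

# Read the raw data directly from the URL.
italian <- readr::read_csv(url)

# Look the data over before cleaning.
str(italian)

# Convert the variable names to snake_case.
italian <- janitor::clean_names(italian)
names(italian)
```

Any further cleaning you decide on would go between the read and the end of the script, so that the whole path from raw file to finished data set is recorded in one place.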
The data-raw folder is where we keep the raw data and (perhaps more importantly) the script that we wrote to get and process that raw data. However, the data-raw folder is not part of the built package, because it is listed in .Rbuildignore.
- Examine the contents of .Rbuildignore and find the line that ignores data-raw.
Note, however, that data-raw is part of the repository, because we need to keep track of these files!
- Examine the contents of .gitignore and note that data-raw is not ignored by any of the lines.
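Both files are plain text, so opening them in an editor is enough. If you would rather check from the R console, a small helper along these lines could do it. The function name lines_mentioning() is made up for this illustration, and the calls at the bottom assume the working directory is the package root.

```r
# Hypothetical helper: return the lines of a file that mention some text.
# Returns an empty character vector if the file does not exist.
lines_mentioning <- function(path, text = "data-raw") {
  if (!file.exists(path)) {
    return(character())
  }
  lines <- readLines(path, warn = FALSE)
  lines[grepl(text, lines, fixed = TRUE)]
}

# Expect one matching line here ...
lines_mentioning(".Rbuildignore")

# ... and none here.
lines_mentioning(".gitignore")
```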
To get the data bundled with the package, we have to put it in the data folder. To do this, use the use_data() function from usethis.
- Make sure the last line of the script in data-raw invokes use_data(). Run it!
If you did this correctly you should now see a file called italian.rda in the data folder.
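As a sketch, the end of data-raw/italian.R might look like this. It assumes that italian is the cleaned data frame created earlier in the same script and that the script is run from the package root. The overwrite = TRUE argument is a choice made here so that the script can be re-run after changes.

```r
# Last line of data-raw/italian.R: save `italian` as data/italian.rda.
usethis::use_data(italian, overwrite = TRUE)

# Afterwards, from the console, confirm that the file was written.
file.exists("data/italian.rda")
```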
- Clear your workspace and rebuild and reinstall your package. Confirm that you can run italian.
- Increment the version number of your package.
- Run R CMD check. Read about any warnings, errors, or notes.
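One way to carry out these steps from the R console is sketched below. It assumes devtools and usethis are installed and that the working directory is the package root. The package name mypkg is a placeholder, and bumping the "patch" component is just one reasonable choice of version increment.

```r
# Clear the workspace so a leftover `italian` object cannot mask the packaged one.
rm(list = ls())

# Rebuild and reinstall the package, then load it.
devtools::install()
library(mypkg)  # placeholder: use your own package's name

# The data set should now be available by name.
head(italian)

# Increment the version number in DESCRIPTION.
usethis::use_version("patch")

# Run R CMD check on the package.
devtools::check()
```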
Documenting data
Data sets need documentation just as much as functions. However, documenting a data set is different from documenting a function.
- Create a new file called “data.R” in the R folder (if there isn’t one already).
- Write "italian" in that file.
- Use the roxygen tag @docType data. Rebuild the package.
- Run ?italian and read your documentation. Flesh it out by adding important details.
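A starting point for R/data.R might look like the following. The title and description wording are placeholders to replace once you have inspected the data. The @format and @source tags are additions beyond the steps above, suggested here as natural places to describe the variables and to record where the data came from.

```r
#' NYC Italian restaurants
#'
#' Data on Italian restaurants in New York City. Replace this paragraph with a
#' fuller description once you have inspected the data.
#'
#' @docType data
#' @format A data frame. Describe each variable here.
#' @source <http://gattonweb.uky.edu/sheather/book/docs/datasets/nyc.csv>
"italian"
```

Note that the documented object is the quoted name of the data set, not the data frame itself. Regenerate the help files (for example with devtools::document()) before rebuilding so that ?italian picks up your changes.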
Engagement
Prompt: Where did you get stuck in this lab? What specific steps could use further explanation?