In this lab, we will learn how to import shapefiles into R.
library(tidyverse)Goal: by the end of this lab, you will be able to import arbitrary shapefiles into R.
GIS data in the form of shapefiles is all over the web. Government agencies are particularly good sources for these. Here, we illustrate how to find bicycling trails in Massachusetts from MassGIS.
url <- "http://download.massgis.digital.mass.gov/shapefiles/state/biketrails_arc.zip"
local_file <- basename(url)
download.file(url, destfile = local_file)
unzip(local_file, exdir = "biketrails")Once we have the files unzipped, we can explore what layers are available in that folder. The sf package does this nicely.
library(sf)
dsn <- path.expand("biketrails/biketrails_arc")
dsn
list.files(dsn)There many files in this directory, each containing different spatial information. But how many coherent layers are there? In this case, just one. What does that layer contain?
st_layers(dsn)Looks good, let’s read it into R.
bike_trails <- read_sf(dsn)
bike_trailsThis is a collection of MULTILINESTRING geometries with many attributes. What does it look like?
bike_trails %>%
select(TRAILNAME) %>%
plot()This is somewhat evocative of the state of Massachusetts…
Note the projection string:
st_crs(bike_trails)N.B. - The
mass_gis()function in themacleishpackage can do some of this work for you, if you only care about intersections with the MacLeish property boundary.
MULTILINESTRING and not POLYGON?Since bike_trails is a data.frame – in addition to being an sf object – you can wrangle the data just like you would in any other data.frame. dplyr functions like select(), mutate(), filter(), arrange(), group_by(), and summarize() work just as you might imagine they would.
bike_trails %>%
filter(OWNER == "MBTA")# SAMPLE SOLUTION
nrow(bike_trails)# SAMPLE SOLUTION
bike_trails %>%
arrange(desc(SHAPE_LEN)) %>%
head(1)# SAMPLE SOLUTION
bike_trails %>%
filter(str_detect(TRAILNAME, "Norwottuck Rail Trail"))# SAMPLE SOLUTION
bike_trails %>%
group_by(TRAILNAME) %>%
summarize(
num_segments = n(),
total_length = sum(SHAPE_LEN)
) %>%
arrange(desc(total_length))Note that summarize() will aggregate the geometries using st_union(). Thus, while the following two pipelines will produce similar looking plots, the first one consists of 33 rows, each consisting of a single trail segment, while the second one consists of just one row, which contains a MULTILINESTRING geometry that has the union of the information for all 33 trail segments.
# 33 rows
bike_trails %>%
filter(OWNER == "MBTA")
# one row
bike_trails %>%
filter(OWNER == "MBTA") %>%
group_by(OWNER) %>%
summarize(
num_segments = n(),
total_length = sum(SHAPE_LEN)
) %>%
arrange(desc(total_length))The bike trails are in a Lambert conformal conic projection. Note that the units of the coordinates are very different from lat/long.
st_bbox(bike_trails)In order to get these data onto our leaflet map, we need to re-project them. The leaflet data are projected in EPSG:4326 (Web Mercator)—and we can’t change that. So we’ll have to convert the bike trails to EPSG:4326. This is done with the st_transform() function.
bike_trails_4326 <- bike_trails %>%
st_transform(4326)
bike_trails_4326Note that the projection string and the coordinates are very different now.
Now we can put this on our map.
library(leaflet)
leaflet() %>%
addTiles() %>%
addPolylines(data = bike_trails_4326)Please consult the Leaflet documentation for more details and examples.
# SAMPLE SOLUTION
pal <- colorNumeric(
palette = "viridis",
domain = bike_trails_4326$SHAPE_LEN
)
leaflet(data = bike_trails_4326) %>%
addTiles() %>%
addPolylines(color = ~pal(SHAPE_LEN)) %>%
addLegend("bottomright", pal = pal, values = ~SHAPE_LEN)Please respond to the following prompt on Slack in the #mod-spatial channel.
Prompt: What questions to you still have about working with shapefiles in
sf?