In this lab, we will learn how to import shapefiles into R.

library(tidyverse)

Goal: by the end of this lab, you will be able to import arbitrary shapefiles into R.

Importing GIS data

GIS data in the form of shapefiles is all over the web. Government agencies are particularly good sources for these. Here, we illustrate how to find bicycling trails in Massachusetts from MassGIS.

url <- "http://download.massgis.digital.mass.gov/shapefiles/state/biketrails_arc.zip"
local_file <- basename(url)
download.file(url, destfile = local_file)
unzip(local_file, exdir = "biketrails")

Once we have the files unzipped, we can explore what layers are available in that folder. The sf package does this nicely.

library(sf)
dsn <- path.expand("biketrails/biketrails_arc")
dsn
list.files(dsn)

There many files in this directory, each containing different spatial information. But how many coherent layers are there? In this case, just one. What does that layer contain?

st_layers(dsn)

Looks good, let’s read it into R.

bike_trails <- read_sf(dsn)
bike_trails

This is a collection of MULTILINESTRING geometries with many attributes. What does it look like?

bike_trails %>%
  select(TRAILNAME) %>%
  plot()

This is somewhat evocative of the state of Massachusetts…

Note the projection string:

st_crs(bike_trails)

N.B. - The mass_gis() function in the macleish package can do some of this work for you, if you only care about intersections with the MacLeish property boundary.

  1. Why are the bike trails geometries MULTILINESTRING and not POLYGON?

Basic data wrangling

Since bike_trails is a data.frame – in addition to being an sf object – you can wrangle the data just like you would in any other data.frame. dplyr functions like select(), mutate(), filter(), arrange(), group_by(), and summarize() work just as you might imagine they would.

bike_trails %>%
  filter(OWNER == "MBTA")
  1. How many distinct bike trail segments are there?
# SAMPLE SOLUTION
nrow(bike_trails)
  1. What is the longest individual bike trail segment?
# SAMPLE SOLUTION
bike_trails %>%
  arrange(desc(SHAPE_LEN)) %>%
  head(1)
  1. How many segments are associated with the Norwottuck Rail Trail?
# SAMPLE SOLUTION
bike_trails %>%
  filter(str_detect(TRAILNAME, "Norwottuck Rail Trail"))
  1. Among all of the named trails (which may have multiple features), which one has the longest total length?
# SAMPLE SOLUTION
bike_trails %>%
  group_by(TRAILNAME) %>%
  summarize(
    num_segments = n(), 
    total_length = sum(SHAPE_LEN)
  ) %>%
  arrange(desc(total_length))

Note that summarize() will aggregate the geometries using st_union(). Thus, while the following two pipelines will produce similar looking plots, the first one consists of 33 rows, each consisting of a single trail segment, while the second one consists of just one row, which contains a MULTILINESTRING geometry that has the union of the information for all 33 trail segments.

# 33 rows
bike_trails %>%
  filter(OWNER == "MBTA")

# one row
bike_trails %>%
  filter(OWNER == "MBTA") %>%
  group_by(OWNER) %>%
  summarize(
    num_segments = n(), 
    total_length = sum(SHAPE_LEN)
  ) %>%
  arrange(desc(total_length))

Projections

The bike trails are in a Lambert conformal conic projection. Note that the units of the coordinates are very different from lat/long.

st_bbox(bike_trails)

In order to get these data onto our leaflet map, we need to re-project them. The leaflet data are projected in EPSG:4326 (Web Mercator)—and we can’t change that. So we’ll have to convert the bike trails to EPSG:4326. This is done with the st_transform() function.

bike_trails_4326 <- bike_trails %>%
  st_transform(4326)
bike_trails_4326

Note that the projection string and the coordinates are very different now.

Now we can put this on our map.

library(leaflet)
leaflet() %>%
  addTiles() %>%
  addPolylines(data = bike_trails_4326)

Please consult the Leaflet documentation for more details and examples.

  1. Color-code the bike trails based on their length, and add an informative legend to the plot.
# SAMPLE SOLUTION

pal <- colorNumeric(
  palette = "viridis",
  domain = bike_trails_4326$SHAPE_LEN
)

leaflet(data = bike_trails_4326) %>%
  addTiles() %>%
  addPolylines(color = ~pal(SHAPE_LEN)) %>%
  addLegend("bottomright", pal = pal, values = ~SHAPE_LEN)

Your learning

Please respond to the following prompt on Slack in the #mod-spatial channel.

Prompt: What questions to you still have about working with shapefiles in sf?