In this lab, we will learn how to import shapefiles into R.
library(tidyverse)
Goal: by the end of this lab, you will be able to import arbitrary shapefiles into R.
GIS data in the form of shapefiles is all over the web. Government agencies are particularly good sources for these. Here, we illustrate how to find bicycling trails in Massachusetts from MassGIS.
<- "http://download.massgis.digital.mass.gov/shapefiles/state/biketrails_arc.zip"
url <- basename(url)
local_file download.file(url, destfile = local_file)
unzip(local_file, exdir = "biketrails")
Once we have the files unzipped, we can explore what layers are available in that folder. The sf
package does this nicely.
library(sf)
<- path.expand("biketrails/biketrails_arc")
dsn
dsnlist.files(dsn)
There many files in this directory, each containing different spatial information. But how many coherent layers are there? In this case, just one. What does that layer contain?
st_layers(dsn)
Looks good, let’s read it into R.
<- read_sf(dsn)
bike_trails bike_trails
This is a collection of MULTILINESTRING
geometries with many attributes. What does it look like?
%>%
bike_trails select(TRAILNAME) %>%
plot()
This is somewhat evocative of the state of Massachusetts…
Note the projection string:
st_crs(bike_trails)
N.B. - The
mass_gis()
function in themacleish
package can do some of this work for you, if you only care about intersections with the MacLeish property boundary.
MULTILINESTRING
and not POLYGON
?Since bike_trails
is a data.frame
– in addition to being an sf
object – you can wrangle the data just like you would in any other data.frame
. dplyr
functions like select()
, mutate()
, filter()
, arrange()
, group_by()
, and summarize()
work just as you might imagine they would.
%>%
bike_trails filter(OWNER == "MBTA")
# SAMPLE SOLUTION
nrow(bike_trails)
# SAMPLE SOLUTION
%>%
bike_trails arrange(desc(SHAPE_LEN)) %>%
head(1)
# SAMPLE SOLUTION
%>%
bike_trails filter(str_detect(TRAILNAME, "Norwottuck Rail Trail"))
# SAMPLE SOLUTION
%>%
bike_trails group_by(TRAILNAME) %>%
summarize(
num_segments = n(),
total_length = sum(SHAPE_LEN)
%>%
) arrange(desc(total_length))
Note that summarize()
will aggregate the geometries using st_union()
. Thus, while the following two pipelines will produce similar looking plots, the first one consists of 33 rows, each consisting of a single trail segment, while the second one consists of just one row, which contains a MULTILINESTRING
geometry that has the union of the information for all 33 trail segments.
# 33 rows
%>%
bike_trails filter(OWNER == "MBTA")
# one row
%>%
bike_trails filter(OWNER == "MBTA") %>%
group_by(OWNER) %>%
summarize(
num_segments = n(),
total_length = sum(SHAPE_LEN)
%>%
) arrange(desc(total_length))
The bike trails are in a Lambert conformal conic projection. Note that the units of the coordinates are very different from lat/long.
st_bbox(bike_trails)
In order to get these data onto our leaflet
map, we need to re-project them. The leaflet
data are projected in EPSG:4326 (Web Mercator)—and we can’t change that. So we’ll have to convert the bike trails to EPSG:4326. This is done with the st_transform()
function.
<- bike_trails %>%
bike_trails_4326 st_transform(4326)
bike_trails_4326
Note that the projection string and the coordinates are very different now.
Now we can put this on our map.
library(leaflet)
leaflet() %>%
addTiles() %>%
addPolylines(data = bike_trails_4326)
Please consult the Leaflet documentation for more details and examples.
# SAMPLE SOLUTION
<- colorNumeric(
pal palette = "viridis",
domain = bike_trails_4326$SHAPE_LEN
)
leaflet(data = bike_trails_4326) %>%
addTiles() %>%
addPolylines(color = ~pal(SHAPE_LEN)) %>%
addLegend("bottomright", pal = pal, values = ~SHAPE_LEN)
Please respond to the following prompt on Slack in the #mod-spatial
channel.
Prompt: What questions to you still have about working with shapefiles in
sf
?