The sf
package
The sf
(simple features) package can read and save shapefiles, and it plays nicely with the dpylr functions you’ve been learning.
install.packages("sf")
To load a shapefile, use the function st_read()
. The st stands for spatial and temporal data.
DOWNLOAD — Census shapefile
- Download the ZIP file above for an example shapefile.
- Move the file to your project’s data folder.
- Unzip the file.
Read the data using st_read("shapefile file location")
library(dplyr)
library(sf)
<- st_read("data/Census2010_med_household_income.shp", stringsAsFactors = F) census
When you view the data you will see all the attributes of the shapefile stored in the first columns, and the spatial information stored in a final column named geometry
.
# View your new spatial data table
glimpse(census)
## Rows: 436
## Columns: 18
## $ OBJECTID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ NAME10 <chr> "120.01", "121.01", "121.02", "223.01", "261.01", "264.03",…
## $ INTPTLAT10 <chr> "+44.8970126", "+44.8953840", "+44.8980613", "+44.9439811",…
## $ INTPTLON10 <chr> "-093.3054316", "-093.2360565", "-093.2152830", "-093.39376…
## $ GEOID101 <dbl> 27053012001, 27053012101, 27053012102, 27053022301, 2705302…
## $ GEOID <chr> "1400000US27053012001", "1400000US27053012101", "1400000US2…
## $ GEOID2 <dbl> 27053012001, 27053012101, 27053012102, 27053022301, 2705302…
## $ AssocDegre <dbl> 52.2, 27.7, 28.7, 54.1, 50.9, 32.7, 48.7, 29.5, 58.8, 30.4,…
## $ Bachelor_A <dbl> 8.3, 14.6, 8.5, 30.3, 9.7, 21.8, 24.6, 0.0, 15.3, 1.3, 6.0,…
## $ Total_HHol <dbl> 2631, 1359, 1306, 1048, 1273, 1400, 1183, 1100, 1172, 913, …
## $ Recvd_Fsta <dbl> 37, 254, 106, 15, 18, 74, 0, 4, 35, 175, 256, 131, 203, 174…
## $ Total_HH_1 <dbl> 2631, 1359, 1306, 1048, 1273, 1400, 1183, 1100, 1172, 913, …
## $ HHold_MedI <dbl> 78060, 31612, 54183, 71250, 77145, 66341, 108150, 101210, 9…
## $ NoHigherEd <dbl> 21.0, 65.7, 32.1, 19.4, 21.9, 18.0, 21.6, 43.3, 34.2, 45.1,…
## $ TotalPopCo <dbl> 5922, 3405, 2648, 2113, 3206, 2776, 3219, 2824, 2982, 2433,…
## $ NoCollegeD <dbl> 0.3546099, 1.9295154, 1.2122356, 0.9181259, 0.6830942, 0.64…
## $ SNAP_Recvd <dbl> 0.6247889, 7.4596182, 4.0030211, 0.7098912, 0.5614473, 2.66…
## $ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((-10387144 5..., MULTIPOLYGON (…
Try plot(census)
to see an overview of all the shapefile’s attributes.
Quick plots
Plot all variables
# Colors each polygon by attribute
%>% plot(max.plot = 12) census
Plot a single variable
# Use select() to pick a single attribute
%>% select(HHold_MedI) %>% plot() census
Add grid lines
# Add graticules
%>% select(HHold_MedI) %>% plot(graticule = T, axes = T) census
Plot with ggplot2 and viridis colors.
Run the code below to get the wonderful viridis color palette.
install.packages("viridis")
Now you’re ready to make some ggplot maps.
# Load updated ggplot2 and colors
library(ggplot2)
library(viridis)
ggplot(census) +
geom_sf(aes(fill = HHold_MedI)) +
scale_fill_viridis() +
labs(title = "Median Household Income by Census tract")
Transform coordinates
Current coordinate reference system (CRS)
st_crs(census)
## Coordinate Reference System:
## User input: WGS 84 / Pseudo-Mercator
## wkt:
## PROJCRS["WGS 84 / Pseudo-Mercator",
## BASEGEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]],
## CONVERSION["Popular Visualisation Pseudo-Mercator",
## METHOD["Popular Visualisation Pseudo Mercator",
## ID["EPSG",1024]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["False easting",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["easting (X)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["northing (Y)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["World - 85°S to 85°N"],
## BBOX[-85.06,-180,85.06,180]],
## ID["EPSG",3857]]
EPSG number for lat/long web coordinates - 4326
Web maps like Google and OpenStreets use the spherical Mercator projected coordinate system: EPSG#: 3857
.
(See https://epsg.io/3857)
However, the coordinates shown to the user are primarily in : EPSG#: 4326
.
(See https://epsg.io/4326)
To make Anne Morris’ day, ask her a question about coordinate reference systems.
EPSG number for UTM coordinates: 26915
Minnesota tends to use UTM coordinates for official business. They look like really big numbers, kind of like this: 475340, 5010480
. They’re easy to put in the wrong order, so it helps to remember that the latitude (or Y-direction or N-S direction) is the one with an extra digit.
UTM coordinates are broken up into different zones to help reduce the errors of mapping a sphere to a flat surface. Minnesota uses UTM zone 15N. If you’re serious about maps, you should go ahead and get this tatooed to your right shoulder.
The EPSG code for UTM zone 15N is 26915
.
(See https://epsg.io/26915)
Warning: When you save UTM coordinates in Excel or R, it’s best to switch the columns to text. It will save you from finding all of your coordinates being rounded by an overly helpful computer.
Create a new map in UTM coordinates
# Transform coordinates
<- census %>% st_transform(26915)
census_utm
# Quick plot
%>% select(HHold_MedI) %>% plot() census_utm
Hmmm… that’s a subtle difference.
Centroids
# Calculate Census tract centroids
<- census_utm %>% st_centroid()
tract_centers
# Quick plot
%>% plot() tract_centers
Add UTM centroid coordinates as new data columns
# Get centroid coordinates
<- st_coordinates(tract_centers) %>% as_tibble()
center_coords
# Add centroid coordinates as columns to polygon data
<- census_utm %>%
census_utm mutate(utm_x = center_coords$X,
utm_y = center_coords$Y)
# Check if it worked
names(census_utm)
## [1] "OBJECTID" "NAME10" "INTPTLAT10" "INTPTLON10" "GEOID101"
## [6] "GEOID" "GEOID2" "AssocDegre" "Bachelor_A" "Total_HHol"
## [11] "Recvd_Fsta" "Total_HH_1" "HHold_MedI" "NoHigherEd" "TotalPopCo"
## [16] "NoCollegeD" "SNAP_Recvd" "geometry" "utm_x" "utm_y"
glimpse(census_utm)
## Rows: 436
## Columns: 20
## $ OBJECTID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ NAME10 <chr> "120.01", "121.01", "121.02", "223.01", "261.01", "264.03",…
## $ INTPTLAT10 <chr> "+44.8970126", "+44.8953840", "+44.8980613", "+44.9439811",…
## $ INTPTLON10 <chr> "-093.3054316", "-093.2360565", "-093.2152830", "-093.39376…
## $ GEOID101 <dbl> 27053012001, 27053012101, 27053012102, 27053022301, 2705302…
## $ GEOID <chr> "1400000US27053012001", "1400000US27053012101", "1400000US2…
## $ GEOID2 <dbl> 27053012001, 27053012101, 27053012102, 27053022301, 2705302…
## $ AssocDegre <dbl> 52.2, 27.7, 28.7, 54.1, 50.9, 32.7, 48.7, 29.5, 58.8, 30.4,…
## $ Bachelor_A <dbl> 8.3, 14.6, 8.5, 30.3, 9.7, 21.8, 24.6, 0.0, 15.3, 1.3, 6.0,…
## $ Total_HHol <dbl> 2631, 1359, 1306, 1048, 1273, 1400, 1183, 1100, 1172, 913, …
## $ Recvd_Fsta <dbl> 37, 254, 106, 15, 18, 74, 0, 4, 35, 175, 256, 131, 203, 174…
## $ Total_HH_1 <dbl> 2631, 1359, 1306, 1048, 1273, 1400, 1183, 1100, 1172, 913, …
## $ HHold_MedI <dbl> 78060, 31612, 54183, 71250, 77145, 66341, 108150, 101210, 9…
## $ NoHigherEd <dbl> 21.0, 65.7, 32.1, 19.4, 21.9, 18.0, 21.6, 43.3, 34.2, 45.1,…
## $ TotalPopCo <dbl> 5922, 3405, 2648, 2113, 3206, 2776, 3219, 2824, 2982, 2433,…
## $ NoCollegeD <dbl> 0.3546099, 1.9295154, 1.2122356, 0.9181259, 0.6830942, 0.64…
## $ SNAP_Recvd <dbl> 0.6247889, 7.4596182, 4.0030211, 0.7098912, 0.5614473, 2.66…
## $ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((475580.9 49..., MULTIPOLYGON (…
## $ utm_x <dbl> 475925.2, 481549.1, 483002.3, 468882.9, 465465.5, 465619.8,…
## $ utm_y <dbl> 4971482, 4971497, 4971649, 4976925, 4975884, 4980256, 49744…
Area calculations with st_area()
# Add area column for each Census tracts
<- census_utm %>% mutate(area = st_area(geometry))
census_utm
# Quick plot of BIGGEST 3 tracts
%>%
census_utm filter(rank(area) > 433) %>%
select(HHold_MedI) %>%
plot()
# Filter to top 3
<- filter(census_utm, rank(area) > 433)
big_tracts
# Convert to a plain data-frame
<- st_set_geometry(big_tracts, NULL)
big_tracts
# Put big points to the 3 largest Census tracts
ggplot(census_utm) +
geom_sf(aes(fill = HHold_MedI)) +
geom_point(data = big_tracts, aes(x = utm_x, y = utm_y), size = 7) +
scale_fill_viridis() +
labs(title = "Median Household Income by Census tract")
Interactive map with leaflet
Install the leaflet package to make quick interactive maps.
install.packages("leaflet")
A map you can zoom and click on!
library(leaflet)
# Leaflet uses coordinates in the normal lat/long Google map flavor.
# EPSG
<- st_transform(census, 4326)
census
# Check coordinate system
# You want "+datum=WGS84"
st_crs(census)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["unknown"],
## AREA["World"],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
# Create a color legend based on HHold_MedI
<- colorQuantile("Blues", census$HHold_MedI, n = 7)
legend_colors
# Polygon map colored by HHold_MedI
<- leaflet(census) %>%
web_map addProviderTiles(providers$OpenStreetMap) %>%
addPolygons(fillColor = ~legend_colors(HHold_MedI),
fillOpacity = 0.75,
color = "white", # Border color
weight = 1, # Border width
popup = paste("Census Tract: ", census$GEOID,
"<br> Median HH Income: ", census$HHold_MedI))
# Show map
web_map
# Click on it.
Let’s make the background gray, and add special marker and a legend.
# Add gray base map and a marker for your favorite Tract
<- web_map %>%
web_map addProviderTiles(providers$CartoDB.Positron, options = providerTileOptions(opacity = 0.5)) %>%
addMarkers(lng = census$INTPTLON10[1] %>% as.numeric(),
lat = census$INTPTLAT10[1] %>% as.numeric(),
label = "You are not here.",
labelOptions = labelOptions(noHide = T),
popup = paste("These people make more $ than you."))
# Add a floating legend
<- web_map %>%
web_map addLegend("topleft",
pal = legend_colors,
values = ~HHold_MedI,
title = "Median HH Income %ile")
# Reprint map
web_map
Distance calculations with st_distance()
# Find distance between the center of first 3 Census tracts
<- census_utm[1:3, ] %>% st_distance()
distances
1:3, ] %>% select("HHold_MedI") %>% plot(pch = 19) census_utm[
# Using the centroids we calculated earlier,
## Find distance between the center of first census tract and all other census tracts
<- tract_centers %>% st_distance()
distances
# Add all distances to the polygon dataframe
<- census_utm %>% mutate(distance = distances[1, ])
census_utm
# Take a look
## Apparently it is a total distance of 0 meters away from itself.
glimpse(census_utm)
## Rows: 436
## Columns: 22
## $ OBJECTID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ NAME10 <chr> "120.01", "121.01", "121.02", "223.01", "261.01", "264.03",…
## $ INTPTLAT10 <chr> "+44.8970126", "+44.8953840", "+44.8980613", "+44.9439811",…
## $ INTPTLON10 <chr> "-093.3054316", "-093.2360565", "-093.2152830", "-093.39376…
## $ GEOID101 <dbl> 27053012001, 27053012101, 27053012102, 27053022301, 2705302…
## $ GEOID <chr> "1400000US27053012001", "1400000US27053012101", "1400000US2…
## $ GEOID2 <dbl> 27053012001, 27053012101, 27053012102, 27053022301, 2705302…
## $ AssocDegre <dbl> 52.2, 27.7, 28.7, 54.1, 50.9, 32.7, 48.7, 29.5, 58.8, 30.4,…
## $ Bachelor_A <dbl> 8.3, 14.6, 8.5, 30.3, 9.7, 21.8, 24.6, 0.0, 15.3, 1.3, 6.0,…
## $ Total_HHol <dbl> 2631, 1359, 1306, 1048, 1273, 1400, 1183, 1100, 1172, 913, …
## $ Recvd_Fsta <dbl> 37, 254, 106, 15, 18, 74, 0, 4, 35, 175, 256, 131, 203, 174…
## $ Total_HH_1 <dbl> 2631, 1359, 1306, 1048, 1273, 1400, 1183, 1100, 1172, 913, …
## $ HHold_MedI <dbl> 78060, 31612, 54183, 71250, 77145, 66341, 108150, 101210, 9…
## $ NoHigherEd <dbl> 21.0, 65.7, 32.1, 19.4, 21.9, 18.0, 21.6, 43.3, 34.2, 45.1,…
## $ TotalPopCo <dbl> 5922, 3405, 2648, 2113, 3206, 2776, 3219, 2824, 2982, 2433,…
## $ NoCollegeD <dbl> 0.3546099, 1.9295154, 1.2122356, 0.9181259, 0.6830942, 0.64…
## $ SNAP_Recvd <dbl> 0.6247889, 7.4596182, 4.0030211, 0.7098912, 0.5614473, 2.66…
## $ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((475580.9 49..., MULTIPOLYGON (…
## $ utm_x <dbl> 475925.2, 481549.1, 483002.3, 468882.9, 465465.5, 465619.8,…
## $ utm_y <dbl> 4971482, 4971497, 4971649, 4976925, 4975884, 4980256, 49744…
## $ area [m^2] 2625248 [m^2], 2388985 [m^2], 1964491 [m^2], 1521472 [m^2],…
## $ distance [m] 0.000 [m], 5623.902 [m], 7079.086 [m], 8900.067 [m], 11348.12…
Download Census data
The tigris package is great for Census boundary files, and the tidycensus provides access to all of the updated ACS demographic data. With their powers combined you can make all the maps.
Spatial polygons
Let’s install the tigris package.
install.packages(tigris)
Minnesota tracts
library(tigris)
# Load geography
<- tracts(state = "MN", cb = TRUE)
mn_tracts
# Plot map
%>% select(TRACTCE) %>% plot() mn_tracts
Midwest Counties
# Load geography
<- counties(state = c("MN","SD","ND", "WI"), cb = TRUE)
midw_counties
%>% select(STATEFP) %>% plot() midw_counties
Tribal areas
# Load geography
<- native_areas(cb = TRUE) tribal_bgs
Interactive leaflet map
leaflet(tribal_bgs) %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(fillColor = "steelblue",
color = "black",
weight = 0.5,
opacity = 0.75,
popup = ~NAMELSAD)
American Community Survey (ACS)
Let’s install the tidycensus package.
install.packages(tidycensus)
Download population data for 2015 (5 year average)
library(tidycensus)
# Load Census key for downloading Census data
# You can receive your own at http://api.census.gov/data/key_signup.html
census_api_key(readLines("https://itep-r.netlify.com/data/Census_shapefile/census_key.txt"))
# View all 2015 ACS variables you can choose from
<- load_variables(2015, "acs5", cache = TRUE)
acs_variables
# Get the population data fot all MN Census Tracts
<- get_acs(geography = "tract",
pops state = "MN",
variables = "B01003_001E",
survey = "acs5",
year = 2015)
# Show top 6 rows
select(pops, -GEOID) %>% head()
## # A tibble: 6 × 4
## NAME variable estimate moe
## <chr> <chr> <dbl> <dbl>
## 1 Census Tract 7701, Aitkin County, Minnesota B01003_001 2230 123
## 2 Census Tract 7702, Aitkin County, Minnesota B01003_001 2065 157
## 3 Census Tract 7703, Aitkin County, Minnesota B01003_001 3470 201
## 4 Census Tract 7704, Aitkin County, Minnesota B01003_001 3090 168
## 5 Census Tract 7905.01, Aitkin County, Minnesota B01003_001 1865 108
## 6 Census Tract 7905.02, Aitkin County, Minnesota B01003_001 3119 149
Reference: UseR 2017, Reading and writing spatial data