Day 3 - AM


1 Day 1 review


Get to know your DATA

Function Information
names(scrap) column names
nrow(...) number of rows
ncol(...) number of columns
summary(...) summary of all column values (ex. max, mean, median)
glimpse(...) column names + a glimpse of first values (requires dplyr package)

Filtering

Menu of comparisons

Symbol Comparison
> greater than
>= greater than or equal to
< less than
<= less than or equal to
== equal to
!= NOT equal to
%in% value is in list: X %in% c(1,3,7)
is.na(...) is the value missing?
str_detect(col_name, "word") “word” appears in text?

Your analysis toolbox


dplyr is the hero for most analysis tasks. With these six functions you can accomplish just about anything you want with your data.


Function Job
select() Select individual columns to drop or keep
arrange() Sort a table top-to-bottom based on the values of a column
filter() Keep only a subset of rows depending on the values of a column
mutate() Add new columns or update existing columns
summarize() Calculate a single summary for an entire table
group_by() Sort data into groups based on the values of a column

dplyr with Porgs

The poggle of porgs are back to help us review dplyr functions.

library(tidyverse)

porgs <- read_csv("https://mn-r.netlify.com/data/porg_data.csv")


























2 Day 2 review


  • Join tables with left_join()
  • summarize() functions
  • Group by category with group_by()
  • ifelse(): if THIS IS TRUE do a thing, otherwise do a different thing
  • Plots and charts with ggplot
library(tidyverse)

porgs <- read_csv("https://mn-r.netlify.com/data/porg_data.csv")

porg_names <- read_csv("https://mn-r.netlify.com/data/porg_names.csv")


The joined table

named_porgs <- left_join(porgs, porg_names, by = "id")


Welcome to Endor!


But wait…


The Ewoks say it’s unsafe to land. Not good. It sounds like the Empire is checking all incoming ships for licenses.

Lucky for us, there is some good news. The Ewoks say for a small ship like ours, it could be possible to land undetected. But we’ll have to put down right-dab in the middle of the 3 northern outposts.


Are you up for a challenge data droid?


Either way, let’s make a map to see what we’re up against. To help, our comrades sent along the coordinates of the Empire’s outposts. Guess what the first step is? That’s right. Time to add a new package.

3 Reading coordinates


To read coordinates stored in a CSV or Excel spreadsheet, we can read them into R with our usual friends:

  • readr’s read_csv() for CSV files
  • readxl’s read_excel() for Excel files

For shapefiles, we use the sf package:

  • sf’s st_read() for SHP files

Get the coordinates from CSV

library(readr)

outposts <- read_csv("https://mn-r.netlify.app/data/endor_outposts.csv")

Take a look inside

outposts
## # A tibble: 3 × 3
##   ID      lat  long
##   <chr> <dbl> <dbl>
## 1 1NO    36.7 -119.
## 2 1NL    36.6 -119.
## 3 3NB    36.6 -119.
glimpse(outposts)
## Rows: 3
## Columns: 3
## $ ID   <chr> "1NO", "1NL", "3NB"
## $ lat  <dbl> 36.65520, 36.58375, 36.59790
## $ long <dbl> -118.8469, -118.8195, -118.9022


To make a quick map, use ggplot and add + geom_point().

library(ggplot2)

ggplot(outposts, aes(x = long, y = lat)) + 
  geom_point(size = 8, color = "steelblue") 


Where’s the center point?


Hint: How would you find the halfway point between two points?


Let’s use mutate() from the dplyr toolbox to update our location columns.

  • Set the new lat column to the center of all the lat’s
  • Set the new long column to the center of all the long’s.

These will be the coordinates for our landing pad.

mutate the center

Complete the code

# Update lat/long to the center point (the average of ALL the lat/long)             
land_pad <- summarize(outposts, 
                      lat  = mean( ______ ),
                      long = _____________ ,
                      ID   = "Land here!")
Show code
land_pad <- summarize(outposts, 
                      lat  = mean( lat ),
                      long = mean( long ),
                      ID   = "Land here!")

Where should we land?

# View center coordinates
land_pad
lat long ID
36.61228 -118.8562 Land here!


We found it!

Let’s add it to our map.

library(ggplot2)

ggplot(outposts, aes(x = long, y = lat)) + 
  geom_point(size = 8, color = "steelblue") +
  geom_point(data = land_pad, 
             aes(x = long, y = lat), 
             size = 12,
             color = "green")


That’s looking good, but…

We’re going to need to be very precise to land perfectly in between these outposts. To make our captain really happy, let’s put this all in an interactive zoomable map. For interactive maps we use leaflet.

4 Leaflet maps


The leaflet package.

Leaflet makes interactive maps easy and you build them up in layers similar to a ggplot.


Install leaflet

install.packages("leaflet")

Explore the Full Leaflet Guide!


Maps that zoom

Leaflet builds map with layers similar to ggplot, but instead of adding things with the +, leaflet adds new layers with the %>% (“the pipe”).

1. Start with the outpost coordinates

library(leaflet)

leaflet(outposts) %>% 
  addCircles(radius = 250) %>%  
  addTiles()


2. Add the landing site

Fingers crossed it’s in the middle….

leaflet(outposts) %>% 
  addCircles(radius = 300) %>% 
  addTiles() %>%
  addMarkers(data = land_pad)  #<<


3. Add labels

Try hovering over one of the outposts to see its label.

leaflet(outposts) %>% 
  addCircles(radius = 300,
             label = ~ID) %>%   #<<
  addTiles() %>%
  addMarkers(data  = land_pad,
             label = "Land HERE!",  #<<
             labelOptions = labelOptions(noHide = TRUE))  #<<


Explore!

News just came in that the outposts will soon be upgrading their radar to detect ships up to 4,200 meters away. Will that be a problem?

Steps

  1. Update the radius argument in addCircles(...) to be equal to 4200.
  2. Add the argument , fillColor = "yellow" inside addCircles(..)
Show code
leaflet(outposts) %>%
  addCircles(radius = 4200, #<<
             label = ~ID,
             fillColor = "yellow") %>% 
  addTiles() %>%
  addMarkers(data  = land_pad,
             label = "Land HERE!",  
             labelOptions = labelOptions(noHide = TRUE))


Will our landing spot still be safe?


Woah! That’s a close one.

Great work data droid. Let’s take a minute to cool down…

Ok, now let’s land on that planet before we get stuck up here with permanent space legs.


BASEMAPS

There are lots of options for basemaps in leaflet. One of my favorites is CartoDB.Positron because its greyness doesn’t distract from the data.

Add it to your map by swapping out addTiles() for addProviderTiles(providers$CartoDB.Positron)

Show code
leaflet(outposts) %>%
  addCircles(radius = 4200, 
             label = ~ID,
             fillColor = "yellow") %>% 
   addProviderTiles(providers$CartoDB.Positron) %>%  #<<
   addMarkers(data  = land_pad,
              label = "Land HERE!",  
              labelOptions = labelOptions(noHide = TRUE))



See more available basemaps at rstudio.github.io/leaflet/basemaps.

For more Leaflet examples, see the Full Leaflet Guide!

5 Invasive Porg Survey - 2023


The Ewoks need our help. There’s porgs EVERYWHERE!

Porgs have been spreading across the galaxy; likely by hitching rides on unsuspecting ships. They’re so cute people hate to say it, but they are starting to become a nuisance.

To get a grasp on the population explosion the Ewoks are launching a porg survey. And they need your help.

6 Dates with lubridate


The lubridate package.


It’s about time! Lubridate makes working with dates much easier.

We can find how much time has elapsed, add or subtract days, and find seasonal and day of the week averages. The package is included in the tidyverse bundle of packages, so it’s already installed!


View the date cheatsheet HERE.

It’s a great reference when you need to manipulate dates or timezones in your data.

1. Scheduling weekdays

Ewoks are very busy. They only have one day per week when they can volunteer.

Here is the weekday when volunteers are available at each location:

  • Bright Tree: Thursdays
  • Fern Gully: Fridays


We can use the seq.Date() function and the option to step by = "week" to generate the survey dates for each site. But we need to know which day to start from.

When is the first Thursday in May of 2023?


For that, we can use the wday(), the “weekday” function.

# wday tells you the day of week (Sun, Mon, etc..) for a specific date
wday(ymd('2023-05-01'), label = TRUE, abbr = FALSE)
## [1] Monday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday


So the 1st of May will be a Monday. That means…. May the 4th will be a Thursday. Perfect! That’s my favorite day.

Mutate to the rescue

We really don’t want to check every date one by one do we?

Let’s add a new week_day column to our survey table that checks ALL the dates ALL at once. To add a new column we call our friend mutate().

Complete the code below to add a week_day column.

survey <- mutate(survey, 
                 week_day = wday( ________ , label = TRUE, abbr = FALSE))


filter() week days

With filter we can pick out only the days of the week that we want.

Split the schedule in two by filtering the survey to only the week day needed at each site:

  • Thursday for Bright Tree
  • Friday for Fern Gully


bright_dates <- filter(survey, week_day ==  ________ )

fern_dates   <- filter(survey, week_day ==  ________ )
Show code
bright_dates <- filter(survey, week_day ==  "Thursday" )

fern_dates   <- filter(survey, week_day ==  "Friday" )


How many survey dates are at each site?

Hint: It’s less than 50.

Show answer

26 survey days


2. Particular date formats

Oh no! Each survey site has a very-very particular Assistant to the Regional Manager. And they are demanding a very specific date format for their work schedules.


Before you send off the survey dates, you’ll need to adjust the dates to match the requested formats below.

Preferred date formats

  • Bright Tree: 5-11-2023
  • Fern Gully: May 12, 2023


Use format(count_date, ...) and the date expressions below to format the schedule for each region accordingly.

For example: format(count_date, "%b, %Y") prints the date as Aug, 2023.

  • %b stands for 3-letter month abbreviation
  • %Y% stands for the full 4 digit year

Date parts

Expression Description Example
%Y Year (4 digit) 2023
%y Year (2 digit) 21
%B Month (full name) December
%b Month (abbreviated) Dec
%m Month (decimal number) 12
%d Day of the month (decimal number) 30

Time parts

Expression Description Example
%H Hour 8
%M Minute 13
%S Second 35


Use mutate() to update the week_day column in both site schedules.

Here’s a start

# Set date format to 5-11-2023
bright_dates <- mutate(bright_dates, pretty_date = format(count_date, _______  ))

# Set date format to May 12, 2023
fern_dates <- mutate(fern_dates, pretty_date = format(count_date, _______ ))
Show code
# Set date format to 5-11-2023
bright_dates <- mutate(bright_dates, pretty_date = format(count_date, "%m-%d-%Y"))

# Set date format to May 12, 2023
fern_dates <- mutate(fern_dates, pretty_date = format(count_date, "%b %d, %Y"))

How’d we do?

# Bright Tree schedule
head(bright_dates, 3)
count_date week_day pretty_date
2023-05-04 Thursday 05-04-2023
2023-05-11 Thursday 05-11-2023
2023-05-18 Thursday 05-18-2023
# Fern Gully schedule
head(fern_dates, 3)
count_date week_day pretty_date
2023-05-05 Friday May 05, 2023
2023-05-12 Friday May 12, 2023
2023-05-19 Friday May 19, 2023


Congrats!

Your fine-tuned schedules worked perfectly.

Now let’s jump ahead and take a look at the survey results.

3. Results

Load the porg survey results.

porgs <- read_csv("https://mn-r.netlify.app/data/2023_porg_survey_results.csv")   


Explore a bit.

Are there missing values?


A missing site

It looks like we have a slight missing data problem.

There’s a data point in the results that wasn’t labeled with the site location. We do know the date however.

On 2023-06-30 there were a whopping 7 porgs counted - but we just don’t know where.


Can you determine the site based on the date of the porg count?

mystery_site_date <- "2023-06-30"

Hint: What weekday is this?

Try the wday(date) function.


Good sleuthing data droid.

We’ll learn how to update the site value later today, but right now we’re in a hurry, so let’s remove the row using filter.

Use filter() to keep only the rows in the porgs data where site is NOT NA (missing).

porgs <- filter(porgs, !is.na(site))  


4. The best time for porgs

What is the best month to see porgs?

First, add a month column to the data with the function month() and the column count_date.

porgs <- mutate(porgs, month = month(count_date))


Next, use ggplot() and geom_col() to plot the porg sightings by month.

ggplot(porgs, aes(x = month, y = ______ , fill = site)) +
    geom_col()
Show code
ggplot(porgs, aes(x = month, y = porg_count, fill = site)) +
    geom_col()


Why might June be the lowest month?

Hint: Fern Gully

5. Time series: All the data

Plot all the data with geom_point(). Put count_date on the x-axis, and the porg_count on the y-axis. Set the color to match the site column.

ggplot(porgs, aes(x = _______, y = ______ , color = _____ )) +
    geom_point(size = 5)
Show code
ggplot(porgs, aes(x = count_date, y = porg_count, color = site)) +
    geom_point(size = 5) 


Oof! That’s a busy plot. Try adding + facet_wrap("site") to the end. What happens?

ggplot(porgs, aes(x = count_date, y = porg_count, color = site)) +
  geom_point(size = 5) +
  facet_wrap("site")   #<<


Try adding + geom_line().

ggplot(porgs, aes(x = count_date, y = porg_count, color = site)) +
  geom_point(size = 5) +
  facet_wrap(~ site) +
  geom_line()                #<<
Show code
ggplot(porgs, aes(x = count_date, y = porg_count, color = site)) +
  geom_point(size = 5) +
  facet_wrap(~ site) +
  geom_line()


Great work

The Ewoks are deeply thankful. They’ll be in touch for Porg Survey 2024.


7 Share with friends


Create a GitHub account


It’s time to share with your R friends. Join GitHub

Add a new repository

Now we can create a new repository to store some of our new R plots and scripts. Click the bright green New button to get started.

  • Give it a short name like Rplots
  • Keep it public
  • Check the box to initialize with a README
  • Click [ Create repository ]

Add an R script or plot

Click Upload files to add an image of an R plot or one of your .R scripts.

Package questions

Google: "package name" + github

For example, here’s the top page for lubridate + github

Secret work projects

  • MDH: Internal secret Gitlab
  • MPCA: Coming soon…


Return to Homebase