R Training | Analysis Outline


You’re ready to blast off on your own. The example outline below includes some placeholder code based on ozone measurements and is presented as a helpful starting point for your analysis. The script snippets will not run successfully as is: you will need to update them with your own file paths, the name of your own data frame, and its specific column names.

Good luck!

Set up your project


  • Open a new project
  • Open a new R script
  • Create a data folder and a results folder in your project directory
  • Copy your data into the folder
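If you prefer to create the folders from the R console, a minimal sketch could look like the lines below. It assumes your working directory is the project folder, which RStudio sets when you open a project. The folder names `data` and `results` simply match the paths used later in this outline.

```r
# Assumes the working directory is your project folder
for (folder in c("data", "results")) {
  dir.create(folder, showWarnings = FALSE)  # quietly skips folders that already exist
}

# Check that both folders are now present
dir.exists(c("data", "results"))
```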

Begin your analysis


If you’d like, you can try doing your analysis in an R Markdown document instead of an R script. R Markdown lets you add text and images to your analysis and share your work as a Word document, a website, or even a PDF. A version of the analysis outline below can be downloaded here.

DOWNLOAD - Rmarkdown Analysis Outline


1. Read data into R

library(readr)
library(janitor)

# Read a CSV file
air_data <- read_csv("data/My-data.csv")


# Have an EXCEL file?
## You can use read_excel() from the readxl package (install it once)
install.packages("readxl")

library(readxl)

# Read an EXCEL file
air_data <- read_excel("data/My-data.xlsx")
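Whichever reader you use, it helps to take a quick look at what came in before moving on. The sketch below writes a tiny made-up CSV (hypothetical sites and values) to a temporary file so it runs anywhere. With your own data you would skip that part and call `glimpse()` and `summary()` on your own data frame.

```r
library(readr)
library(dplyr)

# Made-up example data (hypothetical values), written to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("SITE,TEMP_F,OZONE,UNITS",
             "Site A,75,42,PPB",
             "Site B,88,0.061,PPM"), csv_path)

air_data <- read_csv(csv_path, show_col_types = FALSE)

# First look: column names, column types, and a few values
glimpse(air_data)
summary(air_data)
```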

Then clean the column names

# clean_names() converts names to lowercase snake_case, e.g. "TEMP_F" becomes "temp_f"
air_data <- clean_names(air_data)
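To see what the cleaned names look like, here is a small sketch that runs a few made-up column names (hypothetical, not from any real data set) through `clean_names()`. After this step, refer to your columns by their cleaned names.

```r
library(tibble)
library(janitor)

# Hypothetical raw column names, as they might appear in a spreadsheet export
raw <- tibble(`SITE NAME` = "Site A", TEMP_F = 75, OZONE = 42, `Sample Date` = "2016-07-01")

names(clean_names(raw))
# "site_name" "temp_f" "ozone" "sample_date"
```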

2. Plot the data

library(ggplot2)

# Remember the ggplot sandwich!
ggplot(air_data, aes(x = temp_f, y = ozone)) + 
    geom_point(aes(color = site), alpha = 0.2) +
    geom_smooth(method = "lm")

3. Clean the data

library(dplyr)

# Examples of common issues 

## Drop values out of range
air_data <- air_data %>% filter(ozone > 0, temp_f < 199) 

## Convert all samples to PPB (1 PPM = 1000 PPB)
air_data <- air_data %>% 
            mutate(ozone = ifelse(units == "PPM", ozone * 1000, 
                                  ozone)) 
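To check that the cleaning steps do what you expect, you can try them on a few made-up rows first. In this sketch the values are hypothetical and chosen to trigger each rule: one negative ozone reading, one impossible temperature, and one sample recorded in PPM.

```r
library(dplyr)

# Made-up measurements (hypothetical values)
air_data <- tibble::tibble(
  site   = c("Site A", "Site A", "Site B", "Site B"),
  temp_f = c(75, 80, 999, 88),
  ozone  = c(42, -5, 50, 0.061),
  units  = c("PPB", "PPB", "PPB", "PPM")
)

air_data <- air_data %>%
  filter(ozone > 0, temp_f < 199) %>%                            # drops rows 2 and 3
  mutate(ozone = ifelse(units == "PPM", ozone * 1000, ozone))   # 0.061 PPM becomes 61 PPB

air_data
```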

4. View the data again

Look at the data from different angles (e.g., by category, site, county, or facility).

  • The plotting function facet_wrap() is great for this.
#
# Are some sites different?  
#
# We can facet the data by 'Site' to eliminate any noise 
# caused by mixing data from different sites, and learn 
# if the pattern between ozone and temperature varies.

ggplot(air_data, aes(x = temp_f, y = ozone)) + 
    geom_point(alpha = 0.2, size = 3) +
    geom_smooth(method = "lm") + 
    facet_wrap(~site) +
    labs(title    = "Ozone increases with temperature", 
         subtitle = "Observations from 2015-2017")
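Faceting shows the pattern for each site visually. If you also want a number to compare, one option (an addition, not part of the original outline) is the least-squares slope for each site. This is the slope of the straight line that `geom_smooth(method = "lm")` draws in each panel. The sketch uses made-up values for two hypothetical sites.

```r
library(dplyr)

# Made-up data: ozone rises faster with temperature at Site B
air_data <- tibble::tibble(
  site   = rep(c("Site A", "Site B"), each = 4),
  temp_f = c(60, 70, 80, 90, 60, 70, 80, 90),
  ozone  = c(30, 35, 40, 45, 30, 40, 50, 60)
)

# Ordinary least-squares slope = cov(x, y) / var(x), computed separately for each site
site_slopes <- air_data %>%
  group_by(site) %>%
  summarize(ozone_per_degree = cov(temp_f, ozone) / var(temp_f), .groups = "drop")

site_slopes
# Site A: 0.5, Site B: 1 (ozone units per degree F)
```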

5. Summarize the data

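# Note: this overwrites air_data with the summary table
air_data <- air_data %>% 
            group_by(site, year) %>% 
            summarize(avg_ozone = mean(ozone, na.rm = TRUE) %>% round(2),
                      avg_temp  = mean(temp_f, na.rm = TRUE) %>% round(2),
                      .groups   = "drop")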
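Here is a small worked example of the same grouping step using made-up values for hypothetical sites and readings. It also shows two optional choices. First, `na.rm = TRUE` keeps a single missing reading from turning a whole average into `NA`. Second, saving the result under a new name (`ozone_summary`, a hypothetical name) keeps the cleaned observations available. The `n_obs` column is another optional addition. It counts the rows in each group, including rows with a missing reading.

```r
library(dplyr)

# Made-up observations (hypothetical values), including one missing ozone reading
air_data <- tibble::tibble(
  site   = c("Site A", "Site A", "Site B", "Site B"),
  year   = c(2015, 2015, 2015, 2015),
  ozone  = c(40, 45, 50, NA),
  temp_f = c(70, 80, 85, 90)
)

ozone_summary <- air_data %>%
  group_by(site, year) %>%
  summarize(avg_ozone = round(mean(ozone, na.rm = TRUE), 2),
            avg_temp  = round(mean(temp_f, na.rm = TRUE), 2),
            n_obs     = n(),              # rows per group
            .groups   = "drop")           # return an ungrouped table

ozone_summary
```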

6. Save the results

Save the final data table

dir.create("results", showWarnings = FALSE)  # make sure the results folder exists
write_csv(air_data, "results/2015-17_ozone_summary.csv")


Save the plots

# ggsave() saves the last plot displayed unless you pass one with plot =
ggsave("results/2015-2017 - Ozone vs Temp.pdf")
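The self-contained sketch below performs both saves. It writes to a `results` folder inside R's temporary directory so it can run anywhere. In your project you would use `"results"` directly. The table, plot, and file names are hypothetical. Passing `plot =` to `ggsave()` saves that exact plot rather than whichever plot was displayed last.

```r
library(readr)
library(ggplot2)

# Stand-in for your project's results folder
results_dir <- file.path(tempdir(), "results")
dir.create(results_dir, showWarnings = FALSE)

# Hypothetical summary table
ozone_summary <- data.frame(site = c("Site A", "Site B"), avg_ozone = c(42.5, 50))

# Save the table
csv_file <- file.path(results_dir, "ozone_summary.csv")
write_csv(ozone_summary, csv_file)

# Save a specific plot, with its size given in inches
p <- ggplot(ozone_summary, aes(x = site, y = avg_ozone)) + geom_col()
pdf_file <- file.path(results_dir, "ozone_by_site.pdf")
ggsave(pdf_file, plot = p, width = 6, height = 4)
```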

7. Share it with the world

E-mail your script to all of your colleagues and create a GitHub account here to share your work with other R enthusiasts.

Congrats!!