Intro to ggplot2

Elliot Shannon

2023-04-13

What is ggplot2

ggplot2 is a versatile and elegant R package for visualizing data. It is a member of the tidyverse family of packages.

What is ggplot2

You can make all sorts of plots with ggplot2


Introduction

R for Data Science is a great resource that is freely available online. We will be following the material from Chapter 3.

Introduction

  • ggplot2 implements the grammar of graphics, a coherent system for describing and building figures and graphs
  • Because of this, we can work faster: we learn one system and apply it in many situations
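As a rough illustration of the "one system" idea, the sketch below builds a plot piece by piece: a dataset, an aesthetic mapping, a geom layer, and axis labels, each added with +. It assumes ggplot2 is installed and uses its built-in mpg dataset (displ is engine displacement in litres, hwy is highway miles per gallon) as a stand-in, since FEF_trees.csv may not be at hand.

```r
library(ggplot2)

# Start from the data; on its own this gives an empty canvas
p <- ggplot(data = mpg)

# Add a layer: a geom plus the aesthetic mapping it uses
p <- p + geom_point(mapping = aes(x = displ, y = hwy))

# Non-data components, such as axis labels, are added the same way
p <- p + labs(x = "Engine displacement (L)", y = "Highway mileage (mpg)")

# The data computed for a layer can be inspected without drawing the plot
points_df <- layer_data(p)
head(points_df[, c("x", "y")])

print(p)
```

Every plot in these slides follows this same pattern; only the data, the mappings, and the geoms change.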

Introduction

  • Often, when we first work with a new dataset, we use data visualization to better understand the data and look for any potential patterns

  • ggplot2 fits right into our tidyverse workflow and will be our tool for the job

Motivating Dataset

Recall the FEF_trees.csv dataset.

library(tidyverse)
trees <- read_csv("./data/FEF_trees.csv")
glimpse(trees)
Rows: 88
Columns: 18
$ watershed         <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ year              <dbl> 1991, 1991, 1991, 1991, 1991, 1992, 1992, 1992, 1992…
$ plot              <dbl> 29, 33, 35, 39, 44, 26, 26, 26, 48, 48, 48, 29, 33, …
$ species           <chr> "Acer rubrum", "Acer rubrum", "Acer rubrum", "Acer r…
$ dbh_in            <dbl> 6.0, 6.9, 6.4, 6.5, 7.2, 3.1, 2.0, 4.1, 2.4, 2.7, 3.…
$ height_ft         <dbl> 48.0, 48.0, 48.0, 49.0, 51.0, 40.0, 30.5, 50.0, 28.0…
$ stem_green_kg     <dbl> 92.2, 102.3, 124.4, 91.7, 186.2, 20.8, 5.6, 54.1, 10…
$ top_green_kg      <dbl> 13.1, 23.1, 8.7, 39.0, 8.9, 0.9, 0.9, 8.6, 0.7, 5.0,…
$ smbranch_green_kg <dbl> 30.5, 23.5, 22.3, 22.5, 25.4, 1.9, 2.2, 8.0, 3.7, 3.…
$ lgbranch_green_kg <dbl> 48.4, 57.7, 44.1, 35.5, 65.1, 1.5, 0.6, 4.0, 0.5, 1.…
$ allwoody_green_kg <dbl> 184.2, 206.6, 199.5, 188.7, 285.6, 25.1, 9.3, 74.7, …
$ leaves_green_kg   <dbl> 16.1, 12.9, 16.5, 12.0, 22.4, 0.9, 1.0, 6.1, 2.5, 1.…
$ stem_dry_kg       <dbl> 54.7, 62.3, 73.3, 53.6, 106.4, 11.7, 3.2, 28.3, 5.5,…
$ top_dry_kg        <dbl> 7.1, 12.4, 4.6, 21.3, 4.7, 0.5, 0.5, 4.4, 0.4, 2.7, …
$ smbranch_dry_kg   <dbl> 15.3, 14.8, 11.5, 11.2, 11.7, 1.1, 1.2, 3.6, 1.8, 0.…
$ lgbranch_dry_kg   <dbl> 28.0, 33.6, 25.1, 19.8, 36.1, 0.9, 0.3, 2.1, 0.3, 1.…
$ allwoody_dry_kg   <dbl> 105.1, 123.1, 114.4, 105.9, 159.0, 14.2, 5.3, 38.5, …
$ leaves_dry_kg     <dbl> 6.1, 4.6, 6.1, 4.2, 7.9, 0.3, 0.3, 1.9, 0.8, 0.5, 1.…

First Steps

  • Question: Do taller trees have greater DBH than shorter trees?
  • What does this relationship look like? Is it positive? Negative? Linear? Nonlinear?
  • Our trees tibble has a column for each: dbh_in (diameter at breast height, in inches) and height_ft (height, in feet)

First Steps

# Create a scatterplot of dbh_in and height_ft
ggplot(data = trees) +
  geom_point(mapping = aes(x = dbh_in, y = height_ft))

First Steps

  • We begin our plot with the ggplot() function
  • ggplot() creates a coordinate system that we can add layers to
  • ggplot() takes a dataset as its first argument (here, data = trees)
# Create an empty graph
ggplot(data = trees)

First Steps

  • Next, we add one or more layers to ggplot()
  • geom_point() adds a layer of points
  • There are many different geom layers that can be added to a ggplot()
# Add geom_point() layer
ggplot(data = trees) +
  geom_point(mapping = aes(x = dbh_in, y = height_ft))
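geom_smooth(), which draws a fitted trend line, is one example of another geom. The sketch below stacks it on top of geom_point(), again using ggplot2's built-in mpg data as a stand-in for trees. method = "loess" and formula = y ~ x are set explicitly only so that ggplot2 does not print a message about its default choice.

```r
library(ggplot2)

# Two geom layers drawn from the same data, each with its own mapping
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy), method = "loess", formula = y ~ x)

# Layer 1 holds one row per point; layer 2 holds the fitted curve and its band
point_layer  <- layer_data(p, 1)
smooth_layer <- layer_data(p, 2)

print(p)
```

Layers are drawn in the order they are added, so here the trend line sits on top of the points.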

First Steps

  • Each geom function takes a mapping argument
  • This argument defines how variables in data are mapped to visual properties
  • The mapping is always built with the aes() function, whose x and y arguments name the variables to place on each axis
ggplot(data = trees) +
  geom_point(mapping = aes(x = dbh_in, y = height_ft))
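To see what aes() actually does, the sketch below stores a mapping on its own, attaches it to a layer, and then looks at the built layer data. It uses the built-in mpg dataset as a stand-in for trees; the object names m and built are only illustrative.

```r
library(ggplot2)

# aes() only records which columns go to which visual properties;
# the columns are not looked up until the plot is built
m <- aes(x = displ, y = hwy)
print(m)

p <- ggplot(data = mpg) + geom_point(mapping = m)

# After building, the x and y columns hold the mapped data values
built <- layer_data(p)
head(built[, c("x", "y")])
```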

First Steps

  • We can turn this into a general graphing template
  • We will frequently use this structure
ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
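As one filled-in instance of the template, the sketch below swaps in a different geom: <DATA> becomes ggplot2's built-in mpg dataset, <GEOM_FUNCTION> becomes geom_boxplot(), and <MAPPINGS> becomes x = class, y = hwy (car type against highway mileage). This is an illustration with a stand-in dataset, not part of the trees analysis.

```r
library(ggplot2)

# <DATA> = mpg, <GEOM_FUNCTION> = geom_boxplot, <MAPPINGS> = x = class, y = hwy
p <- ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = class, y = hwy))

# A boxplot layer summarizes each group, so there is one row per car class
box_stats <- layer_data(p)
box_stats[, c("x", "ymin", "middle", "ymax")]

print(p)
```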

Aesthetic Mappings

  • We may be interested in digging even deeper into this graph
  • Are there any species-specific patterns?

Aesthetic Mappings

  • Recall that trees contains a column called species
  • We can add this third variable to a two-dimensional scatterplot by mapping it to an aesthetic
  • Aesthetics include things like the size, shape, and color of your points

Aesthetic Mappings

  • We can use the following code to color each point by species
  • We see that the largest and tallest trees are Prunus serotina
# Color points by species
ggplot(data = trees) +
  geom_point(mapping = aes(x = dbh_in, y = height_ft, color = species))

Aesthetic Mappings

  • Here we added the color argument to the aes() function in the mapping for our points
  • We set color = species, where species is a column in our trees tibble
# Color points by species
ggplot(data = trees) +
  geom_point(mapping = aes(x = dbh_in, y = height_ft, color = species))
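A related point from the same R for Data Science chapter: color inside aes() is mapped to a variable, while color given outside aes() is set to one fixed value for every point. The sketch below contrasts the two with the built-in mpg dataset (class is the car type); the object names mapped and fixed are only illustrative.

```r
library(ggplot2)

# Mapped: color depends on the class column, so points get different colors
mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Set: color is a constant passed outside aes(), so every point is blue
fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

mapped_colors <- unique(layer_data(mapped)$colour)
fixed_colors  <- unique(layer_data(fixed)$colour)
```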

Aesthetic Mappings

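  • ggplot2 automatically assigns a unique level of the aesthetic to each unique value of the variable, a process known as scaling
  • In this case, each species is automatically assigned a unique color, and ggplot2 adds a legend explaining which color belongs to which species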
# Color points by species
ggplot(data = trees) +
  geom_point(mapping = aes(x = dbh_in, y = height_ft, color = species))
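Scaling can be checked directly by pairing each data value with the color ggplot2 assigned to it. This sketch uses the built-in mpg dataset, with class standing in for species, and assumes the rows of the built point layer stay in the same order as the input data (true for geom_point() with its default identity stat).

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

built <- layer_data(p)

# Scaling: every distinct class value should receive exactly one color
color_per_class <- unique(data.frame(class = mpg$class, colour = built$colour))
color_per_class[order(color_per_class$class), ]
```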

Aesthetic Mappings

  • We could just as easily map species to the size aesthetic instead of color
  • However, this is not advised. Why?
# Size points by species
ggplot(data = trees) +
  geom_point(mapping = aes(x = dbh_in, y = height_ft, size = species))
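One hint: ggplot2 itself flags this mapping. Size implies an ordering (bigger versus smaller), but species names have no natural order. The sketch below captures the warning raised when the plot is built, using the built-in mpg dataset with the unordered class column standing in for species.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = class))

# Collect any warnings raised while the plot is built, without printing them
warns <- character(0)
invisible(withCallingHandlers(
  ggplot_build(p),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
))
warns
```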

Aesthetic Mappings

  • It would make more sense to map size to a continuous variable such as allwoody_dry_kg
  • What does this plot show?
# Size points by dry woody biomass
ggplot(data = trees) +
  geom_point(mapping = aes(x = dbh_in, y = height_ft, size = allwoody_dry_kg))
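With a continuous variable, the size scale is ordered: larger values get larger points. The sketch below checks this with the built-in mpg dataset, plotting city against highway mileage (cty, hwy) and mapping engine displacement (displ) to size as a rough analogue of mapping biomass to size.

```r
library(ggplot2)

# Map a continuous variable (engine displacement) to point size
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy, size = displ))

built <- layer_data(p)

# Rank correlation between data values and drawn sizes; near 1 means
# larger displacement values are consistently drawn as larger points
size_rank_cor <- cor(built$size, mpg$displ, method = "spearman")
size_rank_cor
```

ggplot2's default size scale works on point area, so doubling a value does not double the point's diameter.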