Descriptive Statistics in R

Introduction to plotting in the tidyverse: ggplot2

Michael Luu, MPH | Marcio Diniz, PhD
September 22, 2022

ggplot2

Objective

  • We won’t be able to cover how to create every type of plot

  • The goal is to give you an understanding of the basic components and tools, and of HOW plots are built from them in ggplot2

  • All plots in ggplot2 are built on the same fundamental principles and concepts

Plots we WILL go over

Dotplot

Histogram

Boxplot

Barplot

Components of a Plot

We will be using the emergency dataset as an example

# A tibble: 149 × 23
      id   age gender    hr   sbp   dbp ascites glasgow   cpr encephal…¹ creat…²
   <dbl> <dbl> <fct>  <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>      <dbl>   <dbl>
 1     1    49 Female   109   114    76       1      15  NA            0    1.17
 2     2    61 Female    78   130    80       0      14  77.5          1    4.4 
 3     3    65 Female    63   138    71       1      14  11.9          1    1.69
 4     4    54 Female   100    92    80       1      15 184            0    0.62
 5     5    66 Female    86   150    70       0      NA  NA            1    0.91
 6     6    54 Female    86    60    40       1      14 200            1    1.09
 7     7    67 Male      NA    NA    NA       0      13  23.7          1    0.58
 8     8    51 Female    68   120    70       1      15  40.1          0    2.35
 9     9    54 Female   112    81    58       1      15 192.           1    2.68
10    10    53 Female    90   103    60       1      15  66.9          0    0.41
# … with 139 more rows, 12 more variables: sodium <dbl>, inr <dbl>,
#   leukocytes <dbl>, neutrophil <dbl>, lymphocytes <dbl>, albumin <dbl>,
#   bilirubin <dbl>, child <dbl>, meld <dbl>, los <dbl>, death <dbl>,
#   infection <dbl>, and abbreviated variable names ¹​encephalopathy,
#   ²​creatinine
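
The type tags under each column name matter for plotting: ggplot2 gives a factor (<fct>) such as gender a discrete scale and a numeric column (<dbl>) such as age a continuous one. A minimal sketch of checking this before plotting, assuming a small stand-in data frame built from the first 10 rows printed above (the full emergency dataset and how it is loaded are not shown in these slides):

```r
# Stand-in for the emergency dataset (assumption): the first 10 rows
# printed above, restricted to two of the columns used in the plots.
df <- data.frame(
  age    = c(49, 61, 65, 54, 66, 54, 67, 51, 54, 53),
  gender = factor(c("Female", "Female", "Female", "Female", "Female",
                    "Female", "Male", "Female", "Female", "Female"))
)

is.factor(df$gender)   # TRUE: treated as categorical
is.numeric(df$age)     # TRUE: treated as continuous
levels(df$gender)      # "Female" "Male"
table(df$gender)       # counts per level in the stand-in rows
```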

Components of a Plot

  • The data for the plot comes from the emergency dataset

  • The plot contains an x and y coordinate system

    • A categorical x describing gender (Male, Female)
    • A numeric y describing age
  • The plot uses color

    • The color differs by gender
  • The data are drawn as a dotplot (the geometry)

The Grammar of Graphics

  • The grammar of graphics was originally proposed by Leland Wilkinson

  • Hadley Wickham later proposed a variation of this framework, which ggplot2 implements

  • The framework lets us concisely describe the components of a plot or graphic

The Grammar of Graphics

ggplot(data = df)

The Grammar of Graphics

ggplot(data = df, 
       mapping = aes(x = gender, y = age, fill = gender))

The Grammar of Graphics

ggplot(data = df, 
       mapping = aes(x = gender, y = age, fill = gender)) +
  geom_dotplot(binaxis = 'y', 
               binwidth = 1, 
               stackdir = 'center')
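
At this point the plot has data, a mapping, and a geometry. A rough sketch of inspecting what that geometry layer computed, using `layer_data()` from ggplot2. The data frame here is a stand-in built from the first 10 rows of the printout (an assumption, since loading the full emergency dataset is not shown):

```r
library(ggplot2)

# Stand-in for the emergency dataset (assumption): first 10 printed rows.
df <- data.frame(
  age    = c(49, 61, 65, 54, 66, 54, 67, 51, 54, 53),
  gender = factor(c("Female", "Female", "Female", "Female", "Female",
                    "Female", "Male", "Female", "Female", "Female"))
)

p <- ggplot(data = df,                                # data
            mapping = aes(x = gender, y = age,        # coordinates
                          fill = gender)) +           # color
  geom_dotplot(binaxis = "y", binwidth = 1,           # geometry
               stackdir = "center")

# Values ggplot2 computed for the first (and only) layer
built <- layer_data(p, 1)
head(built)
unique(built$fill)   # one fill color per gender level
```

The fill column holds the actual colors ggplot2 assigned, which shows that `fill = gender` is a mapping from data values to colors rather than a color choice.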

The Grammar of Graphics

ggplot(data = df, 
       mapping = aes(x = gender, y = age, fill = gender)) +
  geom_dotplot(binaxis = 'y', 
               binwidth = 1, 
               stackdir = 'center') +
  labs(x = NULL, 
       y = 'Age')

The Grammar of Graphics

ggplot(data = df,
       mapping = aes(x = gender, y = age, fill = gender)) +
  geom_dotplot(binaxis = 'y',
               binwidth = 1,
               stackdir = 'center') +
  labs(x = NULL, 
       y = 'Age') +
  theme_light(base_size = 20) + 
  theme(legend.position = 'none',
        axis.title = element_text(face = 'bold')) 
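
The step-by-step build above can also be written by storing the plot in an object and adding one component at a time with `+`. This is only a sketch: the object name `p` and the stand-in data frame (the first 10 printed rows of the dataset) are assumptions made for illustration.

```r
library(ggplot2)

# Stand-in for the emergency dataset (assumption): first 10 printed rows.
df <- data.frame(
  age    = c(49, 61, 65, 54, 66, 54, 67, 51, 54, 53),
  gender = factor(c("Female", "Female", "Female", "Female", "Female",
                    "Female", "Male", "Female", "Female", "Female"))
)

# Step 1: data and aesthetic mapping only (draws empty axes)
p <- ggplot(data = df,
            mapping = aes(x = gender, y = age, fill = gender))

# Step 2: add the geometry
p <- p + geom_dotplot(binaxis = "y", binwidth = 1, stackdir = "center")

# Step 3: add labels and theme
p <- p +
  labs(x = NULL, y = "Age") +
  theme_light(base_size = 20) +
  theme(legend.position = "none",
        axis.title = element_text(face = "bold"))

p  # printing the object draws the plot
```

Storing intermediate steps makes it easy to print the plot after each addition and see what that component contributed.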

Another Example

The Grammar of Graphics

ggplot(data = df)

The Grammar of Graphics

ggplot(data = df, 
       mapping = aes(x = gender, y = age, fill = gender))

The Grammar of Graphics

ggplot(data = df,
       mapping = aes(x = gender, y = age, fill = gender)) +
  geom_boxplot(width = .25)

The Grammar of Graphics

ggplot(data = df,
       mapping = aes(x = gender, y = age, fill = gender)) +
  geom_boxplot(width = .25) +
  labs(x = "Gender", y = "Age")

The Grammar of Graphics

ggplot(data = df,
       mapping = aes(x = gender, y = age, fill = gender)) +
  geom_boxplot(width = .25) +
  labs(x = "Gender", y = "Age") +
  theme_light(base_size = 15) +
  theme(legend.position = "none",
        axis.title = element_text(face = "bold"))
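
The same pattern of data, mapping, and geometry extends to the other two plot types listed under "Plots we WILL go over". A minimal sketch of a histogram and a barplot, assuming a stand-in data frame built from the first 10 printed rows of the dataset; the bin width of 5 and the object names are illustrative choices, not values from the slides.

```r
library(ggplot2)

# Stand-in for the emergency dataset (assumption): first 10 printed rows.
df <- data.frame(
  age    = c(49, 61, 65, 54, 66, 54, 67, 51, 54, 53),
  gender = factor(c("Female", "Female", "Female", "Female", "Female",
                    "Female", "Male", "Female", "Female", "Female"))
)

# Histogram: one numeric variable, binned along x
p_hist <- ggplot(data = df, mapping = aes(x = age)) +
  geom_histogram(binwidth = 5) +
  labs(x = "Age", y = "Count")

# Barplot: one categorical variable, counted per level
p_bar <- ggplot(data = df, mapping = aes(x = gender, fill = gender)) +
  geom_bar() +
  labs(x = "Gender", y = "Count") +
  theme(legend.position = "none")

p_hist
p_bar
```

Neither plot maps a y aesthetic: both geometries compute the counts themselves from the variable mapped to x.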

Resources

Website

Cheatsheet

Book