eroicaleo.github.io

Lecture 1

02 Principles of Analytic Graphics

  1. Always show comparisons: compare the evidence for two different hypotheses.
  2. Show causality, mechanism, and explanation.
  3. Show multivariate data; some variables may be confounders.
  4. Integrate evidence: combine different modes of evidence.
  5. Describe and document the evidence with appropriate labels.
  6. Content is King.

03 Exploratory Graphs

Why do we need graphs?

  1. We want to understand the data
  2. We want to find patterns
  3. To suggest modeling strategies
  4. To debug analysis

Characteristics of exploratory graphs

  1. made quickly
  2. made in large numbers
  3. goal is personal understanding

Simple Summary of data

  1. Five-number summary

     summary(pollution$pm25)
    
  2. boxplot

     boxplot(pollution$pm25, col = "blue")
     # We can also overlay features
     # This draws a horizontal line at y = 12
     abline(h = 12)
    
  3. histogram

     hist(pollution$pm25, col = "green")
     hist(pollution$pm25, col = "green", breaks = 100)
     # Add a rug: a strip under the histogram marking the location of each data point
     rug(pollution$pm25)
     # Draw a vertical line at x = 12 with line width 2
     abline(v = 12, lwd = 2)
     # Draw a vertical line at the median
     abline(v = median(pollution$pm25), col = "magenta", lwd = 4)
    
  4. barplot: it is for categorical data.

     barplot(table(pollution$region), col = "wheat", main = "Number of Counties in Each Region")
    
  5. boxplot for 2 dimensional data

     boxplot(pm25 ~ region, data = pollution, col = "red")
    
  6. histogram for 2 dimensional data

     par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
     hist(subset(pollution, region == "east")$pm25, col = "green")
     hist(subset(pollution, region == "west")$pm25, col = "green")
    

    mar sets the plot margins; see the help page for par (?par).

  7. scatterplot

     with(pollution, plot(latitude, pm25))
     with(pollution, plot(latitude, pm25, col = region))
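
The pollution data frame used above is not included in these notes. As a minimal sketch of the same summaries on data that ships with R, the block below uses the built-in airquality dataset instead. This is an assumption for illustration only: Ozone stands in for pm25, Month stands in for region, and the reference line at 60 is arbitrary.

```r
library(datasets)

# Summary of one variable: min, quartiles, median, mean, max, NA count
ozone_summary <- summary(airquality$Ozone)
print(ozone_summary)

# Boxplot with an overlaid horizontal reference line
boxplot(airquality$Ozone, col = "blue")
abline(h = 60)

# Histogram with a rug and a vertical line at the median
hist(airquality$Ozone, col = "green", breaks = 20)
rug(airquality$Ozone)
abline(v = median(airquality$Ozone, na.rm = TRUE), col = "magenta", lwd = 4)

# One boxplot per group, using formula notation
boxplot(Ozone ~ Month, data = airquality, col = "red")
```

Ozone contains missing values, so median() needs na.rm = TRUE here; hist(), rug() and boxplot() drop them on their own.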
    

From the above examples, we can see there are two ways to refer to the data in a data frame.

  1. plot(y ~ x, data = pollution) The ~ is called formula notation.
  2. with(pollution, plot(x, y))
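
A small sketch of the two forms side by side, using the built-in airquality data as a stand-in (an assumption for illustration). Note the argument order: a formula puts y on the left of the ~, while the non-formula call to plot() takes x first.

```r
library(datasets)

# Formula notation: columns are looked up in `data`
plot(Ozone ~ Wind, data = airquality)

# with(): the plot() call is evaluated inside the data frame
with(airquality, plot(Wind, Ozone))
```

Both calls draw Ozone on the vertical axis against Wind on the horizontal axis.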

Further resources

R graph gallery

R Bloggers

04 Plotting system in R

  1. Base plotting system: the “artist’s palette” model, where you add pieces one by one. First generate the plot with plot or a similar function, then annotate it by adding labels, axes, etc.

  2. Lattice plotting system: plots are created all at once. Good for conditioning plots and panel plots, and for putting many plots on one screen.

  3. ggplot2 system: it mixes ideas from both.

05 Base Plotting system

The Process of Making a plot

Some questions to think about:

  1. Where will the plot be made: on paper or on screen?
  2. How will the plot be used: on screen, in a web browser, in a presentation?
  3. Will a large amount of data go into the plot?
  4. Does the plot need to be resized dynamically?
  5. Which plotting system will we use?

Base Graphics

There are 2 steps:

  1. Initializing a plot
  2. Annotating the plot
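
The two steps can be sketched as follows, using the built-in airquality data (chosen here for illustration; the title text and colors are arbitrary).

```r
library(datasets)

# Step 1: initialize the plot (this opens a graphics device if none is open)
with(airquality, plot(Wind, Ozone))

# Step 2: annotate the existing plot
title(main = "Ozone and Wind in New York City")
may <- subset(airquality, Month == 5)
with(may, points(Wind, Ozone, col = "blue"))
legend("topright", pch = 1, col = c("blue", "black"),
       legend = c("May", "Other months"))
```

Every annotation call draws on top of whatever is already on the device, which is why the order of the calls matters.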

plot and hist will launch a graphics device if none is already open. plot has many arguments for setting the title, labels, and so on. Most of them are documented in the help page for par.

The 3 main base graphics commands: plot, hist, boxplot.

Important parameters:

Use the par() function to set these parameters and to read their current values, e.g. par("col"). Don’t forget the double quotes around the parameter name.
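
A minimal sketch of reading and setting parameters. Calling par() with new values returns the old ones, which makes it easy to restore them afterwards. pdf(NULL) opens a throwaway device so that nothing is written to disk; that choice is only for this illustration.

```r
pdf(NULL)  # throwaway device: no file is created

default_col <- par("col")   # read a single parameter by name

# Set parameters; the previous values are returned invisibly
old <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
changed_mfrow <- par("mfrow")

par(old)                    # restore the previous settings
restored_mfrow <- par("mfrow")

invisible(dev.off())
```

Parameters set with par() apply to the current device only, so a newly opened device starts again from the defaults.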

Base plotting functions

Summary

Very flexible and offers a high degree of control, but may be tedious.

06 Base Plotting Demonstration

We can use the example function to run the examples for a function, e.g. example(points).

pch 21 to 25 look like some of the low-numbered symbols, but they have a border (set with the col parameter) and a fill (set with the bg parameter).
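
A quick way to see this is to draw the five symbols with a border color and a fill color that differ. The colors and sizes here are arbitrary choices for illustration.

```r
filled_symbols <- 21:25

# pch 21 to 25: the border is drawn in `col`, the interior is filled with `bg`
plot(1:5, rep(1, 5), pch = filled_symbols, col = "black", bg = "orange",
     cex = 3, xlim = c(0, 6), xlab = "", ylab = "", yaxt = "n")

# Label each symbol with its pch number, just below it
text(1:5, rep(1, 5), labels = filled_symbols, pos = 1, offset = 1.5)
```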

We can set up the plot without drawing the data by calling plot(x, y, type = "n").

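x <- rnorm(100); y <- rnorm(100)  # simulated example data
g <- gl(2, 50, labels = c("Male", "Female"))
# Set up the plot region without drawing any points
plot(x, y, type = "n")
# Add each group with points(); calling plot() again would start a new plot
points(x[g == "Male"], y[g == "Male"], col = "blue")
points(x[g == "Female"], y[g == "Female"], col = "green")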

07 Graphic devices

What is a graphic device

How does a plot get created?

pdf(file = "myplot.pdf")
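
For a file device, the whole sequence looks roughly like this: open the device, draw, then close it so the file is written. The sketch writes to a temporary directory (an assumption for illustration; any writable path works) and uses the built-in faithful data.

```r
library(datasets)

out <- file.path(tempdir(), "myplot.pdf")

pdf(file = out)                           # open the PDF device
with(faithful, plot(eruptions, waiting))  # draw; nothing appears on screen
title(main = "Old Faithful Geyser data")  # annotate as usual
invisible(dev.off())                      # close the device to finish the file

file.exists(out)
```

Until dev.off() is called, the file may be incomplete, so closing the device is part of creating the plot.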

Graphic file devices

Vector formats:

Bitmap formats:

Multiple open graphic devices

Copy plots

dev.copy(png, file = "myfile.png")
dev.off()
# To copy the plot to a PDF file instead
dev.copy2pdf(file = "myfile.pdf")

Warning: the copied plot may not look exactly the same as it does on screen.

Lecture 02

Lattice plotting system

Introduction

Important functions

xyplot function

xyplot(y ~ x | f * g, data)

Simple lattice plot

library(lattice)
library(datasets)
xyplot(Ozone ~ Wind, data = airquality)
airquality <- transform(airquality, Month = factor(Month))
xyplot(Ozone ~ Wind | Month, data = airquality, layout = c(5, 1))

Lattice behaviour

Fundamental difference from the base plotting system

Lattice panel functions

# Panel functions
xyplot(y ~ x | f, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.abline(h = median(y), lty = 2)
})

xyplot(y ~ x | f, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.lmline(x, y, col = 2)
})
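
The calls above assume that x, y and f already exist. A self-contained sketch with simulated data might look like this; the data-generating code is an assumption for illustration, not the data used in the lecture.

```r
library(lattice)

set.seed(10)
x <- rnorm(100)
f <- factor(rep(0:1, each = 50), labels = c("Group 1", "Group 2"))
y <- x + rnorm(100, sd = 0.5)

# Custom panel function: draw the points, then a dashed line at each panel's median
p <- xyplot(y ~ x | f, layout = c(2, 1),
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)
              panel.abline(h = median(y), lty = 2)
            })

# Lattice functions return a "trellis" object; printing it draws the plot
print(p)
```

Inside the panel function, x and y are the subset of the data belonging to that panel, so median(y) is computed separately for each group.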

Summary

ggplot2

What is ggplot2?

Grammar of Graphics

The basics qplot

ggplot2 part2

# installation
install.packages("ggplot2")

Hello world for ggplot2

library(ggplot2)
str(mpg)
qplot(displ, hwy, data = mpg)

Aesthetic

We map the drv variable to different colors, and the plot is automatically labeled.

qplot(displ, hwy, data = mpg, color = drv)

Adding geoms

We can add a smooth line here. Note that we want 2 geometric objects: the data points themselves and a smooth line.

qplot(displ, hwy, data = mpg, geom = c("point", "smooth"))

Histogram

Make a histogram by specifying just a single variable. Note that here we need to use the fill argument to specify colors.

qplot(hwy, data = mpg, fill = drv)

Facets

qplot(displ, hwy, data = mpg, facets = . ~ drv)
qplot(hwy, data = mpg, facets = drv ~ ., binwidth = 2)

Density smooth

qplot(log(eno), data = maacs, geom = "density", color = mopos)

Scatterplot

# separate by shape
qplot(log(eno), log(pm25), data = maacs, shape = mopos)
# separate by color
qplot(log(eno), log(pm25), data = maacs, color = mopos)
# Adding linear regression model smooth line
qplot(log(eno), log(pm25), data = maacs, color = mopos, geom = c("point", "smooth"), method = "lm")
# separate by facets argument
qplot(log(eno), log(pm25), data = maacs, facets = . ~ mopos, geom = c("point", "smooth"), method = "lm")
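
The maacs data frame used above is not included in these notes. As a rough equivalent on the mpg data that ships with ggplot2, the same kind of plot can be written with ggplot() directly, which is worth knowing because qplot() is deprecated in recent ggplot2 releases. Mapping drv to color and faceting by drv are choices made for this illustration.

```r
library(ggplot2)

# Points plus a linear-model smooth, one panel per drive type
p <- ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(. ~ drv)

print(p)
```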

Summary of qplot

ggplot part3

Basic components

Building Plots with ggplot2

qplot(logpm25, NocturnalSympt, data = maacs, facets = . ~ bmicat, geom = c("point", "smooth"), method = "lm")

# Initial call to ggplot, specify dataframe, x, y
g <- ggplot(maacs, aes(logpm25, NocturnalSympt))
# Add objects to plot using +
p <- g + geom_point()
print(p)

# Can add smooth line
p <- g + geom_point() + geom_smooth()
p <- g + geom_point() + geom_smooth(method = "lm")
# Then add facets
# The labels are from the variable
# It's better to make sure to label data properly
p <- p + facet_grid(. ~ bmicat)

Annotation

geom_point(color = "steelblue", alpha = 1/2, size = 4)
# Note that to assign colors based on the data, I have to wrap the mapping in
# aes(), so that points are colored by the values of a factor variable
geom_point(aes(color = bmicat), alpha = 1/2, size = 4)
# Add labels and title
+ labs(title = "MAACS Cohort")
+ labs(x = expression("log " * PM[2.5]), y = "Nocturnal Symptoms")
# Modify the smooth line; se = FALSE turns off the confidence interval
+ geom_smooth(size = 4, linetype = 3, method = "lm", se = FALSE)
# Change the background theme and font
+ theme_bw(base_family = "Times")

ggplot2 part 5

A note about axis limit

Sometimes we do not want to look at the outliers and only want to focus on the typical data.

# With ylim(), ggplot subsets the data to the given range, so the outlier is dropped
g <- ggplot(testdat, aes(x, y))
g + geom_line() + ylim(-3, 3)
# To zoom in without dropping any data, set the limits in coord_cartesian()
g + geom_line() + coord_cartesian(ylim = c(-3, 3))
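
To see the difference, here is a small sketch with simulated data containing one large outlier. testdat is built here as an assumption; the notes do not show how it was created.

```r
library(ggplot2)

set.seed(1)
testdat <- data.frame(x = 1:100, y = rnorm(100))
testdat[50, "y"] <- 100  # a single outlier

g <- ggplot(testdat, aes(x, y))

# ylim() treats the out-of-range point as missing, so the outlier is not drawn
p_subset <- g + geom_line() + ylim(-3, 3)

# coord_cartesian() only zooms: all the data is kept, including the outlier
p_zoom <- g + geom_line() + coord_cartesian(ylim = c(-3, 3))

print(p_subset)
print(p_zoom)
```

The first plot typically comes with a warning about removed values, which is a useful hint that data was dropped rather than merely hidden.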

More complex example

We want to look at NO2 and BMI, but NO2 is a continuous variable. We can use the cut() function to turn it into a categorical variable.

Making NO2 Tertile

# Calculate the tertiles of the data
cutpoints <- quantile(maacs$logno2_new, seq(0, 1, length = 4), na.rm = TRUE)
# Cut the data at the tertiles and create a new factor variable
maacs$no2dec <- cut(maacs$logno2_new, cutpoints)
# See the levels of the new factor variable
levels(maacs$no2dec)
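
The same steps on a built-in dataset, as a sketch: mtcars$mpg stands in for the NO2 variable (an assumption for illustration). By default cut() builds intervals that are open on the left, so the smallest observation falls outside the first interval and becomes NA unless include.lowest = TRUE is set.

```r
library(datasets)

# Four cutpoints (0%, 33.3%, 66.7%, 100%) delimit three groups
cutpoints <- quantile(mtcars$mpg, seq(0, 1, length = 4), na.rm = TRUE)

# include.lowest = TRUE keeps the minimum value in the first interval
mpg_tertile <- cut(mtcars$mpg, cutpoints, include.lowest = TRUE)

levels(mpg_tertile)
table(mpg_tertile)
```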

# The real plotting
g <- ggplot(maacs, aes(logpm25, NocturnalSympt))
# Each line ends with + so that R continues the expression on the next line
g + geom_point(alpha = 1/3) +
  facet_wrap(bmicat ~ no2dec, nrow = 3, ncol = 4) +
  geom_smooth(method = "lm", col = "steelblue", se = FALSE) +
  theme_bw(base_family = "Avenir", base_size = 10) +
  labs(x = expression("log " * PM[2.5])) +
  labs(y = "Nocturnal Symptoms") +
  labs(title = "MAACS Cohort")

Summary of ggplot