Fancy Rugs in Regression Plots


Additive Regression Model with Customized Rug
Additive Regression Model with Customized Rug

Rugplots along the axes show the distribution of the underlying data in regression model plots. This is particulary useful in connection with additive (nonparametric) models where the plotted smooth function is the exclusive representation of the model in order to assess how much data contributed to the model fit at the different values of the exlanatory variable.

The custom plot.gam() function includes the possibility of such rugs and pointwise conficence intervalls by default.

Adding quartiles to the rugs requires some customization, though. I included the complete code to produce the above plot underneath.

The example for this GAM is borrowed from the excellent book of Alain Zuur et. al. Mixed Effects Models and Extensions in Ecology with R p.55ff. The ISIT data to run the code above is included in the R package AED which can be downloaded from the books website.

Note: the package AED is needed for the example dataset only. It is NOT necessary to use the example code on ones own dataset.

The main points concerning the rugs and quantile lables on the x-axis are:

  1. Plot the coordinate system without lables
  2. plot( ... , axes = FALSE )

  3. Plot the x-axis with x-lables
  4. axis(side = 1 , line = 0.3 , at = 0:5*1000 , tick = TRUE)

  5. Plot rugs – the jitter() is necessary since a lot of datapoints sit on the same values of SampleDepth – so shake them a bit.
  6. axis(side = 1 , line = -0.9 , at = jitter(ISIT$SampleDepth) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0)

  7. Print the lables “1Q”, “Median” and “3Q” or whatever you like to call them on the right position. The line and padj parameter set the position of the text and cex.axis the textsize.
  8. axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black" , padj = -2.8)

  9. Plot thick tickmarks crossing through the rug cloud at 1Q, median and 3Q
  10. axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)

  11. and finally short thick tickmarks under the text touching the x-axis
  12. axis(side = 1 , line = 0.3, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 0.2 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)

Here goes the complete code:
library(mgcv)
library(AED)
data(ISIT)
#
# Fit a univariate GAM model
model <- gam(Sources ~ s(SampleDepth) , data = ISIT)
fit <- predict(model, se = T)$fit
se <- predict(model, se = T)$se.fit
lcl <- fit - 1.96 * se
ucl <- fit + 1.96 * se
#
# open a jpeg
jpeg("FancyRugs.jpg" , width=400, height=400)
#
# set plotting options: 1 plot per page, horizontal labels and textsize
par(mfrow = c(1,1) , las = 1 , cex = 1)
#
# plot coordinatesystem and labels
plot(0 , bty = "n" , type = "n" , xlim = c(0,5000) , ylim = c(-10,50) , xlab = "Depth (m)" , ylab = expression(paste("Number of sources (" , m^-3 , ")")) , axes = FALSE)
#
title(main="Association between number of sources of\nbioluminescent organisms and ocean depth" , cex.main = 0.8)
#
## _____ X-AXIS ______
# x-axis values
axis(side = 1 , line = 0.3 , at = 0:5*1000 , tick = TRUE)
#
# rugs at datapoints
axis(side = 1 , line = -0.9 , at = jitter(ISIT$SampleDepth) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0)
#
# labels at 1Q, median and 3Q
axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black" , padj = -2.8)
#
# tick marks at 1Q, median and 3Q
axis(side = 1 , line = 0.3, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 0.2 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)
#
axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)
#
## _____ Y-AXIS ______
# y-axis values
axis(side = 2 , at = 0:5*10)
#
# rugs at datapoints
axis(side = 2 , line = -0.9 , at = jitter(ISIT$Sources) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0)
#
# labels at 1Q, median and 3Q
axis(side = 2 , line = -0.7 , at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black")
#
# thicker tick marks at 1Q, median and 3Q
axis(side = 2 , line = 0.3, at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = T, tcl = 0.3 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE , padj = -2)
axis(side = 2 , line = -0.7 , at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black" , labels = FALSE)
#
# horizontal line marking the intercept = mean(Sources) (for univariate model only)
abline(h=mean(ISIT$Sources), lty=3)
#
# Scatterplot
lines(ISIT$SampleDepth , ISIT$Source , type = "p" , cex = 0.4 , lwd = 0.2 , col = "grey")
#
# plot main figure
lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , fit[order(ISIT$SampleDepth)] , col = "black" , lwd = 2)
#
# plot lower confidence limit (lcl)
lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , lcl[order(ISIT$SampleDepth)] , col = "grey" , lwd = 1)
#
# plot upper confidence limit (ucl)
lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , ucl[order(ISIT$SampleDepth)] , col = "grey" , lwd = 1)
#
# closing the jpg file
dev.off()

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s