Fancy Rugs in Regression Plots

Rugplots along the axes show the distribution of the underlying data in regression model plots. This is particularly useful for additive (nonparametric) models, where the plotted smooth function is the only representation of the model: the rug lets you assess how much data contributed to the model fit at different values of the explanatory variable.

The standard plot.gam() method of the mgcv package draws such rugs and pointwise confidence intervals by default.

Adding quartiles to the rugs requires some customization, though. The complete code to produce the above plot is included below.

The example for this GAM is borrowed from the excellent book by Alain Zuur et al., Mixed Effects Models and Extensions in Ecology with R, p. 55ff. The ISIT data needed to run the code below is included in the R package AED, which can be downloaded from the book's website.

Note: the AED package is needed for the example dataset only. It is NOT needed to apply the example code to your own dataset.
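For comparison with the customized version, here is a minimal sketch of what mgcv's default method already gives you. It uses simulated stand-in data (the data frame `sim` and its columns `x` and `y` are hypothetical, not part of the original example) and draws to a null device so nothing is written to disk:

```r
library(mgcv)

# Simulated stand-in data (hypothetical): a smooth signal plus noise
set.seed(1)
sim <- data.frame(x = runif(200, 0, 10))
sim$y <- sin(sim$x) + rnorm(200, sd = 0.3)

m <- gam(y ~ s(x), data = sim)

# plot.gam() draws the estimated smooth, pointwise bands at
# +/- 2 standard errors (se = TRUE) and a rug of the covariate values
pdf(NULL)  # null device: no file is created
plot(m, se = TRUE, rug = TRUE, shade = TRUE)
dev.off()
```

The default rug uses plain ticks; the quartile labels and thicker hinge ticks described next are what the customization adds.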

The main points concerning the rugs and quartile labels on the x-axis are:

1. Plot the coordinate system without axes:
   `plot( ... , axes = FALSE )`

2. Plot the x-axis with its labels:
   `axis(side = 1 , line = 0.3 , at = 0:5*1000 , tick = TRUE)`

3. Plot the rugs. jitter() is necessary because many data points share the same SampleDepth value, so shake them a bit:
   `axis(side = 1 , line = -0.9 , at = jitter(ISIT$SampleDepth) , labels = FALSE , tick = TRUE , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0)`

4. Print the labels “1Q”, “Median” and “3Q” (or whatever you like to call them) at the right positions. The line and padj parameters set the position of the text, and cex.axis sets the text size:
   `axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = FALSE , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black" , padj = -2.8)`

5. Plot thick tick marks crossing through the rug cloud at 1Q, median and 3Q:
   `axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = TRUE, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)`

6. And finally, short thick tick marks under the text, touching the x-axis:
   `axis(side = 1 , line = 0.3, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = TRUE, tcl = 0.2 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)`
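The rug and quartile steps can be wrapped into one reusable function so the same decoration works on either axis. This is a sketch, not the original code: the function name `quartile_rug()` and its defaults are hypothetical, and the simulated `depth` vector only stands in for `ISIT$SampleDepth`:

```r
# Hypothetical helper: jittered rug plus 1Q/median/3Q labels and ticks
# on one axis (side = 1 for x, side = 2 for y)
quartile_rug <- function(x, side = 1, rug_line = -0.9, label_line = -0.8,
                         padj = if (side == 1) -2.8 else NA) {
  q <- fivenum(x)[2:4]  # lower hinge, median, upper hinge
  # jittered rug ticks, so tied values do not hide each other
  axis(side = side, line = rug_line, at = jitter(x), labels = FALSE,
       tick = TRUE, tcl = 0.8, lwd.ticks = 0.1, lwd = 0)
  # text labels above the rug cloud
  axis(side = side, line = label_line, at = q, lwd = 0, tick = FALSE,
       labels = c("1Q", "median", "3Q"), cex.axis = 0.7, padj = padj)
  # thick ticks crossing through the rug cloud
  axis(side = side, line = label_line, at = q, lwd = 0, tick = TRUE,
       tcl = 1.1, lwd.ticks = 1, labels = FALSE)
  # short thick ticks touching the axis line
  axis(side = side, line = 0.3, at = q, lwd = 0, tick = TRUE,
       tcl = 0.2, lwd.ticks = 1, labels = FALSE)
  invisible(q)
}

# Usage with simulated depths, rounded to 100 m so that many values tie
set.seed(42)
depth <- round(runif(300, 0, 5000), -2)
pdf(NULL)
plot(0, type = "n", xlim = c(0, 5000), ylim = c(0, 1), axes = FALSE,
     bty = "n", xlab = "Depth (m)", ylab = "")
axis(side = 1, line = 0.3, at = 0:5 * 1000)
q <- quartile_rug(depth, side = 1)
dev.off()
```

Returning the hinges invisibly makes it easy to reuse them, for example to report them in a caption.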

Here is the complete code:
```
library(mgcv)
library(AED)   # provides the ISIT example data
data(ISIT)

# Fit a univariate GAM model
model <- gam(Sources ~ s(SampleDepth), data = ISIT)
pred <- predict(model, se.fit = TRUE)
fit <- pred$fit
se  <- pred$se.fit
lcl <- fit - 1.96 * se   # lower 95% confidence limit
ucl <- fit + 1.96 * se   # upper 95% confidence limit

# open a jpeg
jpeg("FancyRugs.jpg", width = 400, height = 400)

# set plotting options: 1 plot per page, horizontal labels and text size
par(mfrow = c(1, 1), las = 1, cex = 1)

# plot coordinate system and labels
plot(0, bty = "n", type = "n", xlim = c(0, 5000), ylim = c(-10, 50),
     xlab = "Depth (m)",
     ylab = expression(paste("Number of sources (", m^-3, ")")),
     axes = FALSE)
title(main = "Association between number of sources of\nbioluminescent organisms and ocean depth",
      cex.main = 0.8)

## _____ X-AXIS ______
# x-axis values
axis(side = 1, line = 0.3, at = 0:5 * 1000, tick = TRUE)
# rugs at data points
axis(side = 1, line = -0.9, at = jitter(ISIT$SampleDepth), labels = FALSE,
     tick = TRUE, tcl = 0.8, lwd.ticks = 0.1, lwd = 0)
# labels at 1Q, median and 3Q
axis(side = 1, line = -0.8, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0,
     tick = FALSE, labels = c("1Q", "median", "3Q"), cex.axis = 0.7,
     col.axis = "black", padj = -2.8)
# tick marks at 1Q, median and 3Q
axis(side = 1, line = 0.3, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0,
     tick = TRUE, tcl = 0.2, lwd.ticks = 1, col.ticks = "black", labels = FALSE)
axis(side = 1, line = -0.8, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0,
     tick = TRUE, tcl = 1.1, lwd.ticks = 1, col.ticks = "black", labels = FALSE)

## _____ Y-AXIS ______
# y-axis values
axis(side = 2, at = 0:5 * 10)
# rugs at data points
axis(side = 2, line = -0.9, at = jitter(ISIT$Sources), labels = FALSE,
     tick = TRUE, tcl = 0.8, lwd.ticks = 0.1, lwd = 0)
# labels at 1Q, median and 3Q
axis(side = 2, line = -0.7, at = fivenum(ISIT$Sources)[2:4], lwd = 0,
     tick = FALSE, labels = c("1Q", "median", "3Q"), cex.axis = 0.7,
     col.axis = "black")
# thicker tick marks at 1Q, median and 3Q
axis(side = 2, line = 0.3, at = fivenum(ISIT$Sources)[2:4], lwd = 0,
     tick = TRUE, tcl = 0.3, lwd.ticks = 1, col.ticks = "black",
     labels = FALSE, padj = -2)
axis(side = 2, line = -0.7, at = fivenum(ISIT$Sources)[2:4], lwd = 0,
     tick = TRUE, tcl = 1.1, lwd.ticks = 1, col.ticks = "black", labels = FALSE)

# horizontal line marking the intercept = mean(Sources) (for univariate model only)
abline(h = mean(ISIT$Sources), lty = 3)

# scatterplot of the raw data
lines(ISIT$SampleDepth, ISIT$Sources, type = "p", cex = 0.4, lwd = 0.2, col = "grey")

# sort by depth so that lines() draws curves instead of a zigzag
o <- order(ISIT$SampleDepth)
# plot main figure
lines(ISIT$SampleDepth[o], fit[o], col = "black", lwd = 2)
# plot lower and upper confidence limits (lcl, ucl)
lines(ISIT$SampleDepth[o], lcl[o], col = "grey", lwd = 1)
lines(ISIT$SampleDepth[o], ucl[o], col = "grey", lwd = 1)

# close the jpg file
dev.off()
```

How2plot nicer GAM curves

Generalized additive models (GAMs) can capture potentially non-linear associations between a predictor and a response variable, and their plots make these associations visible.

The default plotting method produces clean plots for all covariates in the model (or a selection), but they are by no means presentation quality in terms of colors, understandable axis labels, scaling, etc.

This example shows a customized variant for an additive regression model with one covariate. The goal is to display the absolute value of the response variable on the y-axis rather than the default “difference from the intercept”.

This is only meaningful for a single covariate in the model.
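Why the single-covariate restriction: with one smooth, a Gaussian family and the identity link, the absolute prediction is just the intercept plus the centred smooth that plot.gam() shows, and the intercept equals the mean response. A small check with simulated data (the data frame `d` is hypothetical):

```r
library(mgcv)

set.seed(2)
d <- data.frame(x = runif(150, 0, 200))
d$y <- 30 + 10 * sin(d$x / 40) + rnorm(150, sd = 3)

m <- gam(y ~ s(x), data = d)  # Gaussian family, identity link (default)

# The centred smooth that plot.gam() displays ("difference from intercept")
smooth_only <- predict(m, type = "terms")[, "s(x)"]
# The absolute fitted response
absolute <- predict(m, type = "response")
intercept <- coef(m)[["(Intercept)"]]
```

With more than one smooth the absolute prediction also depends on the values of the other covariates, so a single curve of absolute values is no longer well defined.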

1 The default plot.gam() method

```
library(mgcv)  # gam(), as in the first example
MyGAM1 <- with(MyData[MyData$Strata == 1, ], gam(Y ~ s(Covariate)))
MyGAM0 <- with(MyData[MyData$Strata == 0, ], gam(Y ~ s(Covariate)))
```

```
par(mfcol = c(1, 2))
plot(MyGAM0)
plot(MyGAM1)
```

This is the resulting plot:

2 The fancy way

1. Extract the values of the model response from the GAM object:

```
response1 <- predict(MyGAM1, type = "response", se.fit = TRUE)
response0 <- predict(MyGAM0, type = "response", se.fit = TRUE)
```

2. Plot the response values against the covariate (note: this works only with one covariate):

`par(mfcol=c(1,1))`

```
library(pspline)  # provides sm.spline()
plot(0, type = "n", bty = "n", main = "Fancy GAM plot",
     xlab = "MyCovariate", ylab = "MyResponse", ylim = c(0, 60), xlim = c(0, 200))
legend("bottomright", bty = "n", lwd = 5, col = c("green", "red"),
       legend = c("Strata = 0", "Strata = 1"))
# Strata = 1: fitted curve and 95% pointwise confidence limits
lines(sm.spline(MyGAM1$model$Covariate, response1$fit), lwd = 3, col = "red")
lines(sm.spline(MyGAM1$model$Covariate, response1$fit + 1.96 * response1$se.fit),
      lty = 3, lwd = 2, col = "red")
lines(sm.spline(MyGAM1$model$Covariate, response1$fit - 1.96 * response1$se.fit),
      lty = 3, lwd = 2, col = "red")
# Strata = 0: fitted curve and 95% pointwise confidence limits
lines(sm.spline(MyGAM0$model$Covariate, response0$fit), lwd = 3, col = "green")
lines(sm.spline(MyGAM0$model$Covariate, response0$fit + 1.96 * response0$se.fit),
      lty = 3, lwd = 2, col = "green")
lines(sm.spline(MyGAM0$model$Covariate, response0$fit - 1.96 * response0$se.fit),
      lty = 3, lwd = 2, col = "green")
# dashed horizontal lines at the intercepts
abline(h = MyGAM1$coefficients[1], lty = 2, lwd = 1, col = "red")
abline(h = MyGAM0$coefficients[1], lty = 2, lwd = 1, col = "green")
```
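If the pspline package is not at hand, sorting by the covariate is a possible substitute for sm.spline(): the GAM fitted values are already smooth, so they only need to be drawn in covariate order. A sketch under that assumption, with simulated data standing in for `MyData` (its values and the helper `add_band()` are hypothetical):

```r
library(mgcv)

set.seed(3)
MyData <- data.frame(Covariate = runif(300, 0, 200),
                     Strata = rep(0:1, each = 150))
MyData$Y <- 20 + 10 * MyData$Strata + 8 * sin(MyData$Covariate / 35) +
  rnorm(300, sd = 4)

MyGAM0 <- gam(Y ~ s(Covariate), data = subset(MyData, Strata == 0))
MyGAM1 <- gam(Y ~ s(Covariate), data = subset(MyData, Strata == 1))

# Hypothetical helper: fitted curve and 95% pointwise band for one model
add_band <- function(model, col) {
  x <- model$model$Covariate
  p <- predict(model, type = "response", se.fit = TRUE)
  o <- order(x)  # sort along the covariate so lines() traces a curve
  lines(x[o], p$fit[o], lwd = 3, col = col)
  lines(x[o], p$fit[o] + 1.96 * p$se.fit[o], lty = 3, lwd = 2, col = col)
  lines(x[o], p$fit[o] - 1.96 * p$se.fit[o], lty = 3, lwd = 2, col = col)
  invisible(o)
}

pdf(NULL)
plot(0, type = "n", bty = "n", main = "Fancy GAM plot", xlab = "MyCovariate",
     ylab = "MyResponse", ylim = c(0, 60), xlim = c(0, 200))
o0 <- add_band(MyGAM0, "green")
o1 <- add_band(MyGAM1, "red")
abline(h = coef(MyGAM0)[1], lty = 2, col = "green")
abline(h = coef(MyGAM1)[1], lty = 2, col = "red")
legend("bottomright", bty = "n", lwd = 5, col = c("green", "red"),
       legend = c("Strata = 0", "Strata = 1"))
dev.off()
```

Unlike sm.spline(), this draws the fitted values exactly as the GAM estimated them, without a second smoothing step.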

Update: I have written a much more detailed static page about the additive Cox model: http://rforge.org/plothr/
The page has a download link to the function plotHR(), which does all the work. It is extensively commented, so it should be easy to understand the syntax and modify it for individual purposes.

Therneau et al. refer to the proportional hazards model, or Cox regression model, as “the workhorse of regression analysis for censored data”. They show how to implement the additive form of this model in SAS and S-PLUS; this form was already mentioned by Hastie and Tibshirani in 1986 when they introduced generalized additive models (GAM).

I have found modelling the functional form of the covariates with smoothing splines extremely useful in regression models for right-censored survival times. And the implementation in R is absolutely straightforward.

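The only thing needed is the installation of the R packages “survival” and “pspline” (the pspline() model term itself comes from survival; the pspline package provides sm.spline(), used for plotting below):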

`install.packages("pspline")`
and
`install.packages("survival")`

In the following code I will refer to a dataset “MyData” with a binary status variable “death” and a time-to-event variable “days2death”.
The status variable “death” is usually (though not necessarily) coded 1 if the event of interest occurred to the subject, and “days2death” then gives the time to this event, or to censoring otherwise.
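On the "not necessarily 1" point: survival's Surv() accepts 0/1, TRUE/FALSE and 1/2 event codings for right-censored data and stores all of them as a 0/1 status. A small sketch with made-up values (the vectors below are hypothetical):

```r
library(survival)

days2death <- c(120, 300, 45, 800, 365)
death01 <- c(1, 0, 1, 0, 1)  # 1 = event, 0 = censored
deathTF <- death01 == 1      # TRUE = event
death12 <- death01 + 1       # 2 = event, 1 = censored

s01 <- Surv(days2death, death01)
sTF <- Surv(days2death, deathTF)
s12 <- Surv(days2death, death12)
print(s01)  # censored times are marked with a "+"
```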

Visualizing the functional form of a covariate takes the following steps:

1. create the survival object of interest,
2. fit a proportional hazards model with smoothing splines,
3. predict the functional form of the covariate of interest and
4. plot it!

Note that R's termplot() function draws the GAM plots directly from the model fit, so step 3 would not be necessary. BUT: it has a bug and fails when plotting a single covariate, and it does not allow all too much customization.

This is the R code to achieve the analysis:

1 Create survival object:

`library(survival); surv.death <- Surv(MyData$days2death, MyData$death)`

2 Fit proportional hazards model with smoothing splines for continuous covariates:

```
library(survival)  # Surv(), coxph(), strata() and the pspline() model term
library(pspline)   # sm.spline(), used for plotting below
pham.fit <- coxph(surv.death ~ pspline(EF, df = 4) + pspline(Age, df = 4) + strata(Sex),
                  data = MyData)
```

The model above includes the continuous covariates “EF” (ejection fraction) and “Age” and stratifies by “Sex” (strata() takes no df argument).
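Since MyData is not available here, a runnable sketch of the same model structure on the `lung` dataset shipped with survival may help; `age` stands in for EF and `sex` for Sex, which is an illustrative substitution, not the original analysis:

```r
library(survival)

# lung: status 1 = censored, 2 = dead (Surv() accepts this 1/2 coding)
surv.death <- Surv(lung$time, lung$status)

# pspline() gives a penalized smoothing spline term for a continuous
# covariate; strata() takes only the stratifying variable
pham.fit <- coxph(surv.death ~ pspline(age, df = 4) + strata(sex),
                  data = lung)
print(pham.fit)  # linear and nonlinear parts of the spline term are tested
```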

3 Produce the fitted smoothing spline for the first covariate in the above model formula with standard errors

`predicted <- predict(pham.fit , type = "terms" , se.fit = TRUE , terms = 1)`
“terms=1” refers to “pspline(EF,df=4)”
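The term prediction is on the log-hazard-ratio scale, so hazard ratios and their pointwise 95% limits come from exponentiating fit ± 1.96 × se.fit. A sketch on survival's `lung` data (complete cases for the variables used, `age` standing in for EF; both are illustrative assumptions):

```r
library(survival)

lung_cc <- na.omit(lung[, c("time", "status", "age", "sex")])
pham.fit <- coxph(Surv(time, status) ~ pspline(age, df = 4) + strata(sex),
                  data = lung_cc)

# Term-wise prediction for term 1, i.e. pspline(age, df = 4)
predicted <- predict(pham.fit, type = "terms", se.fit = TRUE, terms = 1)
eta <- drop(predicted$fit)     # log hazard ratios
se  <- drop(predicted$se.fit)  # their standard errors

hr  <- exp(eta)                # hazard ratios
lcl <- exp(eta - 1.96 * se)    # lower 95% limit
ucl <- exp(eta + 1.96 * se)    # upper 95% limit
```

Exponentiating the limits (rather than computing hr ± 1.96 × se) keeps the band positive and asymmetric on the hazard-ratio scale.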

4 Plot it

First, plot the axes and labels:
`plot(0 , xlab="Ejection Fraction" , ylab = "Hazard Ratio" , main = "All-cause Death" , type = "n" , xlim=c(0,100) , ylim=c(0,3))`
The range of values on the x-axis (“xlim=c(0,100)”) is chosen manually for this specific covariate; of course it is possible to use something like `xlim = c( 0 , max(MyData$EF) )`.

Now plot the fitted smoothing spline using the lines() function:
`lines( sm.spline(MyData$EF , exp(predicted$fit)) , col = "red" , lwd = 0.8)`
Note that the term prediction gives log hazard ratios; therefore exp(predicted$fit) is plotted against the values of the covariate. The sm.spline() function is used because the points appear in the order and density of the underlying dataset; a plain lines() call would produce just a chaotic pattern. Alternative:
`plot(MyData$EF , exp(predicted$fit) , col = "red" , cex = 0.2)`
produces a scatter plot that reflects the distribution of the underlying data. I prefer adding a rug plot at the bottom of the graph to illustrate this (see below).

… upper and lower confidence limits with dashed thinner lines

`lines(sm.spline(MyData$EF , exp(predicted$fit + 1.96 * predicted$se.fit)) , col = "orange" , lty = 2 , lwd = 0.4)`
and
`lines(sm.spline(MyData$EF , exp(predicted$fit - 1.96 * predicted$se.fit)) , col = "orange" , lty = 2 , lwd = 0.4)`

… a thin horizontal line at hazard ratio 1, to see where the confidence limits cross it:
`abline( h = 1 , col = "lightgrey" , lty = 2 , lwd = 0.4)`

… tiny tick marks on the x-axis to reflect the distribution of the underlying data:
`axis( side = 1 , at = MyData$EF, labels = FALSE , tick = TRUE , tcl = 0.4 , lwd.ticks = 0.1)`

… and some fancy red tickmarks to mark minimum, lower hinge, median, upper hinge and maximum of the covariate in the dataset:
`axis( side = 1 , at = fivenum(MyData$EF), labels = FALSE , tick = TRUE , tcl = -0.2 , lwd.ticks = 1 , col.ticks = "red")`

That's it!
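The plotting steps above can be collected into one runnable sketch. It uses survival's `lung` data (complete cases, `age` standing in for EF) and replaces sm.spline() with sorting by the covariate, so the pspline package is not required; both choices are assumptions for illustration:

```r
library(survival)

lung_cc <- na.omit(lung[, c("time", "status", "age", "sex")])
pham.fit <- coxph(Surv(time, status) ~ pspline(age, df = 4) + strata(sex),
                  data = lung_cc)
predicted <- predict(pham.fit, type = "terms", se.fit = TRUE, terms = 1)
eta <- drop(predicted$fit)     # log hazard ratios
se  <- drop(predicted$se.fit)
x   <- lung_cc$age

o <- order(x)  # sort so lines() traces the curve from left to right

pdf(NULL)
plot(0, type = "n", xlab = "Age (years)", ylab = "Hazard Ratio",
     main = "All-cause Death", xlim = range(x), ylim = c(0, 3))
lines(x[o], exp(eta[o]), col = "red", lwd = 0.8)                 # fitted HR
lines(x[o], exp(eta[o] + 1.96 * se[o]), col = "orange", lty = 2, lwd = 0.4)
lines(x[o], exp(eta[o] - 1.96 * se[o]), col = "orange", lty = 2, lwd = 0.4)
abline(h = 1, col = "lightgrey", lty = 2, lwd = 0.4)             # HR = 1
axis(side = 1, at = x, labels = FALSE, tick = TRUE, tcl = 0.4,   # rug
     lwd.ticks = 0.1)
axis(side = 1, at = fivenum(x), labels = FALSE, tick = TRUE,     # five numbers
     tcl = -0.2, lwd.ticks = 1, col.ticks = "red")
dev.off()
```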

4b) The easy way (works ONLY with MORE than one continuous covariate): predicting the terms can be omitted:

`termplot(pham.fit, se=T, rug=T)`

Resulting in …