Plot Function For Additive Cox Regression
A Cox-regression model in R is achieved by
model <- coxph ( SuvivalObject ~ Covariate1 + Covariate2 + Factor1 + Factor2 , data = Dataset )
The covariates can be enclosed in other functions:
- factors should be enclosed by factor()
- strata, which allow to adjust for a factor without getting an estimate, should be enclosed by strata()
- non-log linear continuous terms can be enclosed by
In the latter case the model might look like
model <- coxph (SurvivalObject ~ pspline(Covariate1) + Covariate2 + factor(Factor1) + strata(Factor2) , data = Dataset )
The functional form of the covariates (including the factors) can now be plotted with
When I started out writing a plot function for the spline fit, the termplot() function failed for univariate models this is fixed now. So the main incentive to use plotHR() is gone.
On the other side there is a lot of customization possible with plotHR() which is not with termplot() and one my find the syntax interesting to see how the internals of the coxph() function work. I learned at least a lot from writing this.
The function plotHR() plots the functional form of the desired term:
plots the first term in the model by default but other terms can be accessed by calling their number (e.g. the second one):
plotHR(model , terms = 2)
In order to use the function you have to “source” it into R. It is the same procedure as calling a package, but using “source” instead of “library”.
Download plotHR() to your hard disk, remember the path and execute
before using the function. Note the forward slash “/” in the path description in R – using the Windows backslash “\” will fail.
You might also find it useful to open the downloaded file plotHR_0.x.R and have a look inside. I am trying to keep the code as readable as possible and to comment extensively. Feel free to send feedback, bug reports, improvements and of course to adapt it to your needs.
If you need to remember the possible options in the function call after you have “sourced” the function you can also type
in the R-command line without brackets and R will display the syntax of the function.
plotHR( model , terms = 1 , se = TRUE , rug = "density" , xlab = "" , ylab = "Hazard Ratio" , main = NULL , xlim = NULL , ylim = NULL, col.term = "#08519C", lwd.term = 3, col.se = "#DEEBF7", cex = 1 , bty = "n" , axes = TRUE )
- model – a coxph model
- terms – integer; the number of the term to plot
- se – logical TRUE/FALSE; plotting the CI
- rug – “ticks” or “density”; rug plot or density plot at x-axis. Any other value for “rug” will omit the rugplot.
- xlab – character; x-axis label
- ylab – character; y-axis label
- main – character; main plot title
- xlim – 2×1 column vector; x-range of plot
- ylim – 2×1 column vector; y-range of plot
- col.term – color of HR-curve
- lwd.term – line width of HR-curve
- col.se – color of CI (if plotted)
- cex – numeric; size factor of labels
- bty – specifies the boxtype around the plot. See
Note: I have rewritten the function several times since I wrote the initial post.
- V0.11: The density plot of the explanatory variable at the bottom of the panel is rewritten by Max Gordon. He rewrote big parts of the function and added the possibility to plot different models/ subgroup analysis into the same plotting panel right from the function. V0.11 is basically a downscaled fork of his latest version. I am considering to add the full version as V0.12 but I have to get through the code first in order to adapt the “Usage” part.
- V0.9: Mainly rewrite of the density plot. rug = “density” will produce acceptable density plots on the x-axis with marks for 2.5-, 25-, 50-, 75- and 97.5-percentiles. Also the confidence shade is marked at the same percentiles. This will only appear when rug = “density” is specified. The position of the density plot in the main plot panel is still a concern: It should work for one-panel plots. In multi panel plots the density plot will move under the x-axis. This can be manually adjusted by selecting a fitting value for dens.pos to move it up again. For a 1×3-panel I had to use dens.pos = 0.6.
- V0.7 is a complete rewrite.
- The local dataset is trimmed to include only model covariates and complete observations only. The incomplete observations are omitted in the cox model anyway. This removes difficulties which arose from missing covariate values in the dataset.
- x.log option is deprecated – it is planned to use the xlog option from plot.default() on the long run.
- the rug option is extended to FALSE, “density” or “ticks” (the default). The option rug = “density” is still a bit alpha’ish but includes a density plot of the covariate distribution on the x-axis.
- V0.6 – removed the y.log option, since the scale should be logarithmic anyway. Later I will also rewrite the x.log option, since the feature is already incorporated in the plot.default() function. I also removed the dotted line at HR=1 level, since some complained about it overstating the importance of the log(HR) intercept. I included it, since it gives a hint about the significance of the smooth term, in case the confidence intervals cross over the line… Those who miss it can add manually
lines( h = 0 , type = 2 )I rewrote the “rugs” option. Try
rugs = "density"It is still “beta”ish, but some like it.
- V0.5 – bug fix for the y-scale and slight adjustment of the default plotting colors (paler CI shade and stronger term-line)
- V0.4 – the y-scale should be logarithmic; a HR of 0.5 (50% reduced Hazard) should show the same distance from HR = 1 as a doubled Hazard (HR = 2); this is now default. The linear scale I used initially is biased in this concern. The option y.log allows both ways (linear and log-scale); the axis labels denote Hazard Ratio instead of log(HR).
The additive model and the plotting method is based on:
Therneau TM, Grambsch PM: Modeling Survival Data: Extending the Cox Model. Springer 2000.
The relevant part is on page 108ff, chapter 4, Modeling the functional form of covariates.
Therneau is the maintainer of the “survival” package in R. I wonder why the additive modeling of a covariate is not mentioned more explicitly in “library(survival)”.
25 thoughts on “CoxGAM”
Thanks very much for writing this. How do you connect the x and y axes a the origin?
bty = "L"would deliver what you desire (I am not 100% about “L”, you may want to check
?parand look up the bty parameter)
If you want the axis without the space, you have to change the function code. Then you could not use the rug/density plot on the x-axis, since they are sitting in the space which the x-axis is moved downwards…..
I’ve been using these plots and loving them. I am however facing a problem in getting low values (.09) to show up on the axes. Any ideas?
First, sorry for late reply. I got the mail alert of your comment on 05/07 while I see on the date tag of your comment that you posted it in Feburary?! Nevertheless:
I suppose you mean low values on the y-axis, which means HR = 0.09:
You pointed out an important point I missed – although it is possible to specify ylim = c( lower_limit, upper_limit) in plotHR() it is not straightforward. The axis shows the HR, but the function handles log(HR). This said, you would have to do
plotHR( JeffsCoxModel , ylim = c( log(0.09) , log(2)) )
in order to plot on a HR-scale from 0.09 to 2.
I worked a little more with the plotHR adapting it to my needs. You can find the latest version here: http://pastebin.com/v0LZje5F
What I’ve done is to add the possibility of plotting mutliple models in the same plot. As I’ve posted previously I’m struggling with the finding out how to do time dependent coefficients in R and right now I’m chopping up the time and plotting the graphs on top of each-other. It’s not a dream solution but it does the job OK.
I’ve also realized what the waist of the cph() cox regression represents – it is the reference that all the others relate to. If you do a:
abline(h=0, col=rgb(0.5, 0.5, 0.5, 0.5), lty=2)
abline(v=median(data[, term.label]), col=rgb(0.7, 0.7, 0.7. 0.5), lty=2)
you get a cross through the waist. I guess it’s not so strange but now I’m a little confused how the regular coxph() does this without a reference. After all cph() should only be a wrapper around coxph().
Thanks once more for your excellent software 🙂
I noticed some difficulties with the regular coxph() pspline and after some debugging I’ve changed the pastebin (se prev. comment) to a version that works with the following test code:
hmohiv<-read.table("http://www.ats.ucla.edu/stat/R/examples/asa/hmohiv.csv", sep=",", header = TRUE)
surv <- with(hmohiv, Surv(time, censor))
fit <- coxph(surv~ pspline(age) + drug, data=hmohiv)
plotHR(fit, terms="age", bty="l", xlim=c(25, 55))
dd <- datadist(hmohiv) # compute data distribution summary
options(datadist=’dd’) # for plotting
fit <- cph(surv~ rcs(age, 5) + drug, data=hmohiv, x=T, y=T)
plotHR(fit, terms=1, bty="l", xlim=c(25, 55))
fit <- cph(surv~ rcs(age, 5), data=hmohiv, x=T, y=T)
plotHR(fit, terms="age", bty="l", xlim=c(25, 55))
fit <- cph(surv~ rcs(age, 5), data=hmohiv, x=T, y=T)
plotHR(fit, terms="age", bty="l", xlim=c(25, 55), y.ticks=c(1,2,3), ylog=F, rug="ticks")
I’ve also changed the density polygon so that it doesn’t need a second plot but is created in the main canvas – this way you don’t need the dens.pos – the density is always positioned at the x-axis.
Another thing I changed was the ability to add exp() to the spline in case someone prefers that format on the yscale.
Thanks a lot Max! I got no notification from WordPress about your ongoing comment activity, so that all your activity went completely unnoticed. I will dig into your code as soon as possible.
Great that you left a download link!
I am getting the following error:
Error in `[.data.frame`(data, , all.labels) : undefined columns selected
formula1 Cox.Model plotHR(Cox.Model)
Error in `[.data.frame`(data, , all.labels) : undefined columns selected
Might be special characters in variable name. The function uses a regular expression substitution to get rid of the “pspline()” in the coxph() formula. When there are characters which I did not anticipate in the variable name, it gets truncated and afterwards the variable is not found in the dataset… my guess…
Thanks, I will try that!
Hi, I got the exact same error with you when I run this function plotHR(). Have you fixed it later on and what is the problem? Thanks!
Great R package. But what I’d like to do is use it with interactions where one constitutive element of the interaction is a spline.
M1 <- coxph(Surv(begin, end, Partisan_Spell) ~ pspline(POLCONIII) + PC3_BC + BEGINbc + cumulative_crisis + polity2 + Growth + strata(new_count_ps) + cluster(imfcode), robust = TRUE, na.action=na.exclude, data=group2
PC3_BC is the interaction
pspline(POLCONIII) is one constitutive element
BEGINbc is the other constitutive element
Could you provide some advice?
I am quite sure I do not understand your question properly:
PC3_BC and BEGINbc are continuous variables?
I am not sure what you mean by “constitutive element” (English is not my native language and I did not encounter this terminology before)? Are your refering to a continuous effect modifier for the covariate PC3_BC? If so it might be that the continuous interaction is also not linear, so you would end up with a spline on a two-dimensional space. As far as I know the pspline() in the survival package is very basic. I know that you can do this with Wood’s mgcv package. Here you can do logistic regression and have non-linear interactions visualized as a smoothing surface over two interacting variables and plot the surface with vis.gam().
Thanks. BEGINbc is a binary variable, POLCONIII is continuous. I’m able to plot the effect using gam package, but was hoping there was a way to do it directly from cox model.
Thanks for your beautiful plot, and I wanna draw a plot like this one.
but my cox model is :
mod<-coxph(Surv(start, stop, death)~ blood+ dis +blood:dis,data=data)
so there is an interaction between continuous variable "blood" and categorical variable "dis" (1 or 0, it means disease).
formula for above code is y=coef1*blood+coef2*dis+coef3*blood:dis
coef3 is constant after fitting the cox model, it means the effect of blood is exp(coef3) times among ppl with dis compared to ppl without dis.
if blood is 0.3, effect of blood is exp(coef3*0.3) times higher in dis vs no dis;
if blood is 0.4, effect of blood is exp(coef3*0.4) times higher in dis vs no dis;
if blood is 0.5, effect of blood is exp(coef3*0.5) times higher in dis vs no dis;
as blood is continues, how can I make a plot with x-axis as blood value, y-axis is the hazard ratio : exp(coef3*blood value) using your code?
You would not need to plot it. The plot is only interesting if you are out for non(log)linear effects of the covariate. In your case blood is linear and disease is categorical and also the interaction is categorical. If you plot this you get two linear graphs with different slope and the quotient of the slopes is coef3.
The termplot() function does print linear relationships, i.e. lines for coxmodels, but I am quite sure it does not handle interaction terms. You can try and comment back if you found out about this one. You could follow the blog post and manually create plots of the smoothing splines with the coxmodels in subgroups,
m0 <- coxph(Surv(start, stop, death) ~ pspline(blood) , data = data , subgroup = dis == 0)
m1 <- coxph(Surv(start, stop, death) ~ pspline(blood) , data = data , subgroup = dis == 1)
This way you can get an idea about what is going on, but you get no test about the significane of the interaction term.
Hello , thanks for your reply. My data set is counting process data. the ” blood” is time dependent variable which changes every week for each patients during the follow up period, why you said it is linear variable? I am quite new in R, in my interaction model, the interaction of time dependent variable ” blood” and “dis” is significant, If I plot the interaction, maybe I will get two curve( not linear graph) for dis 1 and dis 0, and the ratio of two points in two curves (when at same blood value,i.e same x-axis value) is constant of exp(coef3), right?
Thanks a lot, I am happy to discuss this with you.
I have a dataset with 2 treatments and want to assess the effect of a continous covariate on the Hazard ratio between treatment A and B. Is this something that plotHR can do simply?
Many thanks for any help in advance.
That is not easily done 😉
I get the point, but never did it myself. You want a smoothed interaction term, which is very reasonable. I have never seen anything like it. I would not exclude the possibility, that it is possible in coxph().
This is more a question of modelling. When the pspline(continuous)*dochotmous comes up with something meaningful, then it might be possible to plot it. Try the r-help email list and look for comments from the heavyweights in the field, Therneau and Woods I would think.
If you look at ?termplot in R you will see, that termplot() as the default plotting function for such estimates simply ignores the interation term and might give an error.
plotHR() as it is right now also ignores interaction terms.
… when thinking about it: As you pose your question my first answer is valid, but as a workaround in the direction of your thinking one can plot a smooth for the continuous covariate in subgroups of Treatment = “A” and Treatment = “B”. This is not the same, since you do not get HR of A vs B dependent on Cov but you might get an idea what is going on.
Hi, I want to plot the HRs vs. a particular continuous variable, which I think plot HR could give me. However, I am still not very clear about the methods that behind the plotHR function. Based on my understanding the HR should be constant if we are using cox model. But I got curves of HR and I want to know how it works. It would be great if you could explain a little bit.
model <- coxph(Surv(tmb_os_pd1drug_269all_1$Survival_months.y, tmb_os_pd1drug_269all_1$Status.y)~rcs(TMB_log,3),data = tmb_os_pd1drug_269all_1)
plotHR(model, terms=1, se = 'TRUE' ,xlim=c(0,2), ylim=c(-1.5,1),xlab = 'logTMB',main = 'logTMB distribution against Hazard Ratio')
Check the monograph I linked (Grampsch/Therneau: Extending the Cox Model, p.108ff I think). In the same chapter there is an example how to fit a cox model with a very slow counting process GLM – here you can do even more fancy stuff like partial hazard surfaces over pairs of covariates.
The proportional hazard assumption has nothing to do with the hazard ratio between different levels of the explanatory variable. The y-axis is a bit sloppy defined by ‘hazard ratio’ it is more ‘partial (log) hazard’ and the hazard ratio would be the slope of the segment between two point on the GAM curve.
The plotHR function is a simple wrapper around the standard `termplot()` function. At the time of writing the blog (long ago) it had a bug and would not work for only one covariate. This is fixed now. The other reason was that I wanted some more pleasant visuals (for my taste at least).
The main estimation is using the `pspline()` function in the formula of the `coxph()` model.
So `coxph( Y ~X )` would give you a linear fit (with, yes, constant hazard ratio per unit increase of X,
while `coxph( Y ~ pspline(X))` estimates a smooth function of the partial hazard over units of X.
Then `termplot()` just gives you a visual representation (try plotting the standard/linear case). Then you will see the difference.
ah, you are using `library(rms)` the same is valid only that the spline function is different.