The light blue shade denoting the 95% pointwise confidence limits of the GAM estimate is a polygon() object in R.
Usually one would plot the GAM model with the default termplot() function and specify se = TRUE to get the confidence limits as lines. I showed in a recent post how to plot the fitted GAM smooth and the confidence limits manually.
The same approach can be used to have the confidence limits as a shade. In order to achieve this:
- Fit the model.
- Get the estimated values of the fitted smoothing spline and the pointwise standard errors.
- Calculate the values of the upper and lower 95% confidence limits.
- Plot an empty coordinate system with the right scaling, labels, titles and so on. I prefer to include axes = FALSE and add the axes manually. See here for an example.
- Now it gets a bit tricky: the coordinates have to be sorted in the right order. R provides a very effective way to achieve this, though it is not easily understood at first glance. We create two indices on the dataset, i.for and i.back, which number the dataset according to the independent variable on the x-axis – forward and backward respectively.
- The points of the shade polygon first follow the upper confidence line (ucl) forward and then the lower confidence line (lcl) backward. This gives the coordinates of the shade polygon.
- Plot the polygon with the confidence shade first – otherwise the GAM plot vanishes behind it.
- Then add the main plot on top.
- Add a horizontal line marking the intercept (the mean of the covariate) – note: univariate only. I will post a remark on this later…
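The forward/backward index trick in the sorting step can be illustrated on a small made-up vector (x here is purely hypothetical):

```r
# a made-up, unsorted example vector
x <- c(2.5, 0.3, 1.7)

i.for  <- order(x)                      # indices that sort x ascending
i.back <- order(x, decreasing = TRUE)   # indices that sort x descending

x[i.for]   # 0.3 1.7 2.5
x[i.back]  # 2.5 1.7 0.3
```

Applied to ucl and lcl, these indices trace the upper limit left-to-right and the lower limit right-to-left, which closes the polygon.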
library(mgcv)

# fit the model with a smooth term for the covariate
model <- gam( MyResponse ~ s(MyCovariate) , data = MyDataset )

# fitted values and pointwise standard errors in a single predict() call
pred <- predict( model , se.fit = TRUE )
fit <- pred$fit
se <- pred$se.fit

# pointwise 95% confidence limits
lcl <- fit - 1.96 * se
ucl <- fit + 1.96 * se

# empty coordinate system
plot( 0 , type = "n" , bty = "n" , xlab = "The x-axis label" , ylab = "The y-axis label" , main = "The Title" , xlim = c(0,5) , ylim = c(0,3) )

# forward and backward indices along the covariate
i.for <- order( MyDataset$MyCovariate )
i.back <- order( MyDataset$MyCovariate , decreasing = TRUE )

# polygon coordinates: upper limit forward, lower limit backward
x.polygon <- c( MyDataset$MyCovariate[i.for] , MyDataset$MyCovariate[i.back] )
y.polygon <- c( ucl[i.for] , lcl[i.back] )

# shade first, then the fitted smooth, then the reference line
polygon( x.polygon , y.polygon , col = "#A6CEE3" , border = NA )
lines( MyDataset$MyCovariate[i.for] , fit[i.for] , col = "#1F78B4" , lwd = 3 )
abline( h = mean(MyDataset$MyCovariate) , lty = 2 )
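Since MyDataset is only a placeholder, here is a self-contained variant of the recipe on simulated data; the data-generating step, seed, and axis limits are my own additions:

```r
library(mgcv)

# simulated data: a sine signal plus noise (made up for illustration)
set.seed(1)
MyDataset <- data.frame(MyCovariate = runif(200, 0, 5))
MyDataset$MyResponse <- sin(MyDataset$MyCovariate) + rnorm(200, sd = 0.3)

model <- gam(MyResponse ~ s(MyCovariate), data = MyDataset)

# fitted values plus pointwise 95% confidence limits
pred <- predict(model, se.fit = TRUE)
lcl  <- pred$fit - 1.96 * pred$se.fit
ucl  <- pred$fit + 1.96 * pred$se.fit

# forward and backward indices along the covariate
i.for  <- order(MyDataset$MyCovariate)
i.back <- order(MyDataset$MyCovariate, decreasing = TRUE)

# empty plot, then the shade, then the smooth on top
plot(0, type = "n", bty = "n",
     xlab = "MyCovariate", ylab = "MyResponse",
     xlim = range(MyDataset$MyCovariate), ylim = range(lcl, ucl))
polygon(c(MyDataset$MyCovariate[i.for], MyDataset$MyCovariate[i.back]),
        c(ucl[i.for], lcl[i.back]), col = "#A6CEE3", border = NA)
lines(MyDataset$MyCovariate[i.for], pred$fit[i.for], col = "#1F78B4", lwd = 3)
```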
Hope no typos. Comment if you find some.
Thanks. I had great fun with this, getting a lot of impressive plots. I think there is a typo in step 7, line 01; there should be a closing “)”
Absolutely. Should be correct now.
Thanks for posting it.
I really appreciate the fact that you took the time to make these kinds of plots and walk people through them…I have been trying to figure it out for a few days now, and finally, this works for my data.
I did wonder if you could help me, though…I did get the line graphed against the response variable in the y-axis, but the smoothing effect is gone, and I just have a pointy line that is graphed. Any ideas?
Thanks!
Mmh, that is rather odd, since plotHR should not plot any pointy line, but solid lines and a polygon shade for the 95% confidence limits…
You might try
termplot( coxph( Your_Formula ) , se = TRUE , rug = TRUE )
instead of
plotHR( coxph( Your_Formula ) )
termplot() is the default R plotting method for the functional form of the covariates… I just gave it another look, and the maintainers have fixed the bug which prevented plotting of a univariate model. That was the main reason I started out with plotHR()… so it renders plotHR() more or less needless… although I do prefer the default colors and CI shades of plotHR()… and the x-axis rug customization with 1Q, median, 3Q… 😉
You might try
termplot( coxph( YourFormula ) , se = TRUE , rug = TRUE , axes = FALSE , bty = "n" )
axis( side = 1 )
axis( side = 2 , at = axTicks(2) , labels = round( exp(axTicks(2)) , digits = 1 ) )
abline( h = 0 , lty = 3 )
If that works, your model is all right and there might be a problem with plotHR()… in which case I would gladly have some feedback.
Good luck,
Reinhard
Excellent instructions! Your sample code, explanations, and example graph really made this easy to understand. Thank you very much!
Very helpful code! Thanks!
That’s so great! It helped me a lot! Thank you so much.
Really helpful presentation. Clear and easy, suitable for time series as well. Thank you very much!!!
Excellent post, Congratulations!
Regards from Brazil!
Excellent! It works pretty well. But how can I plot two polygons (CIs) for two fitted models in the same graph? I tried it the usual way, but it did not work.
Any ideas?
Thanks
Sorry for the last post, I have already solved it. I found a mistake in my settings, so it works now.
brgds
What if I have
model <- gam( MyResponse ~ MyCovariate1+MyCovariate2 , data = MyDataset)
would you proceed the same way?
Note that you forgot the smooth function. You are fitting a linear regression with the gam() function and will get the same output as lm().
model <- gam( MyResponse ~ s(MyCovariate1) + s(MyCovariate2) , data = MyDataset)
will give you independent smooth estimates for both covariates.
Within the gam call, could one include some variables to be treated as linear and others as non-linear, e.g.:
model <- gam( MyResponse ~ MyCovariate1 + s(MyCovariate2) , data = MyDataset)
where "MyCovariate1" is treated as linear and "MyCovariate2" as non-linear?
Yes, that is no problem. If you do something like
gam( y ~ x1 + x2 , data = MyData )
the results would equal
lm( y ~ x1 + x2 , data = MyData )
Thanks, I meant something like this:
gam(y ~ s(x1) + s(x2) + factor(x3) + factor(x4))
having predictors that are continuous and predictors that are categorical…is it possible to do that and still have meaningful results?
Yes, that is completely fine. I wanted to point out that you can, in the extreme case, even drop the smooth estimates entirely and the results remain meaningful and consistent with lm(). You can mix smooth, continuous, and factor terms. The summary() function will give you the usual beta estimates for the linear coefficients.
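A minimal sketch of such a mixed specification, on simulated data (all names and the data-generating step are made up):

```r
library(mgcv)

# made-up data: one smooth covariate, one linear covariate, one factor
set.seed(42)
MyData <- data.frame(
  x1 = runif(300),
  x2 = runif(300),
  x3 = factor(sample(c("a", "b", "c"), 300, replace = TRUE))
)
MyData$y <- sin(2 * pi * MyData$x1) + 0.5 * MyData$x2 +
  as.numeric(MyData$x3) + rnorm(300, sd = 0.2)

# smooth term for x1, linear term for x2, factor term for x3
model <- gam(y ~ s(x1) + x2 + x3, data = MyData)

# the parametric part gets the usual beta estimates
summary(model)$p.coeff
```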
Sorry, late reply, did not get any email notification.
It’s a miracle – someone who can explain R code understandably! Thanks a bunch. This worked wonderfully.
hehe, np…
Thanks for useful stuff. I am trying to run my model following your post. I assumed both linear and non-linear covariates.
model <- gam( Y ~ s(X1) + s(X2) + X3 + X4 + X5 , data = mydata )
However, I could not get a smooth curve. Your code works for model <- gam( Y ~ s(X1) , data = mydata ).
I am new with R and appreciate your comments.
Best,
Dar
You might not need to spend so much time on the manual procedure to produce the plots. It is quite some time since I wrote this post (2009!).
There is a plot method included in the mgcv package; see
?plot.gam
and you can come quite far with it. Try:
plot(
gam( Y ~ s(X1)+s(X2)+… , data = mydata ) ,
shade = TRUE ,
pages = 1 )
to get all terms in a multi-panel plot, or
plot(
gam( Y ~ s(X1)+s(X2)+… , data = mydata ) ,
shade = TRUE ,
select = 1 )
to get a single panel plot of the first term X1.
If you really want to go through the manual fuss of producing the smooth for multiple covariates, you have to look at the options of
?predict.gam
You will find that you can get predicted values on the response scale by using a “newdata” dataframe. This dataframe has to include a sequence of X1 values from min(X1) to max(X1) and a reasonable value for all the other Xn at which the response to X1 is predicted. The most obvious choice would be the mean of the Xn, n!=1.
Then you can predict the response Y for each line of the new dataset and plot Y over X1. When you construct newdata such that X1 is ascending you do not even sort anything.
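A sketch of that newdata construction, assuming two smooth covariates X1 and X2 (the data here are simulated for illustration):

```r
library(mgcv)

# made-up data with two smooth covariates
set.seed(7)
mydata <- data.frame(X1 = runif(200, 0, 10), X2 = runif(200, 0, 10))
mydata$Y <- sin(mydata$X1) + 0.2 * mydata$X2 + rnorm(200, sd = 0.3)

model <- gam(Y ~ s(X1) + s(X2), data = mydata)

# ascending X1 sequence, X2 held at its mean
newdata <- data.frame(
  X1 = seq(min(mydata$X1), max(mydata$X1), length.out = 100),
  X2 = mean(mydata$X2)
)

pred <- predict(model, newdata = newdata, se.fit = TRUE)
lcl  <- pred$fit - 1.96 * pred$se.fit
ucl  <- pred$fit + 1.96 * pred$se.fit

# X1 is already ascending, so only the backward run needs rev()
plot(0, type = "n", bty = "n", xlab = "X1", ylab = "predicted Y",
     xlim = range(newdata$X1), ylim = range(lcl, ucl))
polygon(c(newdata$X1, rev(newdata$X1)), c(ucl, rev(lcl)),
        col = "#A6CEE3", border = NA)
lines(newdata$X1, pred$fit, lwd = 3, col = "#1F78B4")
```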
I do not recall off-hand what the prediction object looks like when predicting from a multi-covariate model…
Anyway, to repeat my initial point: I would be really surprised if the rich options of plot.gam(), from Wood himself, did not get the job done.
Good luck and feel free to come back. I have some code lying around doing this, but have not used it in ages…
I appreciate your prompt response, Sir.
I am going through your instructions for plot.gam. I hope I will find a solution and will get back to you with my results. Anyway, when I run a semi-parametric model (Y = s(x1) + s(x2) + x3 + x4), is it possible to get estimated coefficients for x3 and x4?
Best,
Dar