R function to transform continuous variable to categorical factor cut at n-tiles

The cut() function can be used to transform a continuous variable into a categorical factor variable. The syntax is quite lengthy and if one wishes to cut at quartiles, quintiles or other n-tiles one has to include the quantile() function into the call.

This is not very newbee friendly and if included into a model-call nearly unreadable.

The function in the code box cutN() does the job.

 cutN <- function(X , n = 4){
         X ,
         include.lowest = TRUE ,
         breaks = quantile(
                           X , 
                           probs = (0:n)/n ,
                           na.rm = TRUE ))}

In order to cut the continuous variable Creatinine in the dataset Patients into deciles (n=10) the syntax is:
cutN( Patients$Creatinine , n = 10 )

No big deal, but maybe useful…


Generate LaTeX tables of descriptive statistics

Using a LOT of time on fiddling together tables of descriptive statistics manually in R did not inspire me to look in the CRAN repositories for a R function doing exactly this… yes a bit stubborn… want to find out myself…

So what I had to do was:

  1. handcode the descriptives for each variable, something like prop.table(xtabs( ~ Variable + Group1 + Group2), c(2,3)),
  2. calculate some statistics for each variable (chi^2, one-way-ANOVA, Fisher-exact..) and attach it to the descriptives,
  3. run the code,
  4. copy the plain text output from R back into the editor and finally
  5. add LaTeX or CSV syntax to get the desired results.

Anyway this is over now. A solution exists, of course (thx Kjetil… again):

The package ‘reporttools’ of Kaspar Rufibach.
install.packages("reporttools", dependecies=TRUE)

The most important functions are well:

tableContinuous( vars = c(bmi, ejectionfraction, systolicBP, diastolicBP) , group = sex , subset = significantstenosis , print.pval = "anova")


tableNominal() for nominal variables.

The output is LaTeX and it is possible to specify table captions and lables to the tables in the function call. I will give it a try inside Sweave… but first I have to get it to my Linux machine … which is blocked by the corporate firewall … after a recent Mircrosoft powergrab … after a Virusattack (Conficker) … after running XP unpached.

But thats another story

Additive COX-regression

Update:I have written a much more detailed static page about the additive COX model: http://rforge.org/plothr/
The page has a download link to the function plotHR() which does all the fuzz. It is extensively commented. It should be easy to understand the syntax and modify it for individual purposes.

Therneau et al. refer to the proportional hazards model or COX-regression model as “the workhorse of regression analysis for censored data”. They show how to implement the additive form of this model in SAS and S-pluss; already mentioned by Hastie and Tibshirany in 1986 when introducing Generalized Additive Models (GAM).

I found modelling the functional form of the covariates in a regression model for rightcensored survival times with smoothing splines extremely useful. And the implementation is absolutely straightforward in R.

The only thing needed is the installation of the R-libraries “survival” and “pspline”:


In the following code I will refer to a dataset “MyData” with a binary status variable “death” and a time-to-event variable “days2death”.
The status variable “death” should be (not necessarily) 1 if the event of interesst occured to the subject and “days2death” gives then the time to this event.

Viualizing the functional form of a covariate takes the following steps:

  1. create the survival object of interesst
  2. fit a proportional hazards model with smoothing splines,
  3. predict the functional form of the covariate of interesst and
  4. plot it!

Note that there is the termplot() function in R which gives you the GAM plots after the modelfit, so step 3 would not be necessary – BUT: it has a bug and fails plotting a single covariate; and it does not allow all to much customizing.

This is the R code to achieve the analysis:

1 Create survival object:

surv.death <- Surv(MyData$days2death, MyData$death)

2 Fit proportional hazards model with smoothing splines for continuous covariates:

pham.fit <- coxph( surv.death ~ pspline(EF, df=4) + pspline(Age, df=4) + strata (Sex, df=4) , data = MyData)

The model above includes the continuous covariates “EF” (ejection fraction) and “Age” and stratifies for “Sex”.

3 Produce the fitted smoothing spline for the first covariate in the above model formula with standard errors

predicted <- predict(pham.fit , type = "terms" , se.fit = TRUE , terms = 1)
“terms=1” refers to “pspline(EF,df=4)”

4 Plot it

First plotting axes and labels
plot(0 , xlab="Ejection Fraction" , ylab = "Hazard Ratio" , main = "All-cause Death" , type = "n" , xlim=c(0,100) , ylim=c(0,3))
the range of values on the x-axis (“xlim=c(0,100)”) is chosen manually for this specific covariate; of course it is possible to use something like ylim = c( 0 , max(MyData$EF) ).

Now plot the fitted smoothing spline using the lines() function:
lines( sm.spline(MyData$EF , exp(predicted$fit)) , col = "red" , lwd = 0.8)
Note that the term prediction gives log-hazard-ratios; therefore exp(predicted$fit) is plotted against the values of the covariate. The sm.spline() function is necessary since the points of the plot appear in random order and density, according to the underlying dataset; a plain lines() function would produce just a chaotic pattern. Alternative:
plot(MyData$EF , exp(predicted$fit) , col = "red" , cex = 0.2)
produces a scattered plot that reflects the distribution of the underlying data – I do prefer adding a rug-plot on the bottom of the graph to illustrate this (see under).

… upper and lower confidence limits with dashed thinner lines

lines(sm.spline(MyData$EF , exp(predicted$fit + 1.96 * predicted$se)) , col = "orange" , lty = 2 , lwd = 0.4)
lines(sm.spline(MyData$EF , exp(predicted$fit - 1.96 * predicted$se)) , col = "orange" , lty = 2 , lwd = 0.4)

… a tiny horizontal line at hazard level 1, do see where the confidence limits cross:
abline( h = 1 , col = "lightgrey" , lty = 2 , lwd = 0.4)

… tiny tickmarks on the x-axes to reflect the distribution of the underlying data:
axis( side = 1 , at = MyData$EF, labels = F , tick = T , tcl = 0.4 , lwd.ticks = 0.1)

… and some fancy red tickmarks to mark minimum, lower hinge, median, upper hinge and maximum of the covariate in the dataset:
axis( side = 1 , at = fivenum(MyData$EF), labels = F , tick = T , tcl = -0.2 , lwd.ticks = 1 , col.ticks = "red")

Fancy customized smoothing spline fitted to the functional form of a covariate in a additive proportional hazard model
Fancy customized smoothing spline fitted to the functional form of a covariate in a additive proportional hazard model

Thats it!

4b) The easy way (works ONLY with MORE then 1 continous covariate) – predicting the terms can be omitted:

termplot(pham.fit, se=T, rug=T)

Resulting in …

The default termplot method for fitted smoothing splines
The default termplot method for fitted smoothing splines