Remove Color Channels from TIFF Using Command Line

The R function

tiff()

is used a lot to export R graphics e.g. for publication purposes. The tiff includes all 8 color channels and this collides with formatation requirements of most jounals (1200dpi but not more than some MB in size).

tiff("Whiskerplot.tiff" , width = 8.8 , height = 8.8 , units = "cm" , pointsize = 8 , res = 1200)
produces the required 1200 dpi resolution for monochrome plots, but with all 8 color channels (although just one is necessary) this results in 49.4MB for a tiny 8.8×8.8 cm (ca 3.5×3.5 inch) plot. I used GIMP to remove the color channels with

Image > Mode > Indexed > 1-bit monochrome

but this required opening GiMP, the file, the menu and saving. And again, when the plot had to be modified.

Since this has to be done for each and every publication figure I looked up a command line solution and here it is:
convert Whiskerplot.tiff -flatten -monochrome Whiskerplot_monochrome.tiff
removes all unnecessary colorchannels from Whiskerplot.tiff and saves it as Whiskerplot_monochrome.tiff, reducing the size from 49.4MB to 2.1MB while the result is exactly the same.

Hattip: Paddy Landau on ubuntuforums

Advertisement

R function to transform continuous variable to categorical factor cut at n-tiles

The cut() function can be used to transform a continuous variable into a categorical factor variable. The syntax is quite lengthy and if one wishes to cut at quartiles, quintiles or other n-tiles one has to include the quantile() function into the call.

This is not very newbee friendly and if included into a model-call nearly unreadable.

The function in the code box cutN() does the job.

 cutN <- function(X , n = 4){
     cut(
         X ,
         include.lowest = TRUE ,
         breaks = quantile(
                           X , 
                           probs = (0:n)/n ,
                           na.rm = TRUE ))}

In order to cut the continuous variable Creatinine in the dataset Patients into deciles (n=10) the syntax is:
cutN( Patients$Creatinine , n = 10 )

No big deal, but maybe useful…

Mathematical expressions in R plot

Mathematical anotations to R plots can be formated LaTeX style with the expression function. The expression() can be included in

  • plot titles
    title( expression(...))
  • axis anotations
    plot( ... , xlab = expression(...))

    or

  • the plot panel itself
    text( x , y , expression(...))

An example:

plot( 0 , 0 , type = "n" , xlab = expression( "Nothing" * (mu*mol/l)) )
text( 0 , 0 , expression(beta>=0.2))
title( expression( "Only an expression() demo " * theta^(2*pi)))

It is even possible to update the plot anotation from a variable (found on the R forum)

[R] Expression in plot text

Roger Koenker roger at ysidro.econ.uiuc.edu
Wed Dec 6 20:22:05 CET 2000

expression(paste(hat(theta),'= ',that)))

We’ve been here before. To get an expression either use parse on your
pasted character string or substitute on an expression. There are worked
examples in the list achives. The neatest is

title(substitute(hat(theta) == that, list(that=that)))

(note it is == not =)

Plot Function For Additive Cox Proportional Hazard Regression

Note: This initial blog post is discontinued. If you want to have the latest development of the function have a look at the static page PlotHR

Usually a Cox-regression is achieved in R by
library(survival)
model <- coxph ( SuvivalObject ~ Covariate1 + Covariate2 + Factor1 + Factor2 , data = Dataset )

The covariates can be enclosed in other funtions:

  • factors should be enclosed by factor()
  • strata, which allow to adjust for a factor without getting an estimate, should be enclosed by strata()
  • non-log linear continuous terms can be enclosed by
    pspline()

In the latter case the model might look like
model <- coxph (SurvivalObject ~ pspline(Covariate1) + Covariate2 + factor(Factor1) + strata(Factor2) , data = Dataset )

The functional form of the covariates (including the factors) can now be plotted with
termplot(model)

Though the termplot() function fails with plotting just one covariate and leaves no cusomization.

The function plotHR() plots the functional form of the desired term: plotHR(model)
plots the first term in the model by default but other terms can be accessed by calling their number (e.g. the second one):
plotHR(model , terms = 2)

In order to use the function you have to “source” it into R. It is the same procedure as calling a package, but using “source” instead of “library”.

Paste the function syntax into a textfile and safe it (as plot.HR.R) on your harddisk, remember the path and include
source("C:Path/to/plotHR_0.6.R")
before using the function.

Download plotHR()

Note: I have rewritten the function several times since I wrote the initial post … using version numbers now …

  • V0.6 – removed the y.log option, since the scale should be logaritmic anyway. Later I will also rewrite the x.log option, since the feature is already incorporated in the plot.default() function. I also removed the dottet line at HR=1 level, since some complained about it overstating the importance of the log(HR) intercept. I included it, since it gives a hint about the significance of the smooth term, in case the confidence intervalls cross over the line… Those who miss it can add manually lines( h = 0 , type = 2 )

    I rewrote the “rugs” option. Try rugs = "density" It is still “beta”ish, but some like it.

  • V0.5 – bug fix for the y-scale and slight adjustment of the default plotting colors (paler CI shade and stronger term-line)
  • V0.4 – the y-scale should be logarithmic; a HR of 0.5 (50% reduced Hazard) should show the same distance from HR = 1 as a doubled Hazard (HR = 2); this is now default. The linear scale I used initially is biased in this concern (Hat-tip: Arve Ulvik, Eva Pedersen and Roy Nilsen). The option y.log allows both ways (linear and log-scale); the axis labels denote Hazard Ratio instead of log(HR).

Usage:
plotHR( model , terms = 1 , se = TRUE , rug = "ticks" , x.log = FALSE , xlab = "" , ylab = "Hazard Ratio" , main = NULL , xlim = NULL , ylim = NULL, col.term = "#08519C", lwd.term = 3, col.se = "#DEEBF7", cex = 1 , bty = "n" , axes = TRUE )

  • model – a coxph model
  • terms – integer; the number of the term to plot
  • se – logical TRUE/FALSE; plotting the CI
  • rug – “ticks” or “density”; rug plot or density plot at x-axis. Any other value for “rug” will omit the rugplot.
  • x.log – logical TRUE/FALSE; log-transformed exposure variable
  • xlab – character; x-axis label
  • ylab – character; y-axis label
  • main – character; main plot title
  • xlim – 2×1 column vector; x-range of plot
  • ylim – 2×1 column vector; y-range of plot
  • col.term – color of HR-curve
  • lwd.term – line width of HR-curve
  • col.se – color of CI (if plotted)
  • cex – numeric; size factor of labels
  • bty – specifies the boxtype around the plot. See ?plot.default

Installation of Package ‘rggobi’ requires GGobi

Installation of package ‘rggobi’ had non-zero exit status on my way to solve all the rattle dependencies.

rggobi is an interface from R to GGobi – an open source visualization program for exploring high-dimensional data. It provides highly dynamic and interactive graphics such as tours, as well as familiar graphics such as the scatterplot, barchart and parallel coordinates plots. Plots are interactive and linked with brushing and identification (from the package description).

When trying to install rggobi R complained accordingly error: Package requirements (ggobi >= 2.1.6) were not met: No package 'ggobi' found

and the fix is not hard to figure out:
sudo aptitude install ggobi

rggobi also depends on RGtk2 which in turn requires libglade. That was another post.

libglade not found – Installation of package ‘RGtk2’ had non-zero exit status

Both R-packages rattle and rggobi depend on RGtk2. Trying to install RGtk2 threw an error: WARNING: libglade not found

Fix:
sudo aptitude install libglade2-dev
installs the development files for libglade, which allows to load externally stored user interfaces into programs. This development file is needed for the graphical user interfaces of both rggobi and rattle.

Matrix Operations in R

R can do all kinds of matrix calculations, like multiplication, tranposing and calculating the inverse. The following manual was created by Phil Ender.

Note: R wants the data to be entered by columns starting with column one.

The Matrix

# the matrix function
# 1st arg: c(2,3,-2,1,2,2) the values of the elements filling the columns
# 2nd arg: 3 the number of rows
# 3rd arg: 2 the number of columns

> A <- matrix(c(2,3,-2,1,2,2),3,2)
> A

[,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2

Is Something a Matrix

> is.matrix(A)

[1] TRUE

> is.vector(A)

[1] FALSE

Multiplication by a Scalar

> c <- 3
> c*A

[,1] [,2]
[1,]    6    3
[2,]    9    6
[3,]   -6    6

Matrix Addition & Subtraction

> B <- matrix(c(1,4,-2,1,2,1),3,2)
> B

[,1] [,2]
[1,]    1    1
[2,]    4    2
[3,]   -2    1

> C <- A + B
> C

[,1] [,2]
[1,]    3    2
[2,]    7    4
[3,]   -4    3

> D <- A - B
> D

[,1] [,2]
[1,]    1    0
[2,]   -1    0
[3,]    0    1

Matrix Multiplication

> D <- matrix(c(2,-2,1,2,3,1),2,3)
> D

[,1] [,2] [,3]
[1,]    2    1    3
[2,]   -2    2    1

> C <- D %*% A
> C

[,1] [,2]
[1,]    1   10
[2,]    0    4

> C <- A %*% D
> C

[,1] [,2] [,3]
[1,]    2    4    7
[2,]    2    7   11
[3,]   -8    2   -4

> D <- matrix(c(2,1,3),1,3)
> D

[,1] [,2] [,3]
[1,]    2    1    3

> C <- D %*% A
> C

[,1] [,2]
[1,]    1   10

> C <- A %*% D

Error in A %*% D : non-conformable arguments

Transpose of a Matrix

> AT <- t(A)
> AT

[,1] [,2] [,3]
[1,]    2    3   -2
[2,]    1    2    2

> ATT <- t(AT)
>ATT

[,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2

Common Vectors

Unit Vector

> U <- matrix(1,3,1)
> U

[,1]
[1,]    1
[2,]    1
[3,]    1

Common Matrices

Unit Matrix

Using Stata

> U <- matrix(1,3,2)
> U

[,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    1    1

Diagonal Matrix

> S <- matrix(c(2,3,-2,1,2,2,4,2,3),3,3)
> S

[,1] [,2] [,3]
[1,]    2    1    4
[2,]    3    2    2
[3,]   -2    2    3

> D <- diag(S)
> D

[1] 2 2 3

> D <- diag(diag(S))
> D

[,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    2    0
[3,]    0    0    3

Identity Matrix

> I <- diag(c(1,1,1))
> I

[,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

Symmetric Matrix

> C <- matrix(c(2,1,5,1,3,4,5,4,-2),3,3)
> C

[,1] [,2] [,3]
[1,]    2    1    5
[2,]    1    3    4
[3,]    5    4   -2

> CT <- t(C)
> CT

[,1] [,2] [,3]
[1,]    2    1    5
[2,]    1    3    4
[3,]    5    4   -2

Inverse of a Matrix

> A <- matrix(c(4,4,-2,2,6,2,2,8,4),3,3)
> A

[,1] [,2] [,3]
[1,]    4    2    2
[2,]    4    6    8
[3,]   -2    2    4

> # using MASS package

> library(MASS)

> AI <- ginv(A)
> AI

[,1] [,2] [,3]
[1,]  1.0 -0.5  0.5
[2,] -4.0  2.5 -3.0
[3,]  2.5 -1.5  2.0

> # using car package

> library(car)

> AI <- inv(A)
> AI

[,1] [,2] [,3]
[1,]  1.0 -0.5  0.5
[2,] -4.0  2.5 -3.0
[3,]  2.5 -1.5  2.0

> A %*% AI

[,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

> AI %*% A

[,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

Inverse & Determinant of a Matrix

> C <- matrix(c(2,1,6,1,3,4,6,4,-2),3,3)
> C

[,1] [,2] [,3]
[1,]    2    1    6
[2,]    1    3    4
[3,]    6    4   -2

> CI <- inv(C)
CI

[,1]        [,2]        [,3]
[1,]  0.2156863 -0.25490196  0.13725490
[2,] -0.2549020  0.39215686  0.01960784
[3,]  0.1372549  0.01960784 -0.04901961

> d <- det(C)
> d

[1] -102

Number of Rows & Columns

> X <- matrix(c(3,2,4,3,2,-2,6,1),4,2)
> X

[,1] [,2]
[1,]    3    2
[2,]    2   -2
[3,]    4    6
[4,]    3    1

> dim(X)

[1] 4 2

> r <- nrow(X)
> r

[1] 4

> c <- ncol(X)
> c

[1] 2

Computing Column & Row Sums

# note the uppercase S

> A <- matrix(c(2,3,-2,1,2,2),3,2)
> A

[,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2

> c <- colSums(A)
> c

[1] 3 5

> r <- rowSums(A)
> r

[1] 3 5 0

> a <- sum(A)
> a

[1] 8

Computing Column & Row Means

# note the uppercase M

> cm <- colMeans(A)
> cm

[1] 1.000000 1.666667

> rm <- rowMeans(A)
> rm

[1] 1.5 2.5 0.0

> m <- mean(A)
> m

[1] 1.333333

Horizontal Concatenation

> A
> A

[,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2

> B <- matrix(c(1,3,2,1,4,2),3,2)
> B

[,1] [,2]
[1,]    1    1
[2,]    3    4
[3,]    2    2

> C <- cbind(A,B)
> C

[,1] [,2] [,3] [,4]
[1,]    2    1    1    1
[2,]    3    2    3    4
[3,]   -2    2    2    2

Vertical Concatenation (Appending)

> C <- rbind(A,B)
> C

[,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2
[4,]    1    1
[5,]    3    4
[6,]    2    2

Densityplot Variations

Variation I

Densityplot with filled area, quartiles and mean
Densityplot with filled area, quartiles and mean

Just playing around the other day to get the default plot.density() function a bit more like publishing quality. Above is my favorite so far. Further down are two more spartanic versions.

In this connection I also wrote my first function. When I get more customed to R I will package my ideas into an R library – but later…

There is the possibility to call R-functions without embedding the code into your analysis script, but I did not look that up yet, so embedding the following code into your script (at the beginning) and then using
densityplot(Dataset$Coviate)
will do the job, where Dataset and Covariate have to be replaced by the according values of course.

Now the function code:
# function densityplot
densityplot <- function(x , digits = 1 , xlab = "" , ylab = "" , main = "Density" , col = "blue"){
dens <- density(x , na.rm = T)
bins <- as.numeric(cut(dens$x , breaks = fivenum(x)))
i1 <- bins == 1 & !is.na(bins)
i2 <- bins == 2 & !is.na(bins)
i3 <- bins == 3 & !is.na(bins)
i4 <- bins == 4 & !is.na(bins)
x.p1 <- dens$x[i1]; x.p1 <- c(x.p1,max(x.p1),min(x.p1))
x.p2 <- dens$x[i2]; x.p2 <- c(x.p2,max(x.p2),min(x.p2))
x.p3 <- dens$x[i3]; x.p3 <- c(x.p3,max(x.p3),min(x.p3))
x.p4 <- dens$x[i4]; x.p4 <- c(x.p4,max(x.p4),min(x.p4))
y.p1 <- dens$y[i1]; y.p1 <- c(y.p1,0,0)
y.p2 <- dens$y[i2]; y.p2 <- c(y.p2,0,0)
y.p3 <- dens$y[i3]; y.p3 <- c(y.p3,0,0)
y.p4 <- dens$y[i4]; y.p4 <- c(y.p4,0,0)
plot(dens , type = "n" , axes = F, main = main, xlab = xlab , ylab = ylab)
polygon(x.p1,y.p1 , border = F , col = col)
polygon(x.p2,y.p2 , border = F , col = col)
polygon(x.p3,y.p3 , border = F , col = col)
polygon(x.p4,y.p4 , border = F , col = col)
axis(side = 1 , line = 0 , at = fivenum(x, na.rm = T) , label = c("Minimum","Quartile 1", "Median", "Quartile 3", "Maximum"), lwd = 0, cex.axis = 0.6)
axis(side = 1 , line = 1 , at = fivenum(x, na.rm = T))
axis(side = 1 , line = 1 , at = round(mean(x , na.rm = T) , digits = digits) , tcl = 0.4 , label = F)
axis(side = 1 , line = -1.5 , at = round(mean(x , na.rm = T) , digits = digits) , tick = F , cex.axis = 0.6)
axis(side = 1 , line = -2.0 , at = round(mean(x , na.rm = T) , digits = digits) , label = "Mean" , tick = F , cex.axis = 0.6)
}

Variation II

Some might find the colored area to much, although IMHO it puts the focus on the fact that one is looking at areas when ploting a density. But then something similar without the color fill. Not a function just a few lines of code to embed and adjust to the script:

Densityplot with quartiles and mean
Densityplot with quartiles and mean

… and the source…
plot(density(angio$PE_ALDER , na.rm = T), axes = F, main = "Basic densityplot", xlab = "" , ylab = "")
# Add Quartiles
axis(side = 1 , line = 1 , at = fivenum(angio$PE_ALDER, na.rm = T) , label = c("Minimum","Quartile 1", "Median", "Quartile 3", "Maximum"), lwd = 0, cex.axis = 0.6)
axis(side = 1 , line = 2 , at = fivenum(angio$PE_ALDER, na.rm = T))
abline(v = fivenum(angio$PE_ALDER, na.rm = T)[2:4] , lty = 3)
# Mean
axis(side = 1 , line = 2 , at = round(mean(angio$PE_ALDER , na.rm = T) , digits = 2) , tcl = 0.4 , label = F)
axis(side = 1 , line = -0.5 , at = round(mean(angio$PE_ALDER , na.rm = T) , digits = 2) , tick = F)
axis(side = 1 , line = -1.4 , at = round(mean(angio$PE_ALDER , na.rm = T) , digits = 2) , label = "Mean" , tick = F , cex.axis = 0.6)
abline(v = mean(angio$PE_ALDER , na.rm = T) , lty = 4)

Variation III

… finally an even more stripped down version without the mean:
Densityplot1
which was achieved by:
plot(density(angio$PE_ALDER , na.rm = T), axes = F, main = "Basic densityplot", xlab = "Age")
abline(v = fivenum(angio$PE_ALDER, na.rm = T)[2:4] , lty = 3)
axis(side = 1 , line = -1 , at = fivenum(angio$PE_ALDER, na.rm = T) , label = c("Minimum","Quartile 1", "Median", "Quartile 3", "Maximum"), lwd = 0, cex.axis = 0.6)
axis(side = 1 , at = fivenum(angio$PE_ALDER, na.rm = T))

GAM Plot with 95% Confidence Shade

GAM plot with conficence "shade" and customized rugs
The lightblue shade denoting the 95% pointwise confidence limits of the GAM estimate is a polygon() object in R.

Usually one would plot the GAM model with the default termplot() function and specifiy se=T to get the confidence limits as lines. I showed in a recent post how to plot the fitted GAM smooth and the confidence limits manually.

The same approach can be used to have the confidence limits as a shade. In order to achieve this:

  1. Fit the model
  2. library(mgcv)
    model <- gam( MyResponse ~ MyCovariate , data = MyDataset)

  3. get the estimated values of the fitted smoothing spline
  4. fit <- predict( model , se = TRUE )$fit

  5. and pointwise standard errors
  6. se <- predict( model , se = TRUE)$se.fit

  7. calculate the values of the upper and lower 95%-confidence limits
  8. lcl <- fit - 1.96 * se
    ucl <- fit + 1.96 * se

  9. plot an empty coordinate system with right scaling, labels , titles and so on. I prefer to include axes = FALSE and add the axes manually. See here for an example.
  10. plot( 0 , type = "n" , bty = "n" , xlab = "The x-axis lable" , ylab = "The y-axis lable" , main = "The Title" , xlim = c(0,5) , ylim = c(0,3) )

  11. No it gets a bit tricky with sorting the coordinates in the right order. R provides a very effective way to achieve this, though it is not easily understandable at once. We create two indices on the dataset i.for and i.back which number the dataset according to the independend variable on the x-axis – forward and backward respectively:
  12. i.for <- order( MyDataset$MyCovariate )
    i.back <- order( MyDataset$MyCovariate , decreasing = TRUE )

  13. The points of the shade polygon follow here first the upper confidence line (ucl) forward and then the lower confidence line backward. The coordinates of the shade-polygon are
  14. x.polygon <- c( MyDataset$MyCovariate[i.for] , MyDataset$MyCovariate[i.back] )
    y.polygon <- c( ucl[i.for] , lcl[i.back] )

  15. First plot the polygon with the confidence shade – otherwise the GAM plot vanishes behind it.
  16. polygon( x.polygon , y.polygon , col = "#A6CEE3" , border = NA )

  17. now the mainplot afterwards
  18. lines( MyDataset$MyCovariate[i.for] , fit[i.for], col = "#1F78B4" , lwd = 3 )

  19. add a horizontal line marking the intercept ( the mean of the covariate) – note: univariate only. I will post a remark on this later…
  20. abline( h = mean(MyDataset$MyCovariate) , lty = 2 )

      Hope no typos. Comment if you find some.