# GAM Plot with 95% Confidence Shade The lightblue shade denoting the 95% pointwise confidence limits of the GAM estimate is a polygon() object in R.

Usually one would plot the GAM model with the default termplot() function and specifiy se=T to get the confidence limits as lines. I showed in a recent post how to plot the fitted GAM smooth and the confidence limits manually.

The same approach can be used to have the confidence limits as a shade. In order to achieve this:

1. Fit the model
2. library(mgcv) model <- gam( MyResponse ~ MyCovariate , data = MyDataset)

3. get the estimated values of the fitted smoothing spline
4. fit <- predict( model , se = TRUE )$fit 5. and pointwise standard errors 6. se <- predict( model , se = TRUE)$se.fit

7. calculate the values of the upper and lower 95%-confidence limits
8. lcl <- fit - 1.96 * se ucl <- fit + 1.96 * se

9. plot an empty coordinate system with right scaling, labels , titles and so on. I prefer to include axes = FALSE and add the axes manually. See here for an example.
10. plot( 0 , type = "n" , bty = "n" , xlab = "The x-axis lable" , ylab = "The y-axis lable" , main = "The Title" , xlim = c(0,5) , ylim = c(0,3) )

11. No it gets a bit tricky with sorting the coordinates in the right order. R provides a very effective way to achieve this, though it is not easily understandable at once. We create two indices on the dataset i.for and i.back which number the dataset according to the independend variable on the x-axis – forward and backward respectively:
12. i.for <- order( MyDataset$MyCovariate ) i.back <- order( MyDataset$MyCovariate , decreasing = TRUE )

13. The points of the shade polygon follow here first the upper confidence line (ucl) forward and then the lower confidence line backward. The coordinates of the shade-polygon are
14. x.polygon <- c( MyDataset$MyCovariate[i.for] , MyDataset$MyCovariate[i.back] ) y.polygon <- c( ucl[i.for] , lcl[i.back] )

15. First plot the polygon with the confidence shade – otherwise the GAM plot vanishes behind it.
16. polygon( x.polygon , y.polygon , col = "#A6CEE3" , border = NA )

17. now the mainplot afterwards
18. lines( MyDataset$MyCovariate[i.for] , fit[i.for], col = "#1F78B4" , lwd = 3 ) 19. add a horizontal line marking the intercept ( the mean of the covariate) – note: univariate only. I will post a remark on this later… 20. abline( h = mean(MyDataset$MyCovariate) , lty = 2 )

Hope no typos. Comment if you find some.

# Cannot find sprng 2.0 header file

Trying to
install.packages("rsprng")
fails because
Cannot find sprng 2.0 header file.

The development package of the SPRNG (Scalable Parallel Random Number Generator) library which provides several RNGs that are suitable for use in parallel computing are missing:

sudo aptitude install libsprng2-dev

fixes it.

# Fancy Rugs in Regression Plots

Rugplots along the axes show the distribution of the underlying data in regression model plots. This is particulary useful in connection with additive (nonparametric) models where the plotted smooth function is the exclusive representation of the model in order to assess how much data contributed to the model fit at the different values of the exlanatory variable.

The custom plot.gam() function includes the possibility of such rugs and pointwise conficence intervalls by default.

Adding quartiles to the rugs requires some customization, though. I included the complete code to produce the above plot underneath.

The example for this GAM is borrowed from the excellent book of Alain Zuur et. al. Mixed Effects Models and Extensions in Ecology with R p.55ff. The ISIT data to run the code above is included in the R package AED which can be downloaded from the books website.

Note: the package AED is needed for the example dataset only. It is NOT necessary to use the example code on ones own dataset.

The main points concerning the rugs and quantile lables on the x-axis are:

1. Plot the coordinate system without lables
2. plot( ... , axes = FALSE )

3. Plot the x-axis with x-lables
4. axis(side = 1 , line = 0.3 , at = 0:5*1000 , tick = TRUE)

5. Plot rugs – the jitter() is necessary since a lot of datapoints sit on the same values of SampleDepth – so shake them a bit.
6. axis(side = 1 , line = -0.9 , at = jitter(ISIT$SampleDepth) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0) 7. Print the lables “1Q”, “Median” and “3Q” or whatever you like to call them on the right position. The line and padj parameter set the position of the text and cex.axis the textsize. 8. axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black" , padj = -2.8)

9. Plot thick tickmarks crossing through the rug cloud at 1Q, median and 3Q
10. axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE) 11. and finally short thick tickmarks under the text touching the x-axis 12. axis(side = 1 , line = 0.3, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 0.2 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)

Here goes the complete code:
library(mgcv) library(AED) data(ISIT) # # Fit a univariate GAM model model <- gam(Sources ~ s(SampleDepth) , data = ISIT) fit <- predict(model, se = T)$fit se <- predict(model, se = T)$se.fit lcl <- fit - 1.96 * se ucl <- fit + 1.96 * se # # open a jpeg jpeg("FancyRugs.jpg" , width=400, height=400) # # set plotting options: 1 plot per page, horizontal labels and textsize par(mfrow = c(1,1) , las = 1 , cex = 1) # # plot coordinatesystem and labels plot(0 , bty = "n" , type = "n" , xlim = c(0,5000) , ylim = c(-10,50) , xlab = "Depth (m)" , ylab = expression(paste("Number of sources (" , m^-3 , ")")) , axes = FALSE) # title(main="Association between number of sources of\nbioluminescent organisms and ocean depth" , cex.main = 0.8) # ## _____ X-AXIS ______ # x-axis values axis(side = 1 , line = 0.3 , at = 0:5*1000 , tick = TRUE) # # rugs at datapoints axis(side = 1 , line = -0.9 , at = jitter(ISIT$SampleDepth) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0) # # labels at 1Q, median and 3Q axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black" , padj = -2.8) # # tick marks at 1Q, median and 3Q axis(side = 1 , line = 0.3, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 0.2 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE) # axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE) # ## _____ Y-AXIS ______ # y-axis values axis(side = 2 , at = 0:5*10) # # rugs at datapoints axis(side = 2 , line = -0.9 , at = jitter(ISIT$Sources) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0) # # labels at 1Q, median and 3Q axis(side = 2 , line = -0.7 , at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black") # # thicker tick marks at 1Q, median and 3Q axis(side = 2 , line = 0.3, at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = T, tcl = 0.3 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE , padj = -2) axis(side = 2 , line = -0.7 , at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black" , labels = FALSE) # # horizontal line marking the intercept = mean(Sources) (for univariate model only) abline(h=mean(ISIT$Sources), lty=3) # # Scatterplot lines(ISIT$SampleDepth , ISIT$Source , type = "p" , cex = 0.4 , lwd = 0.2 , col = "grey") # # plot main figure lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , fit[order(ISIT$SampleDepth)] , col = "black" , lwd = 2) # # plot lower confidence limit (lcl) lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , lcl[order(ISIT$SampleDepth)] , col = "grey" , lwd = 1) # # plot upper confidence limit (ucl) lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , ucl[order(ISIT$SampleDepth)] , col = "grey" , lwd = 1) # # closing the jpg file dev.off()

# Install R Package XML with Ubuntu amd64

install.packages("XML")

failed while complaining

Cannot find xml2-config

Seth Falcon pointed out library libxml2-dev – Development files for the GNOME XML library – where missing.

sudo aptitude install libxml2-dev

fixed the problem.

# Installing RODBC with Ubuntu/Debian amd64

Trying to install RODBC in Ubuntu with
install.packages("RODBC")
failed throwing an error message
ODBC headers sql.h and sqlext.h not found

A glance at the r-help showed that it had to do with something called unixODBC – an ODBC driver manager.

The package was installed, but not the development package and thus not the headers which R complained about.

Again a not-so-obvious-for-the-newbee-Linux-Unix-Shell-goblish thing. The fix is
sudo aptitude install unixodbc-dev

# Open Access .mdb Files with RODBC

Getting data into R can be done by reading colon separated files (.csv) via the read.table() function. It is also possible to access databases directly and send SQL queries directly from R to the database. This has some advantages: Using Sweave the queries get documented in the analysis report, variable formats are retained.

To install the RODBC package:
install.packages("RODBC")

Open a database connection to an Microsoft Access database file, e.g. “MyDataBase.mdb” sitting in the Folder “C:\ MyPath\MyDataBase.mdb”:
channel <- odbcConnectAccess("C:/MyPath/MyDataBase")
note that the Windows backslashes “\” become slashes “/” in R and the extension “.mdb” is omitted.

Getting the database table “MyTable” into the R dataframe “R.Table”
R.Table <- sqlQuery( channel , paste ("select * from MyTable"))
MyTable can also be a sql query in the Access database.

# Correlation matrices

Correlation matrices are a great way to visualize covariation in large datasets. Thanks to Sarkar (2008) and Friendly (2002), we can borrow these examples. First we need to load a few packages:

library(lattice) library(ellipse)

We then go on to importing our data:

MyData <- read.table("/FILE/LOCATION/FILE.txt", header=TRUE,sep='\t',quote='')

Then on for some tedious work. We’ll have to manually type in the name (as it named i the dataset!) of each of the variables we’re interested in (unless off course you’ll want to analyze the whole shebang – which often isn’t desirable).

var <- c("var1","var2","var3")

We then proceed to make a correlation matrix with the selected variables vector “var”:

cor.MyData <- cor(MyData[,var],use="pair")

And here’s the money shot; setting the order of the variables in the matrix based on the result of a cluster analysis. This is for easier visual interpretation.

ord <- order.dendrogram(as.dendrogram(hclust(dist(cor.MyData))))

We then can make our Corrgrams. First an easy one:

print(levelplot(cor.MyData[ord,ord],xlab=NULL,ylab=NULL, at=do.breaks(c(-1.01,1.01),101),scales=list(x=list(rot=90)), colorkey=list(space="top"), col.regions=colorRampPalette(c("red","white","blue")))) Then on to the more elaborate ones, first a informative one using the ellipse package. We’ll have to write our own panel function here (again, thanks to Sarkar 2008), but this is completly generic, so cut ‘n’ paste should suffice.

panel.corrgram<-function(x,y,z,subscripts,at, level=0.9,label=FALSE,... {require('ellipse',quietly=TRUE) x<-as.numeric(x)[subscripts] y<-as.numeric(y)[subscripts] z<-as.numeric(z)[subscripts] zcol<-level.colors(z,at=at,...) for(i in seq(along=z)){ ell<-ellipse(z[i],level=level,npoints=50, scale=c(.2,.2),centre=c(x[i],y[i])) panel.polygon(ell,col=zcol[i],border=zcol[i],...) } if (label) panel.text(x=x,y=y,lab=100*round(z,2), cex=0.8,col=ifelse(z<0,'white','black')) }

To create a *.pdf of your output we’ll run following code (remember your working directory!):

pdf('corrgram_ellipse.pdf',10,10) levelplot(cor.MyData[ord,ord], at=do.breaks(c(-1.01,1.01),20),xlab=NULL, ylab=NULL,colorkey=list(space='top'), scales=list(x=list(rot=90)), panel=panel.corrgram,label=TRUE) dev.off() However, if you’re more in the pacman thing, this next one might be a better choice. We’ll start by writing a new panel function:

panel.corrgram.2<-function(x,y,z, subscripts,at=pretty(z),scale=0.8,...) { require('grid',quietly=TRUE) x<-as.numeric(x)[subscripts] y<-as.numeric(y)[subscripts] z<-as.numeric(z)[subscripts] zcol<-level.colors(z,at=at,...) for(i in seq(along=z)) { lims<-range(0,z[i]) tval<-2*base::pi* seq(from=lims,to=lims,by=0.01) grid.polygon(x=x[i]+.5*scale*c(0,sin(tval)), y=y[i]+.5*scale*c(0,cos(tval)), default.units='native', gp=gpar(fill=zcol[i])) grid.circle(x=x[i],y=y[i],r=.5*scale, default.units='native') } }

Which we’ll export:

pdf('corrgram_pacman.pdf',10,10) levelplot(cor.MyData[ord,ord], at=do.breaks(c(-1.01,1.01),101),xlab=NULL, ylab=NULL,colorkey=list(space='top'), scales=list(x=list(rot=90)), panel=panel.corrgram.2, col.regions=colorRampPalette(c('red','white','blue'))) dev.off() Of course, changing the output size (and type) is easy. Changing pdf –> png (or the format of your choosing). Size were in these examples 10 by 10 inches.

# Installing and Updating R

These are the recommendations from CRAN on how to install and update R. I have streamlined it for my situtation.

To obtain the latest R packages, add an entry like
deb http://<my.favorite.cran.mirror>/bin/linux/ubuntu hardy/

In my case (using Ubuntu Hardy 8.04 LTS in Bergen, Norway) this is
deb http://cran.ii.uib.no/bin/linux/ubuntu hardy/

See http://cran.r-project.org/mirrors.html
for the list of CRAN mirrors.

To install the complete R system, use
sudo aptitude update sudo aptitude install r-recommended ess

ESS is a package to use R via the emacs editor. It is good! As an added convenience to Ubuntu users it is provided as an up-to-date version in the cran repositories.

SECURE APT

The Ubuntu archives on CRAN are signed with the key of “Vincent Goulet
<vincent.goulet@act.ulaval.ca>” with key ID E2A11821. You can fetch
this key with

gpg --keyserver subkeys.pgp.net --recv-key E2A11821

and then feed it to apt-key with
gpg -a --export E2A11821 | sudo apt-key add -

ADMINISTRATION AND MAINTENANCE OF R PACKAGES

The R packages part of the Ubuntu r-base and r-recommended packages
are installed into the directory /usr/lib/R/library. These can be
updated using aptitude with

sudo aptitude update sudo aptitude upgrade

The other r-cran-* packages shipped with Ubuntu are installed into the
directory /usr/lib/R/site-library.

Installing R packages not provided with Ubuntu first requires tools to
compile the packages from source. These tools are installed via the R
development package with
sudo aptitude install r-base-dev

Then a site administrator can install R packages into the directory
/usr/local/lib/R/site-library by running R as root and using the install.packages() function:
sudo R install.packages()

A routine update can then be undertaken from R using
update.packages(lib.loc = "/usr/local/lib/R/site-library")

# Create LaTex tables from R output

Creating reasonable layouted LaTeX tables from R output was easier then expected. I should have googled it long ago…

install.packages("xtable")

Lets say you created a tabular output in R called “tab1”, e.g by doing:
data(CO2) tab1 <- with(CO2, table(Treatment , Type)) tab1

your text output in R would look like

 Type Treatment Quebec Mississippi nonchilled 21 21 chilled 21 21

Now you would like this or whatever table or data.frame as a nice LaTeX-table, the only thing to do is:
library(xtable) xtable(tab1)

and the output will be:

% latex table generated in R 2.9.0 by xtable 1.5-5 package % Wed Jul 08 16:20:54 2009 \begin{table}[ht] \begin{center} \begin{tabular}{rrr} \hline & Quebec & Mississippi \\ \hline nonchilled & 21 & 21 \\ chilled & 21 & 21 \\ \hline \end{tabular} \end{center} \end{table}

If you are using Sweave the usage becomes
<< echo = FALSE , results = tex >> library(xtable) data(CO2) with(CO2, xtable(table(Treatment , Type))) @
and a the result of the R run is a LaTeX document. Another post will give a hint how to paste R graphics into the same Sweave or LaTeX document… later…

The full usage of xtable() is
xtable(x, caption=NULL, label=NULL, align=NULL, digits=NULL, display=NULL)
xtable(table , caption = "My table caption")
and labels
xtable(table , label = " MyLaTeXlable")
to the LaTeX tables.

# Install R packages without internet connection

You might have no internet connection or as in my case you have one, but the firewall prevents anything beside Windows IE to access the net. Grrr.

Ok. It takes more time, but is possible. Lets have an example:
Installing the package ‘gam’ for R-2.9 on Windows being in Norway (with cran.ii.uib.no as the nearest and fastest “Comprehensive R Archive Network”.

1. Find another place with internet connection or as in my case, the browser works, so use the browser.
2. Find the package on the repository corresponding to your operating system and your version of R in our above example here
3. klick on the link gam_1.0.zip, download it and save it (e.g. using an USB disk). Remember the path to the file (in our example in the folder H:/DATA/).
4. Open R and write install.packages("H:/DATA/gam_1.0.zip", repos=NULL) at the command line.

Thats it.