# Densityplot Variations

## Variation I Densityplot with filled area, quartiles and mean

Just playing around the other day to get the default plot.density() function a bit more like publishing quality. Above is my favorite so far. Further down are two more spartanic versions.

In this connection I also wrote my first function. When I get more customed to R I will package my ideas into an R library – but later…

There is the possibility to call R-functions without embedding the code into your analysis script, but I did not look that up yet, so embedding the following code into your script (at the beginning) and then using
`densityplot(Dataset\$Coviate)`
will do the job, where Dataset and Covariate have to be replaced by the according values of course.

Now the function code:
```# function densityplot densityplot <- function(x , digits = 1 , xlab = "" , ylab = "" , main = "Density" , col = "blue"){ dens <- density(x , na.rm = T) bins <- as.numeric(cut(dens\$x , breaks = fivenum(x))) i1 <- bins == 1 & !is.na(bins) i2 <- bins == 2 & !is.na(bins) i3 <- bins == 3 & !is.na(bins) i4 <- bins == 4 & !is.na(bins) x.p1 <- dens\$x[i1]; x.p1 <- c(x.p1,max(x.p1),min(x.p1)) x.p2 <- dens\$x[i2]; x.p2 <- c(x.p2,max(x.p2),min(x.p2)) x.p3 <- dens\$x[i3]; x.p3 <- c(x.p3,max(x.p3),min(x.p3)) x.p4 <- dens\$x[i4]; x.p4 <- c(x.p4,max(x.p4),min(x.p4)) y.p1 <- dens\$y[i1]; y.p1 <- c(y.p1,0,0) y.p2 <- dens\$y[i2]; y.p2 <- c(y.p2,0,0) y.p3 <- dens\$y[i3]; y.p3 <- c(y.p3,0,0) y.p4 <- dens\$y[i4]; y.p4 <- c(y.p4,0,0) plot(dens , type = "n" , axes = F, main = main, xlab = xlab , ylab = ylab) polygon(x.p1,y.p1 , border = F , col = col) polygon(x.p2,y.p2 , border = F , col = col) polygon(x.p3,y.p3 , border = F , col = col) polygon(x.p4,y.p4 , border = F , col = col) axis(side = 1 , line = 0 , at = fivenum(x, na.rm = T) , label = c("Minimum","Quartile 1", "Median", "Quartile 3", "Maximum"), lwd = 0, cex.axis = 0.6) axis(side = 1 , line = 1 , at = fivenum(x, na.rm = T)) axis(side = 1 , line = 1 , at = round(mean(x , na.rm = T) , digits = digits) , tcl = 0.4 , label = F) axis(side = 1 , line = -1.5 , at = round(mean(x , na.rm = T) , digits = digits) , tick = F , cex.axis = 0.6) axis(side = 1 , line = -2.0 , at = round(mean(x , na.rm = T) , digits = digits) , label = "Mean" , tick = F , cex.axis = 0.6) }```

## Variation II

Some might find the colored area to much, although IMHO it puts the focus on the fact that one is looking at areas when ploting a density. But then something similar without the color fill. Not a function just a few lines of code to embed and adjust to the script:

… and the source…
```plot(density(angio\$PE_ALDER , na.rm = T), axes = F, main = "Basic densityplot", xlab = "" , ylab = "") # Add Quartiles axis(side = 1 , line = 1 , at = fivenum(angio\$PE_ALDER, na.rm = T) , label = c("Minimum","Quartile 1", "Median", "Quartile 3", "Maximum"), lwd = 0, cex.axis = 0.6) axis(side = 1 , line = 2 , at = fivenum(angio\$PE_ALDER, na.rm = T)) abline(v = fivenum(angio\$PE_ALDER, na.rm = T)[2:4] , lty = 3) # Mean axis(side = 1 , line = 2 , at = round(mean(angio\$PE_ALDER , na.rm = T) , digits = 2) , tcl = 0.4 , label = F) axis(side = 1 , line = -0.5 , at = round(mean(angio\$PE_ALDER , na.rm = T) , digits = 2) , tick = F) axis(side = 1 , line = -1.4 , at = round(mean(angio\$PE_ALDER , na.rm = T) , digits = 2) , label = "Mean" , tick = F , cex.axis = 0.6) abline(v = mean(angio\$PE_ALDER , na.rm = T) , lty = 4)```

## Variation III

… finally an even more stripped down version without the mean: which was achieved by:
```plot(density(angio\$PE_ALDER , na.rm = T), axes = F, main = "Basic densityplot", xlab = "Age") abline(v = fivenum(angio\$PE_ALDER, na.rm = T)[2:4] , lty = 3) axis(side = 1 , line = -1 , at = fivenum(angio\$PE_ALDER, na.rm = T) , label = c("Minimum","Quartile 1", "Median", "Quartile 3", "Maximum"), lwd = 0, cex.axis = 0.6) axis(side = 1 , at = fivenum(angio\$PE_ALDER, na.rm = T))```

# GAM Plot with 95% Confidence Shade The lightblue shade denoting the 95% pointwise confidence limits of the GAM estimate is a polygon() object in R.

Usually one would plot the GAM model with the default termplot() function and specifiy se=T to get the confidence limits as lines. I showed in a recent post how to plot the fitted GAM smooth and the confidence limits manually.

The same approach can be used to have the confidence limits as a shade. In order to achieve this:

1. Fit the model
2. ```library(mgcv) model <- gam( MyResponse ~ MyCovariate , data = MyDataset)```

3. get the estimated values of the fitted smoothing spline
4. `fit <- predict( model , se = TRUE )\$fit`

5. and pointwise standard errors
6. `se <- predict( model , se = TRUE)\$se.fit`

7. calculate the values of the upper and lower 95%-confidence limits
8. ```lcl <- fit - 1.96 * se ucl <- fit + 1.96 * se```

9. plot an empty coordinate system with right scaling, labels , titles and so on. I prefer to include axes = FALSE and add the axes manually. See here for an example.
10. `plot( 0 , type = "n" , bty = "n" , xlab = "The x-axis lable" , ylab = "The y-axis lable" , main = "The Title" , xlim = c(0,5) , ylim = c(0,3) )`

11. No it gets a bit tricky with sorting the coordinates in the right order. R provides a very effective way to achieve this, though it is not easily understandable at once. We create two indices on the dataset i.for and i.back which number the dataset according to the independend variable on the x-axis – forward and backward respectively:
12. ```i.for <- order( MyDataset\$MyCovariate ) i.back <- order( MyDataset\$MyCovariate , decreasing = TRUE )```

13. The points of the shade polygon follow here first the upper confidence line (ucl) forward and then the lower confidence line backward. The coordinates of the shade-polygon are
14. ```x.polygon <- c( MyDataset\$MyCovariate[i.for] , MyDataset\$MyCovariate[i.back] ) y.polygon <- c( ucl[i.for] , lcl[i.back] )```

15. First plot the polygon with the confidence shade – otherwise the GAM plot vanishes behind it.
16. `polygon( x.polygon , y.polygon , col = "#A6CEE3" , border = NA )`

17. now the mainplot afterwards
18. `lines( MyDataset\$MyCovariate[i.for] , fit[i.for], col = "#1F78B4" , lwd = 3 )`

19. add a horizontal line marking the intercept ( the mean of the covariate) – note: univariate only. I will post a remark on this later…
20. `abline( h = mean(MyDataset\$MyCovariate) , lty = 2 )`

Hope no typos. Comment if you find some.

# Cannot find sprng 2.0 header file

Trying to
`install.packages("rsprng")`
fails because
`Cannot find sprng 2.0 header file.`

The development package of the SPRNG (Scalable Parallel Random Number Generator) library which provides several RNGs that are suitable for use in parallel computing are missing:

`sudo aptitude install libsprng2-dev`

fixes it.

# Fancy Rugs in Regression Plots

Rugplots along the axes show the distribution of the underlying data in regression model plots. This is particulary useful in connection with additive (nonparametric) models where the plotted smooth function is the exclusive representation of the model in order to assess how much data contributed to the model fit at the different values of the exlanatory variable.

The custom plot.gam() function includes the possibility of such rugs and pointwise conficence intervalls by default.

Adding quartiles to the rugs requires some customization, though. I included the complete code to produce the above plot underneath.

The example for this GAM is borrowed from the excellent book of Alain Zuur et. al. Mixed Effects Models and Extensions in Ecology with R p.55ff. The ISIT data to run the code above is included in the R package AED which can be downloaded from the books website.

Note: the package AED is needed for the example dataset only. It is NOT necessary to use the example code on ones own dataset.

The main points concerning the rugs and quantile lables on the x-axis are:

1. Plot the coordinate system without lables
2. `plot( ... , axes = FALSE )`

3. Plot the x-axis with x-lables
4. `axis(side = 1 , line = 0.3 , at = 0:5*1000 , tick = TRUE)`

5. Plot rugs – the jitter() is necessary since a lot of datapoints sit on the same values of SampleDepth – so shake them a bit.
6. `axis(side = 1 , line = -0.9 , at = jitter(ISIT\$SampleDepth) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0)`

7. Print the lables “1Q”, “Median” and “3Q” or whatever you like to call them on the right position. The line and padj parameter set the position of the text and cex.axis the textsize.
8. `axis(side = 1 , line = -0.8 , at = fivenum(ISIT\$SampleDepth)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black" , padj = -2.8)`

9. Plot thick tickmarks crossing through the rug cloud at 1Q, median and 3Q
10. `axis(side = 1 , line = -0.8 , at = fivenum(ISIT\$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)`

11. and finally short thick tickmarks under the text touching the x-axis
12. `axis(side = 1 , line = 0.3, at = fivenum(ISIT\$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 0.2 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)`

Here goes the complete code:
```library(mgcv) library(AED) data(ISIT) # # Fit a univariate GAM model model <- gam(Sources ~ s(SampleDepth) , data = ISIT) fit <- predict(model, se = T)\$fit se <- predict(model, se = T)\$se.fit lcl <- fit - 1.96 * se ucl <- fit + 1.96 * se # # open a jpeg jpeg("FancyRugs.jpg" , width=400, height=400) # # set plotting options: 1 plot per page, horizontal labels and textsize par(mfrow = c(1,1) , las = 1 , cex = 1) # # plot coordinatesystem and labels plot(0 , bty = "n" , type = "n" , xlim = c(0,5000) , ylim = c(-10,50) , xlab = "Depth (m)" , ylab = expression(paste("Number of sources (" , m^-3 , ")")) , axes = FALSE) # title(main="Association between number of sources of\nbioluminescent organisms and ocean depth" , cex.main = 0.8) # ## _____ X-AXIS ______ # x-axis values axis(side = 1 , line = 0.3 , at = 0:5*1000 , tick = TRUE) # # rugs at datapoints axis(side = 1 , line = -0.9 , at = jitter(ISIT\$SampleDepth) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0) # # labels at 1Q, median and 3Q axis(side = 1 , line = -0.8 , at = fivenum(ISIT\$SampleDepth)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black" , padj = -2.8) # # tick marks at 1Q, median and 3Q axis(side = 1 , line = 0.3, at = fivenum(ISIT\$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 0.2 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE) # axis(side = 1 , line = -0.8 , at = fivenum(ISIT\$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE) # ## _____ Y-AXIS ______ # y-axis values axis(side = 2 , at = 0:5*10) # # rugs at datapoints axis(side = 2 , line = -0.9 , at = jitter(ISIT\$Sources) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0) # # labels at 1Q, median and 3Q axis(side = 2 , line = -0.7 , at = fivenum(ISIT\$Sources)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black") # # thicker tick marks at 1Q, median and 3Q axis(side = 2 , line = 0.3, at = fivenum(ISIT\$Sources)[2:4], lwd = 0 , tick = T, tcl = 0.3 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE , padj = -2) axis(side = 2 , line = -0.7 , at = fivenum(ISIT\$Sources)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black" , labels = FALSE) # # horizontal line marking the intercept = mean(Sources) (for univariate model only) abline(h=mean(ISIT\$Sources), lty=3) # # Scatterplot lines(ISIT\$SampleDepth , ISIT\$Source , type = "p" , cex = 0.4 , lwd = 0.2 , col = "grey") # # plot main figure lines(ISIT\$SampleDepth[order(ISIT\$SampleDepth)] , fit[order(ISIT\$SampleDepth)] , col = "black" , lwd = 2) # # plot lower confidence limit (lcl) lines(ISIT\$SampleDepth[order(ISIT\$SampleDepth)] , lcl[order(ISIT\$SampleDepth)] , col = "grey" , lwd = 1) # # plot upper confidence limit (ucl) lines(ISIT\$SampleDepth[order(ISIT\$SampleDepth)] , ucl[order(ISIT\$SampleDepth)] , col = "grey" , lwd = 1) # # closing the jpg file dev.off()```

# Install R Package XML with Ubuntu amd64

`install.packages("XML")`

failed while complaining

`Cannot find xml2-config`

Seth Falcon pointed out library libxml2-dev – Development files for the GNOME XML library – where missing.

`sudo aptitude install libxml2-dev`

fixed the problem.

# Installing RODBC with Ubuntu/Debian amd64

Trying to install RODBC in Ubuntu with
`install.packages("RODBC")`
failed throwing an error message
`ODBC headers sql.h and sqlext.h not found`

A glance at the r-help showed that it had to do with something called unixODBC – an ODBC driver manager.

The package was installed, but not the development package and thus not the headers which R complained about.

Again a not-so-obvious-for-the-newbee-Linux-Unix-Shell-goblish thing. The fix is
`sudo aptitude install unixodbc-dev`

# Open Access .mdb Files with RODBC

Getting data into R can be done by reading colon separated files (.csv) via the read.table() function. It is also possible to access databases directly and send SQL queries directly from R to the database. This has some advantages: Using Sweave the queries get documented in the analysis report, variable formats are retained.

To install the RODBC package:
`install.packages("RODBC")`

Open a database connection to an Microsoft Access database file, e.g. “MyDataBase.mdb” sitting in the Folder “C:\ MyPath\MyDataBase.mdb”:
`channel <- odbcConnectAccess("C:/MyPath/MyDataBase")`
note that the Windows backslashes “\” become slashes “/” in R and the extension “.mdb” is omitted.

Getting the database table “MyTable” into the R dataframe “R.Table”
`R.Table <- sqlQuery( channel , paste ("select * from MyTable"))`
MyTable can also be a sql query in the Access database.