Correlation matrices are a great way to visualize covariation in large datasets. The examples below are borrowed from Sarkar (2008) and Friendly (2002). First we need to load a few packages:
library(lattice)
library(ellipse)
We then import our data:
MyData <- read.table("/FILE/LOCATION/FILE.txt", header=TRUE,sep='\t',quote='')
Then on to some tedious work. We’ll have to manually type in the name (exactly as it appears in the dataset!) of each of the variables we’re interested in (unless, of course, you want to analyze the whole shebang, which often isn’t desirable).
var <- c("var1","var2","var3")
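If typing the names by hand gets old, one alternative (not part of the original recipe) is to collect every numeric column and then drop the ones you don't want. This is only a sketch: `demo.data` and `var.demo` are hypothetical names, and the built-in `iris` data stands in for your own file.

```r
# Hypothetical stand-in for MyData: iris has four numeric columns and one factor
demo.data <- iris

# Names of all numeric columns
var.demo <- names(demo.data)[vapply(demo.data, is.numeric, logical(1))]

# Drop anything you are not interested in (an arbitrary choice here)
var.demo <- setdiff(var.demo, "Petal.Width")

print(var.demo)
```

The resulting character vector can be used in exactly the same way as a hand-typed one.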
We then compute a correlation matrix from the variables listed in the vector “var”:
cor.MyData <- cor(MyData[,var],use="pairwise.complete.obs")
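The `use` argument asks `cor()` for pairwise deletion ("pairwise.complete.obs", which R also accepts abbreviated as "pair"). A small sketch of what that changes, using a made-up data frame `toy` with a few missing values:

```r
# Hypothetical toy data with missing values in two columns
toy <- data.frame(a = c(1, 2, 3, 4, NA),
                  b = c(2, 4, 6, NA, 10),
                  c = c(5, 3, 4, 1, 2))

# Default (use = "everything"): a missing value in either column gives NA
cor.all <- cor(toy)

# Pairwise deletion: each correlation uses every row that is complete
# for that particular pair of columns
cor.pair <- cor(toy, use = "pairwise.complete.obs")

print(cor.all)
print(cor.pair)
```

Because each cell may be based on a different subset of rows, a pairwise matrix is not guaranteed to be positive semi-definite. That is harmless for plotting, but worth knowing if you feed the matrix into further analyses.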
And here’s the key step: ordering the variables in the matrix according to the result of a cluster analysis, so that variables with similar correlation patterns end up next to each other. This makes the plot easier to interpret visually.
ord <- order.dendrogram(as.dendrogram(hclust(dist(cor.MyData))))
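To see what the ordering step produces, here is a minimal sketch on a few columns of the built-in `mtcars` data (the names `cor.demo`, `hc` and `ord.demo` are made up for this illustration). Note that `dist()` is applied to the correlation matrix itself, so two variables count as close when their whole rows of correlations look alike.

```r
# Correlation matrix for a handful of mtcars variables
cor.demo <- cor(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])

# Cluster the rows of the correlation matrix (Euclidean distance between rows)
hc <- hclust(dist(cor.demo))

# Leaf order of the dendrogram: a permutation of 1..5
ord.demo <- order.dendrogram(as.dendrogram(hc))

# Variable names in the order they will appear along the plot axes
print(rownames(cor.demo)[ord.demo])

# The reordered matrix keeps rows and columns in sync
print(round(cor.demo[ord.demo, ord.demo], 2))
```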
We can then make our corrgrams. First, an easy one:
print(levelplot(cor.MyData[ord,ord],xlab=NULL,ylab=NULL,
  at=do.breaks(c(-1.01,1.01),101),scales=list(x=list(rot=90)),
  colorkey=list(space="top"),
  col.regions=colorRampPalette(c("red","white","blue"))))
Then on to the more elaborate ones, starting with an informative one that uses the ellipse package. We’ll have to write our own panel function here (again, thanks to Sarkar 2008), but it is completely generic, so cut ‘n’ paste should suffice.
panel.corrgram<-function(x,y,z,subscripts,at,
                         level=0.9,label=FALSE,...)
{
  require('ellipse',quietly=TRUE)
  # keep only the cells belonging to this panel
  x<-as.numeric(x)[subscripts]
  y<-as.numeric(y)[subscripts]
  z<-as.numeric(z)[subscripts]
  # map each correlation to a fill colour
  zcol<-level.colors(z,at=at,...)
  for(i in seq_along(z)){
    # ellipse whose shape reflects the correlation z[i], centred on the cell
    ell<-ellipse(z[i],level=level,npoints=50,
                 scale=c(.2,.2),centre=c(x[i],y[i]))
    panel.polygon(ell,col=zcol[i],border=zcol[i],...)
  }
  # optionally print the correlation (times 100) on top of each ellipse
  if (label)
    panel.text(x=x,y=y,lab=100*round(z,2),
               cex=0.8,col=ifelse(z<0,'white','black'))
}
To create a PDF of the output, we’ll run the following code (remember your working directory, since that is where the file ends up!):
pdf('corrgram_ellipse.pdf',10,10)
# print() makes sure the plot is drawn even when the script is run via source()
print(levelplot(cor.MyData[ord,ord],
  at=do.breaks(c(-1.01,1.01),20),xlab=NULL,
  ylab=NULL,colorkey=list(space='top'),
  scales=list(x=list(rot=90)),
  panel=panel.corrgram,label=TRUE))
dev.off()
However, if you’re more into the Pac-Man look, this next one might be a better choice. We’ll start by writing a new panel function:
panel.corrgram.2<-function(x,y,z,
                           subscripts,at=pretty(z),scale=0.8,...)
{
  require('grid',quietly=TRUE)
  # keep only the cells belonging to this panel
  x<-as.numeric(x)[subscripts]
  y<-as.numeric(y)[subscripts]
  z<-as.numeric(z)[subscripts]
  # map each correlation to a fill colour
  zcol<-level.colors(z,at=at,...)
  for(i in seq_along(z))
  {
    # angles of the wedge: the fraction of the circle filled equals |z[i]|
    lims<-range(0,z[i])
    tval<-2*base::pi*
      seq(from=lims[1],to=lims[2],by=0.01)
    # filled wedge, starting at 12 o'clock
    grid.polygon(x=x[i]+.5*scale*c(0,sin(tval)),
                 y=y[i]+.5*scale*c(0,cos(tval)),
                 default.units='native',
                 gp=gpar(fill=zcol[i]))
    # outline of the full circle
    grid.circle(x=x[i],y=y[i],r=.5*scale,
                default.units='native')
  }
}
Which we’ll export:
pdf('corrgram_pacman.pdf',10,10)
# print() makes sure the plot is drawn even when the script is run via source()
print(levelplot(cor.MyData[ord,ord],
  at=do.breaks(c(-1.01,1.01),101),xlab=NULL,
  ylab=NULL,colorkey=list(space='top'),
  scales=list(x=list(rot=90)),
  panel=panel.corrgram.2,
  col.regions=colorRampPalette(c('red','white','blue'))))
dev.off()
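For the curious, the wedge arithmetic inside panel.corrgram.2 can be tried out on its own. The helper below, `wedge.angles`, is a hypothetical name introduced only for this sketch; it repeats the two lines that compute `lims` and `tval` in the panel function.

```r
# Hypothetical helper mirroring the angle calculation in panel.corrgram.2
wedge.angles <- function(r) {
  lims <- range(0, r)   # c(0, r) for positive r, c(r, 0) for negative r
  2 * pi * seq(from = lims[1], to = lims[2], by = 0.01)
}

# A correlation of 0.25 sweeps a quarter of the circle
pos <- wedge.angles(0.25)
print(range(pos))

# A correlation of -0.5 sweeps half the circle, in the opposite direction
neg <- wedge.angles(-0.5)
print(range(neg))
```

Since the panel function uses sin() for x and cos() for y, angle zero points straight up, positive correlations fill clockwise from 12 o'clock, and negative ones fill counter-clockwise.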
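Of course, changing the output size (and type) is easy: swap pdf for png (or the device of your choosing) and adjust the dimensions. In these examples the size was 10 by 10 inches. Note that png() measures width and height in pixels by default, so the numbers need adjusting too.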
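As a rough sketch of the PNG variant, assuming the lattice package is installed: the file name, the resolution of 150 dpi and the use of `mtcars` as stand-in data are all arbitrary choices for this example, and the file is written to R's temporary directory so nothing in your working directory is touched.

```r
library(lattice)

# Hypothetical output file in the session's temporary directory
out.file <- file.path(tempdir(), 'corrgram_demo.png')

# Sizes in inches require units='in' together with a resolution
png(out.file, width = 10, height = 10, units = 'in', res = 150)
print(levelplot(cor(mtcars[, 1:5]), xlab = NULL, ylab = NULL,
  at = do.breaks(c(-1.01, 1.01), 101),
  scales = list(x = list(rot = 90)),
  colorkey = list(space = 'top'),
  col.regions = colorRampPalette(c('red', 'white', 'blue'))))
dev.off()
```

At 150 dpi this gives a 1500 by 1500 pixel image; raise `res` for print quality or lower it for the web.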