Remove Color Channels from TIFF Using Command Line

The R function tiff() is used a lot to export R graphics, e.g. for publication. By default the TIFF is written in full color (three channels of 8 bits each), and this collides with the formatting requirements of most journals (1200 dpi, but no more than a few MB in size).

tiff("Whiskerplot.tiff" , width = 8.8 , height = 8.8 , units = "cm" , pointsize = 8 , res = 1200)
produces the required 1200 dpi resolution for monochrome plots, but with full 8-bit color channels (although a single 1-bit channel is sufficient) this results in 49.4MB for a tiny 8.8×8.8 cm (ca. 3.5×3.5 inch) plot. I used GIMP to remove the color channels with

Image > Mode > Indexed > 1-bit monochrome

but this required opening GIMP, loading the file, navigating the menu and saving. And doing it all again whenever the plot had to be modified.

Since this has to be done for each and every publication figure I looked up a command line solution and here it is:
convert Whiskerplot.tiff -flatten -monochrome Whiskerplot_monochrome.tiff
removes all unnecessary color channels from Whiskerplot.tiff and saves it as Whiskerplot_monochrome.tiff, reducing the size from 49.4MB to 2.1MB while the result is exactly the same (convert is part of ImageMagick).
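The size reduction can be checked with a little arithmetic. This is only a sketch, assuming the original file is uncompressed 24-bit RGB, that the pixel dimensions are truncated to whole numbers, and that sizes are reported in units of 1024×1024 bytes; the function name tiff_size_estimate is made up for illustration.

```shell
#!/usr/bin/env bash
# Estimate uncompressed TIFF sizes for a square plot.
# Assumptions: 24-bit RGB input, 1-bit output, rows padded to whole bytes.
tiff_size_estimate() {
    local cm=$1 dpi=$2
    LC_ALL=C awk -v cm="$cm" -v dpi="$dpi" 'BEGIN {
        px   = int(cm / 2.54 * dpi)        # pixels per side
        rgb  = px * px * 3                 # 3 bytes per pixel
        mono = int((px + 7) / 8) * px      # 1 bit per pixel, byte-aligned rows
        printf "%d %.1f %.1f\n", px, rgb / 1048576, mono / 1048576
    }'
}

# Prints: pixels per side, RGB size, 1-bit size
tiff_size_estimate 8.8 1200
```

For the 8.8 cm plot at 1200 dpi this prints 4157 49.4 2.1, which agrees with the two file sizes reported above.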

Hattip: Paddy Landau on ubuntuforums

Command line method to copy all files out of a folder tree

This is not so much statistics, but I do want to keep this in case I need it again.

For years I have been collecting pictures and photos in a host of folders. Most pictures are in Pictures, with a lot of subfolders and a lot of duplicates. Just today I imported another chunk of pictures with gThumb, which has the very nice feature of creating a Year/Month/Day/ folder tree and moving the files there. But I chose the wrong parent folder, so that I now have a lot of /Year/Month/Day/ folders in more than one place. Bummer. And this might/will happen again.

What I wanted to do is get all picture files out of all folders and collect them in one folder, sorted by name. Then have a look at the list (takes time), remove duplicates and move them back into /home/rforge/Pictures/year/month folders (for this I use gThumb; I am too lazy to research the find way on the command line. If somebody reads this, he/she might comment on that).

Anyway:
find /path/to/Pictures -name "*.jpg" -exec cp '{}' '/path/to/temporary_jpg/' ';'

finds all .jpg files in the /path/to/Pictures folder and copies (cp) them to the folder temporary_jpg (which has to exist before running find).

Since not all files end in .jpg (the match is case sensitive), but also e.g. in jpeg, JPEG, BMP, tiff and so on (even .mpg videos are buried in this dreadful old “Pictures” folder/dump), one can omit the -name test and just dump everything into a flat folder, sort by extension/file type and clean up:

find /path/to/Pictures -type f -exec cp '{}' '/path/to/temporaryPics' ';'
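One thing to watch out for: files from different subfolders that share a name overwrite each other in the flat folder. The sketch below is one way around that, assuming GNU find and GNU cp (for --backup=numbered) and a target folder outside the source tree. The helper name collect_pictures is made up, and the list of extensions is only an example.

```shell
#!/usr/bin/env bash
# Copy all JPEG files (any capitalisation) from a folder tree into one flat folder.
# An existing file of the same name is kept as name.~1~, name.~2~, ...
collect_pictures() {
    local src=$1 dest=$2
    mkdir -p "$dest"
    find "$src" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) \
        -exec cp --backup=numbered '{}' "$dest"/ ';'
}

# Usage: collect_pictures /path/to/Pictures /path/to/temporary_jpg
```

The numbered backups then show up right next to each other when the folder is sorted by name, which makes hunting for duplicates a bit easier.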

Helpful links, which I used:

Coding Tiger (with comments)

and, for renaming all the filename salad from dozens of cameras to a consistent format, that is Year-Month-day-hour-minute-second.jpg (which I do not cover here): ak4good on the Ubuntu forums (comment #6).
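For the last step, moving the cleaned-up files back into year/month folders, a command line version might look like the sketch below. It assumes GNU date and uses the file modification time as a stand-in for the date the picture was taken, which is not always the same thing. The name sort_by_month is made up.

```shell
#!/usr/bin/env bash
# Move every regular file from a flat folder into DEST/YEAR/MONTH/,
# based on the file's modification time (not the EXIF date).
sort_by_month() {
    local src=$1 dest=$2 f sub
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        sub=$(date -r "$f" +%Y/%m)     # GNU date: -r reads the file's mtime
        mkdir -p "$dest/$sub"
        mv -n "$f" "$dest/$sub/"       # -n: never overwrite an existing file
    done
}

# Usage: sort_by_month /path/to/temporary_jpg /home/rforge/Pictures
```

Files that would collide with an existing one simply stay behind in the flat folder, so nothing gets lost silently.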

Windows Server 2008 Remote Desktop access from Ubuntu client

Tried for weeks and weeks to connect to a Windows 2008 Server via remote desktop from my Xubuntu-12.04 laptop. To no avail. There were some steps required like setting up VPN with fixed IP, finding a remote desktop client and trying to connect to the server, so a lot of potential problem sources.

To cut straight to the point:
sudo aptitude install remmina
solved the problem.

Fast-IP-VPN was correctly configured, but the Ubuntu default remote desktop client rdesktop was refused by the 2008 server. What made the problem not so obvious was that the connection to another remote desktop at the same institution worked without problems, but that was an XP server. No one was aware of this and even the IT support suggested rdesktop.

On top of that, remmina comes with a graphical user interface, and in Xubuntu an item in the system tray makes connecting to a once correctly configured remote desktop a one-click affair.

Hattip to Jonathan Moeller and his The Ubuntu Beginner’s Guide.

A short howto remmina on ubuntugeek.com

Slow MySQL 5.5.22 database engine

Slooooooow UPDATE query with MySQL 5.5

I use a local MySQL server a lot to handle, prepare and restructure big research tables. Ubuntu Precise ships MySQL server 5.5, while the previous releases used 5.1. I thought that might be good until I tried to import a table with some dozen variables and some thousand rows with an UPDATE statement, which should take a few seconds but took 10 minutes!

nick rulez on forum.mysql.com quantified this fact and revealed that the default database engine changed from “MyISAM” to “InnoDB” and that indeed InnoDB is considerably slower in this regard.

So I want MyISAM back.

To list the available and default engines:

show engines
which produces

mysql> show engines;
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                        | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| MyISAM             | YES     | MyISAM storage engine                                          | NO           | NO   | NO         |
| CSV                | YES     | CSV storage engine                                             | NO           | NO   | NO         |
| MRG_MYISAM         | YES     | Collection of identical MyISAM tables                          | NO           | NO   | NO         |
| BLACKHOLE          | YES     | /dev/null storage engine (anything you write to it disappears) | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables      | NO           | NO   | NO         |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                             | NO           | NO   | NO         |
| ARCHIVE            | YES     | Archive storage engine                                         | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, and foreign keys     | YES          | YES  | YES        |
| FEDERATED          | NO      | Federated MySQL storage engine                                 | NULL         | NULL | NULL       |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
9 rows in set (0.00 sec)

Setting the database engine

When you create a new table, you can specify which storage engine to use by adding an ENGINE table option to the CREATE TABLE statement:

CREATE TABLE t (i INT) ENGINE = INNODB;

If you omit the ENGINE option, the default storage engine is used. Up to MySQL 5.1 this is MyISAM (from 5.5 on it is InnoDB), but you can change it by using the --default-storage-engine server startup option, or by setting the default-storage-engine option in the my.cnf configuration file.

You can set the default storage engine to be used during the current session by setting the storage_engine variable:

SET storage_engine=MYISAM;

[…]

To convert a table from one storage engine to another, use an ALTER TABLE statement that indicates the new engine:

ALTER TABLE t ENGINE = MYISAM;
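With more than a handful of tables, typing one ALTER TABLE per table gets tedious. The sketch below only generates the statements from a list of table names and executes nothing; the helper name alter_to_myisam and the table names in the usage line are made up.

```shell
#!/usr/bin/env bash
# Read table names (one per line) from stdin and print one ALTER TABLE
# statement per table. Nothing is executed here.
alter_to_myisam() {
    local t
    while IFS= read -r t; do
        [ -n "$t" ] || continue          # skip empty lines
        printf 'ALTER TABLE `%s` ENGINE = MYISAM;\n' "$t"
    done
}

# Usage (hypothetical table names):
#   printf 'patients\nvisits\n' | alter_to_myisam
```

The output can be reviewed first and then piped into the mysql client for the database in question.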

I want MyISAM all the time so I decided for the my.cnf option. But where is my.cnf? According to debianadmin:

sudo nano /etc/mysql/my.cnf

Now add
default-storage-engine = MyISAM
to the [mysqld] section. Save and exit with Ctrl-o, Ctrl-x and restart the server.

sudo restart mysql

and MyISAM it is.
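The option is only picked up by the server when it sits in the [mysqld] section, so a line added at the very end of the file may end up in the wrong place. As a sketch of a scripted alternative to nano, the helper below (with_myisam_default, a made-up name) prints a copy of the configuration with the line inserted right below the section header and leaves the original file untouched.

```shell
#!/usr/bin/env bash
# Print a copy of a my.cnf-style file with "default-storage-engine = MyISAM"
# inserted directly below the [mysqld] section header.
# The input file itself is not modified.
with_myisam_default() {
    awk '
        { print }
        /^\[mysqld\][ \t]*$/ { print "default-storage-engine = MyISAM" }
    ' "$1"
}

# Usage: with_myisam_default /etc/mysql/my.cnf > my.cnf.new
# Then compare with diff before copying it into place with sudo.
```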

Data Backup in the AWS Cloud with rsync

After admitting that, of all things, Microsoft offers 25GB of cloud storage to its Windows Live subscribers, I will walk through my latest preliminary experiments with backing up important data using Amazon Web Services (AWS). The storage is not free but quite cheap at around $0.10 per GB and month.

If you use Windows and MS Office a lot, use Skydrive and don’t read on 😉 There are posts which describe how to map the Skydrive like a local harddisk using MS Word.

In the long run I would like to mount an EBS volume like a local file tree, probably using WebDAV, but this is my first successful preliminary solution. s3cmd does not work for me.

On Ubuntu/Linux, rsync is a well-established, reliable and easy-to-use tool to keep data in sync between locations. The following post marries rsync with an Elastic Compute Cloud (EC2) server instance that runs for an hour or so. One has to set up the so-called rsync daemon and attach a persistent Elastic Block Store (EBS) volume.

Setting up the instance and the volume is covered in another post, which I will link to later. There will also be a small script. There are some holes in this tutorial: only the direct configuration of the rsync daemon (including the script) is complete and working. I filled in some hints on how to get to this stage and will write follow-ups on the rest.

System Out provided a nice tutorial on how to set up rsync in daemon mode on a server which listens for clients to sync their data.

Here is my version of it, with a short script at the end which should do the job.

Prerequisites

Of course you need to have rsync on both machines (the server and the client); since both are Ubuntu this is the case.

I will write another post on how to start the server. It is completely possible and quite intuitive to achieve it in the Amazon web interface. When the server is running and an extra EBS harddisk is attached you have to connect to the server using ssh
ssh -i PATH/TO/YOUR/PEM-KEY-FILE ubuntu@ec2-xxx-xx-xxx-xxx.compute-1.amazonaws.com

Mount the persistent drive

There are some posts about the advantages of the xfs filesystem, so I stuck to it. Alestic recommends it for all persistent EC2 cloud disks and I trust they know what they are doing. But xfs is not included by default in the Ubuntu micro instance I use for my backups. That said, in the SSH shell:

sudo apt-get install -y xfsprogs
sudo modprobe xfs

If the backup volume is newly created then format it:
sudo mkfs.xfs /dev/xvdb
Note: Only the first time. Otherwise you wipe your data, of course. Note also the device name: I attached the volume as /dev/sdb, but it showed up in the Ubuntu Oneiric i386 t1.micro instance as /dev/xvdb.

Now mount the volume
echo "/dev/xvdb /media/backup xfs noatime 0 0" | sudo tee -a /etc/fstab
sudo mkdir /media/backup
sudo mount /media/backup
sudo chown ubuntu:ubuntu /media/backup
sudo chmod 777 /media/backup
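Running the echo line more than once leaves duplicate entries in /etc/fstab. The small helper below is a sketch of a guard against that: it appends a line only if the exact same line is not in the file yet. The name append_once is made up, and since /etc/fstab is only writable by root, it would have to be used from a root shell.

```shell
#!/usr/bin/env bash
# Append LINE to FILE only if that exact line is not in the file yet.
append_once() {
    local line=$1 file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   append_once "/dev/xvdb /media/backup xfs noatime 0 0" /etc/fstab
```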

Configuration files

On the server machine you need to set up a daemon to run in the background and host the rsync services.

Before you start the daemon you need to create some rsync daemon configuration files in the /etc directory.

Three files are necessary:

  1. /etc/rsyncd.conf, the actual configuration file,
  2. /etc/rsyncd.motd, Message Of The Day file (the contents of this file will be displayed by the server when a client machine connects) and
  3. /etc/rsyncd.scrt, the username and password pairs.

To create the files on the server:
sudo nano /etc/rsyncd.conf

Now enter the following information into the rsyncd.conf file:

motd file = /etc/rsyncd.motd
[backup]
path = /media/backup
comment = the path to the backup directory on the server
uid = ubuntu
gid = ubuntu
read only = false
auth users = ubuntu
secrets file = /etc/rsyncd.scrt

Hit Ctrl-o to save and Ctrl-x to close nano.

The uid and gid entries refer to users on the server; in the ssh session on the EC2 instance the user is ubuntu. The auth users entry is the user name listed in the secrets file and does not have to be a system user.

The format of the /etc/rsyncd.scrt file is
username:whatever_password_you_want

Use nano to put some arbitrary text into /etc/rsyncd.motd.

Now you should have all the configuration information necessary; all that’s left to do is enable and start the daemon and open the rsync port.

To enable the daemon, open the /etc/default/rsync file, i.e.,

sudo nano /etc/default/rsync

and set RSYNC_ENABLE=true.

Here you might also specify a port other than the default 873. Remember to open the port in the security group, either with the AWS web interface in your browser or in the shell using the ec2-api-tools:
ec2-authorize default -p 873

Now to start the daemon,
sudo /etc/init.d/rsync restart
and exit the SSH session.

Syncing a folder

Now you can use your local shell to push some folders or files to the server. Update the server side from the client machine with ec2-api-tools installed:
EXIP=`ec2din | grep INSTANCE | grep -v terminated |awk '{print $4}'`
rsync -auv /home/rforge/articles ubuntu@$EXIP::backup/

The first line gets the address of the running instance from the ec2-api-tools and stores it in $EXIP; the second line passes it to rsync.

Otherwise you have to look up the address of your instance in the web interface and substitute it for xxx.xxx.xxx.xxx:
rsync -auv /PATH/TO/FOLDER/ ubuntu@xxx.xxx.xxx.xxx::backup/

::backup has to match [backup] in the /etc/rsyncd.conf file. You will see the rsyncd.motd message and get prompted for the password in the rsyncd.scrt file. Then rsync starts the upload.
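To avoid typing the password for every folder, rsync can read it from a file via its --password-file option. The sketch below assumes the password is stored on a single line in ~/.rsync_pass (readable by the owner only). The function name backup_folders and the DRY_RUN switch are made up for illustration.

```shell
#!/usr/bin/env bash
# Push several folders to the [backup] module of the rsync daemon.
# Usage: backup_folders HOST FOLDER...
# With DRY_RUN=1 the rsync commands are only printed, not executed.
backup_folders() {
    local host=$1 dir
    local pass=${RSYNC_PASS_FILE:-$HOME/.rsync_pass}   # assumed location
    shift
    for dir in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo rsync -auv --password-file="$pass" "$dir" "ubuntu@${host}::backup/"
        else
            rsync -auv --password-file="$pass" "$dir" "ubuntu@${host}::backup/"
        fi
    done
}

# Usage: DRY_RUN=1 backup_folders "$EXIP" /home/rforge/articles
```

Running it once with DRY_RUN=1 shows the exact commands before anything is uploaded.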

A Script

The following script should do the daemon setup after connecting to the server via ssh and mounting the volume. Keep me posted if something does not work.

echo "motd file = /etc/rsyncd.motd
[backup]
path = /media/backup
comment = the path to the backup directory on the server
uid = ubuntu
gid = ubuntu
read only = false
auth users = ubuntu
secrets file = /etc/rsyncd.scrt" > rsyncd.conf
sudo mv rsyncd.conf /etc/
#
# the message of the day shown to connecting clients
echo "Greetings! Give me the right password! Me wants it!" > rsyncd.motd
sudo mv rsyncd.motd /etc/
#
# user:password pair; replace YourSecretPassword with your own
echo "ubuntu:YourSecretPassword" > rsyncd.scrt
sudo mv rsyncd.scrt /etc/
#
sudo chown root:root /etc/rsyncd.*
sudo chmod 640 /etc/rsyncd.*
#
## enable daemon mode in the /etc/default/rsync file
sudo sed -i 's/RSYNC_ENABLE=false/RSYNC_ENABLE=true/' /etc/default/rsync
#
sudo /etc/init.d/rsync restart # start the daemon

R function to transform continuous variable to categorical factor cut at n-tiles

The cut() function can be used to transform a continuous variable into a categorical factor variable. The syntax is quite lengthy, and if one wishes to cut at quartiles, quintiles or other n-tiles one has to include the quantile() function in the call.

This is not very newbie-friendly and, if included in a model call, nearly unreadable.

The function cutN() in the code box does the job.

cutN <- function(X, n = 4) {
    # Cut X at its n-tiles; n = 4 gives quartiles.
    cut(X,
        breaks = quantile(X, probs = (0:n) / n, na.rm = TRUE),
        include.lowest = TRUE)
}

In order to cut the continuous variable Creatinine in the dataset Patients into deciles (n=10) the syntax is:
cutN( Patients$Creatinine , n = 10 )

No big deal, but maybe useful…

Find BIOS version in Ubuntu

The dmidecode command line utility dumps a list of SMBIOS specifications to the standard output. In order to get the version number of the currently installed BIOS, open a shell and run
sudo dmidecode --type 0 | grep Revision

The --type 0 option restricts the output to BIOS-specific information and grep fishes for the revision number.

On my X61s Thinkpad the resulting output is
BIOS Revision: 2.19
Firmware Revision: 1.3
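If only the bare number is needed, e.g. in a script, the label can be stripped with sed. A minimal sketch, assuming the output format shown above; bios_revision is a made-up helper name.

```shell
#!/usr/bin/env bash
# Read dmidecode output from stdin and print only the BIOS revision number.
bios_revision() {
    sed -n 's/^[[:space:]]*BIOS Revision:[[:space:]]*//p'
}

# Usage (needs root): sudo dmidecode --type 0 | bios_revision
```

dmidecode can also print single values directly with its -s option, for example sudo dmidecode -s bios-version for the version string.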

Add public key behind a firewall in Ubuntu Shell

In short: Use
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80/ --recv-key E084DAB9
instead of
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key E084DAB9
This way you force port 80, which is usually open.

I got the idea from an answer by Phil Bradley on the superuser.com forum. He claimed that this would be fixed in Natty, but it isn’t: although the configuration file he mentions has the port 80 specification added by default, apt-key does not use it. The above snippet solves that.

For those Ubuntu users who have no idea what I am talking about:

Installing the newest R version in Ubuntu requires appending the CRAN repository to your /etc/apt/sources.list. One might hit Alt+F2 and enter
gksu gedit /etc/apt/sources.list

With Xubuntu you would use mousepad instead of gedit. In any distro you can use
sudo nano /etc/apt/sources.list
in a terminal.

Usually I add the line
deb http://cran.uib.no/bin/linux/ubuntu natty/
at the end of the file and update with
sudo apt-get update

CRAN at University of Bergen is closest to me. You might want another one (check the r-project.org site for mirrors).

apt-get update answers with a warning
GPG error: http://cran.uib.no natty/ Release: The following signatures couldn't be verified because the public key is not available

That is not a problem. One can install R and packages anyway, but it is better to have the public key.

Behind a firewall (and many public and open hotspots block several ports) it is not possible to use

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key E084DAB9

since the port through which the keyserver is contacted (11371 by default) is blocked on most firewalls. You have to force port 80 by:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80/ --recv-key E084DAB9
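The same trick should work for other keyservers that also listen on port 80. As a sketch, the hypothetical helper below turns a bare keyserver name into the hkp://HOST:80 form and leaves complete URLs untouched.

```shell
#!/usr/bin/env bash
# Print the port-80 form of a keyserver name; full URLs are passed through.
keyserver_port80() {
    case $1 in
        *://*) printf '%s\n' "$1" ;;
        *)     printf 'hkp://%s:80\n' "$1" ;;
    esac
}

# Usage:
#   sudo apt-key adv --keyserver "$(keyserver_port80 keyserver.ubuntu.com)" --recv-key E084DAB9
```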

After the key is added
sudo apt-get update
sudo apt-get install r-recommended emacs ess

proceeds without warning or error.