Data Backup in the AWS Cloud with rsync

After admitting that, of all things, Microsoft offers 25 GB of cloud storage to its Windows Live subscribers, I will walk through my latest preliminary experiments on backing up important data using Amazon Web Services (AWS). The storage is not free, but quite cheap at around $0.10 per GB per month.

If you use Windows and MS Office a lot, use SkyDrive and don’t read on 😉 There are posts which describe how to map SkyDrive like a local hard disk using MS Word.

In the long run I would like to mount an EBS volume like a local file tree, probably using WebDAV, but this is my first successful preliminary solution. s3cmd does not work for me.

On Ubuntu/Linux, rsync is a well-established, reliable and easy-to-use tool for keeping data in sync between locations. This post marries rsync with an Elastic Compute Cloud (EC2) server instance that runs for an hour or so. One has to set up the so-called rsync daemon and attach a persistent Elastic Block Store (EBS) volume.

That is another post; I will link to it later. There will also be a small script. There are some holes in this tutorial: only the direct configuration of the rsync daemon (including the script) is complete and working. I filled in some hints on how to get to this stage, and will write follow-ups on the rest.

System Out provided a nice tutorial on how to set up rsync in daemon mode on a server, which then listens for clients that want to sync their data.

Here is my version of it, with a short script at the end which should do the job.


Of course you need to have rsync on both machines (the server and the client); since both are Ubuntu this is the case.

I will write another post on how to start the server. It is entirely possible and quite intuitive to do it in the Amazon web interface. When the server is running and an extra EBS disk is attached, connect to the server using ssh.

Mount the persistent drive

There are some posts about the advantages of the xfs filesystem, so I stuck with it. Alestic recommends it for all persistent EC2 cloud disks and I trust they know what they are doing. But xfs is not included by default in the Ubuntu micro instance I use for my backups. That said, in the SSH shell:

sudo apt-get install -y xfsprogs
sudo modprobe xfs

If the backup volume is newly created then format it:
sudo mkfs.xfs /dev/xvdb
Note: do this only the first time, otherwise you wipe your data. Note also the device name: I attached the volume as /dev/sdb, but it showed up in the Ubuntu Oneiric i386 t1.micro instance as /dev/xvdb.

Now mount the volume
echo "/dev/xvdb /media/backup xfs noatime 0 0" | sudo tee -a /etc/fstab
sudo mkdir /media/backup
sudo mount /media/backup
sudo chown ubuntu:ubuntu /media/backup
sudo chmod 777 /media/backup
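
The echo line above appends to /etc/fstab every time it is run, so repeating the setup leaves duplicate entries. A minimal sketch of a guard, assuming a helper named add_fstab_entry (a made-up name) that takes the fstab path as an optional third argument so it can be tried on a scratch file first:

```shell
# Sketch: append the fstab entry only if the mount point is not listed yet.
# add_fstab_entry is a hypothetical helper, not a standard tool.
add_fstab_entry() {
    dev="$1"
    mnt="$2"
    fstab="${3:-/etc/fstab}"
    # The mount point is the second whitespace-separated field of an fstab line.
    if grep -qs "[[:space:]]$mnt[[:space:]]" "$fstab"; then
        echo "entry for $mnt already present in $fstab"
    else
        echo "$dev $mnt xfs noatime 0 0" >> "$fstab"
    fi
}

# Example on a scratch copy:
#   add_fstab_entry /dev/xvdb /media/backup /tmp/fstab.test
```

Writing to the real /etc/fstab needs a root shell; the default path is only used when no third argument is given.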

Configuration files

On the server machine you need to set up a daemon to run in the background and host the rsync services.

Before you start the daemon you need to create some rsync daemon configuration files in the /etc directory.

Three files are necessary:

  1. /etc/rsyncd.conf, the actual configuration file,
  2. /etc/rsyncd.motd, Message Of The Day file (the contents of this file will be displayed by the server when a client machine connects) and
  3. /etc/rsyncd.scrt, the username and password pairs.

To create the files on the server:
sudo nano /etc/rsyncd.conf

Now enter the following information into the rsyncd.conf file:

motd file = /etc/rsyncd.motd
[backup]
path = /media/backup
comment = the path to the backup directory on the server
uid = ubuntu
gid = ubuntu
read only = false
auth users = ubuntu
secrets file = /etc/rsyncd.scrt

Hit Ctrl-o to save and Ctrl-x to close nano.
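
Clients address a share by its module name, which is the name in square brackets in rsyncd.conf (for example [backup]). A quick sketch for listing the module names a configuration file defines; list_modules is a hypothetical helper name and the sed pattern only looks for lines consisting of a bracketed name:

```shell
# Sketch: print the module names defined in an rsyncd.conf-style file.
# list_modules is a hypothetical helper, not part of rsync.
list_modules() {
    # Keep only lines of the form "[name]" and print the name itself.
    sed -n 's/^[[:space:]]*\[\([^]]*\)\][[:space:]]*$/\1/p' "$1"
}

# Example:
#   list_modules /etc/rsyncd.conf
```

If this prints nothing, the file has no module section and a client request such as ::backup has nothing to match.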

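uid and gid are the system user and group on the server under which the daemon performs the file transfers; in the ssh session on the EC2 instance the user is ubuntu. auth users lists the names that must appear in the secrets file. They need not be system users, but ubuntu is reused here.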

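The format for the /etc/rsyncd.scrt file is one username:password pair per line, for example ubuntu:YourSecretPassword. With the daemon's default settings the file must not be readable by other users, or the daemon refuses to use it.

Use nano to put some arbitrary text into /etc/rsyncd.motd.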

Now you should have all the configuration information necessary, all that’s left to do is open the rsync port and start the daemon.

First enable the daemon: open the /etc/default/rsync file, i.e.,

sudo nano /etc/default/rsync

and set RSYNC_ENABLE=true.

Here you might also specify a port other than the default 873. Remember to open the port in the security group, either with the AWS web interface in your browser or in the shell using the ec2-api-tools:
ec2-authorize default -p 873

Now to start the daemon,
sudo /etc/init.d/rsync restart
and exit the SSH session.

Syncing a folder

Now you can use your local shell to push some folders or files to the server. Update the server side from the client machine with ec2-api-tools installed:
EXIP=`ec2din | grep INSTANCE | grep -v terminated |awk '{print $4}'`
rsync -auv /home/rforge/articles ubuntu@$EXIP::backup/

$EXIP is the server IP address.

The first line gets the IP of the server from the ec2-api-tools and the second passes it to rsync.

Otherwise you have to look up the IP of your instance in the web interface and substitute it in the command:
rsync -auv /PATH/TO/FOLDER/ ubuntu@$

::backup has to match [backup] in the /etc/rsyncd.conf file. You will see the rsyncd.motd message and get prompted for the password in the rsyncd.scrt file. Then rsync starts the upload.
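
For unattended runs the password prompt gets in the way. rsync has a --password-file option that reads the daemon password from a file, which must not be readable by other users. A minimal sketch, assuming a wrapper named push_backup, a password file at $HOME/.rsync.pass and a SHOW_ONLY switch, all three invented for this illustration:

```shell
# Sketch: push a folder to the rsync daemon without a password prompt.
# push_backup, $HOME/.rsync.pass and SHOW_ONLY are assumptions of this example.
push_backup() {
    src="$1"
    host="$2"
    module="${3:-backup}"        # must match the [module] name in rsyncd.conf
    set -- rsync -auv --password-file="$HOME/.rsync.pass" \
        "$src" "ubuntu@$host::$module/"
    if [ -n "$SHOW_ONLY" ]; then
        echo "$@"                # only print the command that would run
    else
        "$@"
    fi
}

# Example:
#   SHOW_ONLY=1 push_backup /home/rforge/articles "$EXIP"
```

The password file holds just the password on one line; protect it with chmod 600.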

A Script

The following script should do the daemon setup after connecting to the server via ssh and mounting the volume. Keep me posted if something does not work.

## write the daemon configuration; [backup] is the module name used by clients
echo "motd file = /etc/rsyncd.motd
[backup]
path = /media/backup
comment = the path to the backup directory on the server
uid = ubuntu
gid = ubuntu
read only = false
auth users = ubuntu
secrets file = /etc/rsyncd.scrt" > rsyncd.conf
sudo mv rsyncd.conf /etc/
echo "Greetings! Give me the right password! Me wants it!" > rsyncd.motd
sudo mv rsyncd.motd /etc/
echo "ubuntu:YourSecretPassword" > rsyncd.scrt
sudo mv rsyncd.scrt /etc/
sudo chown root:root /etc/rsyncd.*
sudo chmod 640 /etc/rsyncd.*
## enable daemon mode in the /etc/default/rsync file
sudo sed -i 's/RSYNC_ENABLE=false/RSYNC_ENABLE=true/' /etc/default/rsync
sudo /etc/init.d/rsync restart # start the daemon


Compressed backup of MySQL database

I wrote several posts on this topic, but none was 100% right. The following is a quoted passage and looks much better researched than my previous tries:

Back up your MySQL Database with Compress

If your mysql database is very big, you might want to compress the output of mysqldump. Just use the mysql backup command below and pipe the output to gzip, then you will get the output as gzip file.

$ mysqldump -u [uname] -p[pass] [dbname] | gzip -9 > [backupfile.sql.gz]
If you want to extract the .gz file, use the command below:

$ gunzip [backupfile.sql.gz]

Restoring your MySQL Database

Above we backed up the Tutorials database into the tut_backup.sql file. To re-create the Tutorials database you should follow two steps:

Create an appropriately named database on the target machine
Load the file using the mysql command:
$ mysql -u [uname] -p[pass] [db_to_restore] < [backupfile.sql]
Have a look how you can restore your tut_backup.sql file to the Tutorials database.

$ mysql -u root -p Tutorials < tut_backup.sql
To restore compressed backup files you can do the following:

gunzip < [backupfile.sql.gz] | mysql -u [uname] -p[pass] [dbname]
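
A compressed dump is only useful if the archive is intact. As a small sketch, the gzip step can be wrapped so that the file is verified with gzip -t right after it is written; dump_to_gz is a hypothetical helper name and it reads whatever is piped into it:

```shell
# Sketch: compress standard input to a .gz file and verify the archive.
# dump_to_gz is a hypothetical helper, not part of MySQL or gzip.
dump_to_gz() {
    out="$1"
    # gzip -9 gives maximum compression; gzip -t tests the written file.
    gzip -9 > "$out" && gzip -t "$out"
}

# Example:
#   mysqldump -u [uname] -p[pass] [dbname] | dump_to_gz [backupfile.sql.gz]
```

The check only confirms that the gzip file is readable; it does not notice a mysqldump that failed halfway, so looking at the size of the result is still worthwhile.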

MySQL backup

Just for the record: How to combine mysqldump and zip to archive all MySQL databases on the host. I am using a simple MySQL database server on localhost, to organise research tables before analysis.

mysqldump --all-databases | zip -9 Filename -

mysqldump --all-databases writes the content of all databases into the pipe and
zip -9 Filename - compresses the standard input (note the dash ‘-’ at the end!) to ‘Filename’ (-9 gives maximum compression).

The reverse following the man page of ‘mysqldump’:

You can load the dump file back into the server like this:

shell> mysql db_name < backup-file.sql

Or like this:

shell> mysql -e "source /path-to-backup/backup-file.sql" db_name

Remove U3 System from SanDisk

I bought a SanDisk Cruzer 16GB and found some "smart" software preinstalled which I did not consider smart at all. Every time I inserted the drive into any computer, a CD drive with the label “U3 System” was mounted, containing some funny .exe files. The whole “CD drive” took several MB of disk space.

I wanted to get rid of it. Fortunately, I was not the first one to be bothered by it.

Sourceforge has a u3-tool which did the job:

  1. Download the tool to a place where you remember it
  2. Unpack the .tar.gz archive (I just rightclicked it and chose “extract here”). This creates a folder like /MyPathTo/u3-tool-0.3/
  3. open a terminal and type: cd /MyPathTo/u3-tool-0.3/
    sudo make install

    Now u3-tool is installed and can be used.
  4. To remove the CD-like partition containing the firmware crap you need the device name of the USB disk: sudo fdisk -l gives the answer. In my case it is /dev/sdb1. Make sure you remember the right one.
  5. Remove the U3 partition with u3-tool -p 0 /dev/sdb1 where /dev/sdb1 is the device name remembered from the previous step and the option -p is followed by a zero.


Managing Amazon S3 Online Storage with S3sync

After trying to use Amazon’s S3 web service to backup files and to get a reliable download area for R functions and stuff which is not allowed to be uploaded to I ended up with some experimental “buckets” (= S3 online directory) and some 100 MB of files in them.

It turned out that it is not possible to delete a non-empty bucket from S3, so one is required to recurse into the directories and delete all files one by one!

Eric Cheng and other blogs appearing after a google search pointed out S3sync as a suitable tool to remove a non-empty bucket.

So first one has to get Ruby and then also the OpenSSL interface for Ruby: sudo aptitude install ruby libopenssl-ruby

Then download s3sync (to your /home/yourself folder in this case) and unpack it:
cd $HOME/
wget
tar xvzf s3sync.tar.gz
rm s3sync.tar.gz

This creates a s3sync folder containing the ruby code.

The package ca-certificates includes PEM files of CA certificates to allow SSL-based applications to check the authenticity of SSL connections. It is needed to secure the S3 connection via SSL and is part of the default Ubuntu installation (at least it is included in my Xubuntu Karmic Koala). If not: sudo aptitude install ca-certificates

Before using s3sync get your access key and secret access key from Amazon. It has to be included in a file “s3config.yml” which is located in your home folder inside the directory “.s3conf” which has to be created. So:

mkdir $HOME/.s3conf
to create the directory.

Open your favorite text editor and create a plain text file called s3config.yml inside the “.s3conf” folder which contains:
aws_access_key_id: YourS3AccessKeyFromAmazon
aws_secret_access_key: YourS3SecretAccessKeyFromAmazon
SSL_CERT_DIR: /etc/ssl/certs

Prevent others from reading the configuration file containing your confidential access codes by chmod 700 $HOME/.s3conf/s3config.yml
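
The directory, the file and its permissions can also be set up in one go. A minimal sketch, assuming a helper named write_s3config (a made-up name) that takes the target directory and the two keys as arguments; it uses mode 600 on the file, since a plain configuration file does not need the execute bit:

```shell
# Sketch: create the .s3conf directory and s3config.yml with tight permissions.
# write_s3config is a hypothetical helper, not part of s3sync.
write_s3config() {
    dir="$1"
    key_id="$2"
    secret="$3"
    mkdir -p "$dir" && chmod 700 "$dir"    # directory only for the owner
    cat > "$dir/s3config.yml" <<EOF
aws_access_key_id: $key_id
aws_secret_access_key: $secret
SSL_CERT_DIR: /etc/ssl/certs
EOF
    chmod 600 "$dir/s3config.yml"          # owner may read and write, nobody else
}

# Example:
#   write_s3config "$HOME/.s3conf" YourS3AccessKeyFromAmazon YourS3SecretAccessKeyFromAmazon
```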

Now you can start to use s3sync and s3cmd to manipulate your S3 storage space with e.g.: ruby $HOME/s3sync/s3cmd.rb listbuckets

This was the first time I managed to manipulate my S3 account successfully. OK, Jungle Disk under Mac OS X worked, but it is proprietary, though not expensive.

John Eberly’s blog was an inspiration to get started. Follow the link to his excellent blog post.

cwRsync – Transparent Backups in Windows XP

The Unix tool rsync turned out to be a fast and reliable way to back up my /home folder to a USB disk.

ITeF!x provides this installation how-to. He seems to maintain the packages. The website is quite informative, though a bit confusing to me. The download link did not work today, so I found another download location.


Supported platforms: Client – Windows 9x/NT/2000/XP/2003, Server – NT/2000/XP/2003.

[Download cwRsync.] cwRsync comes as a zip archive containing a Nullsoft Installer package. Unzip the downloaded file and run cwRsync_x.x.x_Installer.exe or cwRsync_Server_x.x.x_Installer.exe (server version):

  1. Click Next at Welcome-page
  2. View license agreement.
  3. Select components; these vary depending on the package type. The client package has an optional component (Secure Channel Wrapper & Wizard), which makes creating secure channels to cwRsync servers an easy task.
  4. Specify an installation location.
  5. (cwRsyncServer only) Specify a service account.
  6. Installation starts. By clicking ‘Details’ button, you can get more detailed information about installation. Check if everything seems ok.

You’re DONE! cwRsync is installed on your machine.

Rsync’ing a USB-disk with a Windows XP folder

Use your text-editor-of-choice and paste the following line into it:
rsync -au --exclude '.*' --exclude 'Music/' "/cygdrive/e/" "/cygdrive/h/DATA/home/"

My USB drive showed up in Windows Explorer as E:/, therefore it says /cygdrive/e/. The Windows folder was H:/DATA/home/, which translates into /cygdrive/h/DATA/home/. Change the paths to your specific situation.

Save the file as syncUSB.bat. The .bat tells Windows to execute the script in the commandline.

A variation of the script is
cwRsync\bin\rsync -au --exclude '.*' --exclude 'Music/' "/cygdrive/e/" "/cygdrive/h/DATA/home/"
when both the syncUSB.bat and the installation folder of cwRsync are in the folder which gets updated from the USB-disk.
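
The path translation described above is mechanical: the drive letter is lower-cased and put behind /cygdrive/. A small sketch of that rule as a shell function, for use in a Cygwin or other Unix-like shell; to_cygdrive is a hypothetical helper name and it assumes input of the form X:/path or X:\path:

```shell
# Sketch: turn a Windows path like H:/DATA/home/ into /cygdrive/h/DATA/home/.
# to_cygdrive is a hypothetical helper, not part of cwRsync.
to_cygdrive() {
    # First character is the drive letter; lower-case it.
    drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
    # Everything after "X:" is the path; turn backslashes into slashes.
    rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
    printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

# Example:
#   to_cygdrive 'H:/DATA/home/'
```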

Reinstalling Applications after a Fresh Install

Once in a while I am tempted to upgrade my OS or try another flavor. Now I have started trying them on cheap 8-16GB USB disks so I do not need to mess up my working system anymore…

The problem always is that after using an OS for some months a lot of applications have been installed and configured. This took a lot of time. It is always a lot of work to get them all in place again, and often I forgot about them until I needed them. Preferably in a situation without internet connection, so no way to “sudo aptitude install” …

I was already up and going to create a script, manually punching in everything I found necessary, but then I found a preconfigured solution.

According to the great Ubuntu Guide:

If you upgrade your Ubuntu system with a fresh install, it is possible to mark the packages and services installed on your old system (prior to the upgrade) and save the settings (“markings”) into a file. Then install the new version of Ubuntu and allow the system to reinstall packages and services using the settings saved in the “markings” file. For instructions, see this Ubuntu forum thread. In brief:

  • On the old system: Synaptic Package Manager -> File -> Save Markings
  • Save the markings file to an external medium, such as USB drive.
  • Complete the backup of your system’s other important files (e.g. the /home directory) before the fresh install of the new system.
  • In the freshly installed new system, again open Synaptic Package Manager -> File -> Read markings and load the file on your USB drive (or other external storage) previously saved.

Note: Many packages, dependencies, and compatibilities change between versions of Ubuntu, so this method does not always work. Automated updates remain the recommended method.
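
Because the method does not always work, it helps to see afterwards which packages did not make it onto the new system. The sketch below is not from the Ubuntu Guide; it assumes two package lists in the two-column format written by dpkg --get-selections (name, then state), and missing_packages is a hypothetical helper name:

```shell
# Sketch: list packages marked "install" in the old list but not in the new one.
# missing_packages is a hypothetical helper; input files are assumed to be
# in the "name<TAB>state" format of dpkg --get-selections.
missing_packages() {
    old="$1"
    new="$2"
    tmpdir=$(mktemp -d)
    # Keep only package names whose state is "install", sorted for comm.
    awk '$2 == "install" {print $1}' "$old" | sort > "$tmpdir/old"
    awk '$2 == "install" {print $1}' "$new" | sort > "$tmpdir/new"
    # comm -23 prints lines that appear only in the first file.
    comm -23 "$tmpdir/old" "$tmpdir/new"
    rm -rf "$tmpdir"
}

# Example:
#   dpkg --get-selections > new-system.txt
#   missing_packages old-system.txt new-system.txt
```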

Manage Amazon S3 Buckets

Yeah, delight!

I was using crappy development scripts to fiddle with S3 buckets on Amazon Web Services (AWS). Creating, listing, deleting buckets and so on was not that straightforward and I found it not well documented… have a growing suspicion that I am just not capable of web-searches…

OK, there is an easy way:

A graphical user interface.

Unfortunately it refused to work with Ubuntu-Firefox, but did work in Windows XP-IE5.

OK, another tool I just found is the S3 manager add-on for Firefox. This finally turned out to be the easiest way to connect to Amazon’s Web Services and create online storage (“buckets”), edit or delete them.