Data Backup in the AWS Cloud with rsync

After admitting that, of all companies, Microsoft offers 25 GB of cloud storage to its Windows Live subscribers, I will walk through my latest preliminary experiments in backing up important data using Amazon Web Services. The storage is not free but quite cheap, at around $0.10 per GB per month.

If you use Windows and MS Office a lot, use SkyDrive and don’t read on 😉 There are posts that describe how to map SkyDrive like a local hard disk using MS Word.

In the long run I would like to mount EBS storage like a local file tree, probably using WebDAV, but this is my first successful preliminary solution. s3cmd does not work for me.

On Ubuntu/Linux, rsync is a well-established, reliable and easy-to-use tool for keeping data in sync between locations. The following post marries rsync with an Elastic Compute Cloud (EC2) server instance that runs for an hour or so. One has to set up the so-called rsync daemon and attach a persistent Elastic Block Store (EBS) volume.

That setup is covered in another post, which I will link to later. There are some holes in this tutorial: only the direct configuration of the rsync daemon (including the small script at the end) is complete and working. I have filled in some hints on how to get to this stage and will write follow-ups on the rest.

System Out provided a nice tutorial on how to set up rsync in daemon mode on a server, which then listens for clients that want to sync their data.

Here is my version of it, with a short script at the end which should do the job.


Of course you need to have rsync on both machines (the server and the client); since both are Ubuntu this is the case.

I will write another post on how to start the server. It is entirely possible, and quite intuitive, to do this in the Amazon web interface. When the server is running and an extra EBS volume is attached, connect to the server using ssh.

Mount the persistent drive

There are some posts about the advantages of the xfs filesystem, so I stuck with it. Alestic recommends it for all persistent EC2 cloud disks and I trust they know what they are doing. But xfs is not included by default in the Ubuntu micro instance I use for my backups. So, in the SSH shell:

sudo apt-get install -y xfsprogs
sudo modprobe xfs

If the backup volume is newly created, format it:
sudo mkfs.xfs /dev/xvdb
Note: do this only the first time, otherwise you wipe your data. Note also the device name: I attached the volume as /dev/sdb, but it showed up in the Ubuntu Oneiric i386 t1.micro instance as /dev/xvdb.

Now mount the volume
echo "/dev/xvdb /media/backup xfs noatime 0 0" | sudo tee -a /etc/fstab
sudo mkdir /media/backup
sudo mount /media/backup
sudo chown ubuntu:ubuntu /media/backup
sudo chmod 777 /media/backup
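Running the tee -a line above a second time leaves a duplicate entry in /etc/fstab. Here is a minimal sketch of a guard against that. The helper name add_fstab_entry is made up for this example, and the file to edit is passed as an argument so the function can be tried on a scratch file first.

```shell
# Hypothetical helper: append a mount entry only if the mount point
# is not already listed in the given fstab-style file.
add_fstab_entry() {
    fstab_file=$1   # e.g. /etc/fstab, or a scratch file for testing
    device=$2       # e.g. /dev/xvdb
    mount_point=$3  # e.g. /media/backup

    # Look for an uncommented line whose second field is the mount point.
    if awk -v mp="$mount_point" \
        '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' \
        "$fstab_file"; then
        echo "entry for $mount_point already present"
    else
        echo "$device $mount_point xfs noatime 0 0" >> "$fstab_file"
    fi
}

# Try it on a scratch file rather than the live /etc/fstab.
scratch=$(mktemp)
add_fstab_entry "$scratch" /dev/xvdb /media/backup
add_fstab_entry "$scratch" /dev/xvdb /media/backup   # second call adds nothing
cat "$scratch"
```

On the server the real file belongs to root, so the function would have to run in a root shell with /etc/fstab as its first argument.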

Configuration files

On the server machine you need to set up a daemon to run in the background and host the rsync services.

Before you start the daemon you need to create some rsync daemon configuration files in the /etc directory.

Three files are necessary:

  1. /etc/rsyncd.conf, the actual configuration file,
  2. /etc/rsyncd.motd, Message Of The Day file (the contents of this file will be displayed by the server when a client machine connects) and
  3. /etc/rsyncd.scrt, the username and password pairs.

To create the files on the server:
sudo nano /etc/rsyncd.conf

Now enter the following information into the rsyncd.conf file:

motd file = /etc/rsyncd.motd
[backup]
path = /media/backup
comment = the path to the backup directory on the server
uid = ubuntu
gid = ubuntu
read only = false
auth users = ubuntu
secrets file = /etc/rsyncd.scrt

Hit Ctrl-o to save and Ctrl-x to close nano.

The uid, gid, auth users are the users on the server. In the ssh session on the ec2 instance the user is ubuntu.

The format for the /etc/rsyncd.scrt file is one username:password pair per line, for example ubuntu:YourSecretPassword.

Use nano to put some arbitrary text into /etc/rsyncd.motd.
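The two small files can also be created from the shell instead of nano. The following is only a sketch: write_rsyncd_files is a made-up helper name, the greeting is a placeholder, and a scratch directory stands in for /etc. As far as I know, rsync's default "strict modes" setting rejects a secrets file that other users can read, so the file is created with mode 600.

```shell
# Hypothetical helper: write the motd and secrets files into a directory.
# On the server that directory would be /etc (run as root); a scratch
# directory is used here so the sketch can be tried safely.
write_rsyncd_files() {
    target_dir=$1
    user=$2
    password=$3

    echo "Welcome to the backup server." > "$target_dir/rsyncd.motd"

    # Create the secrets file with restrictive permissions from the start,
    # one username:password pair per line.
    ( umask 077; echo "$user:$password" > "$target_dir/rsyncd.scrt" )
    chmod 600 "$target_dir/rsyncd.scrt"
}

scratch_dir=$(mktemp -d)
write_rsyncd_files "$scratch_dir" ubuntu YourSecretPassword
ls -l "$scratch_dir"
```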

Now you should have all the configuration necessary. All that’s left to do is enable the daemon, open the rsync port and start the daemon.

To enable the daemon, open the /etc/default/rsync file, i.e.,

sudo nano /etc/default/rsync

and set RSYNC_ENABLE=true.

Here you might also specify a port other than the default 873. Remember to open the port in the security group, either with the AWS web interface in your browser or in the shell using the ec2-api-tools:
ec2-authorize default -p 873

Now to start the daemon,
sudo /etc/init.d/rsync restart
and exit the SSH session.

Syncing a folder

Now you can use your local shell to push some folders or files to the server. Update the server side from the client machine with ec2-api-tools installed:
EXIP=`ec2din | grep INSTANCE | grep -v terminated |awk '{print $4}'`
rsync -auv /home/rforge/articles ubuntu@$EXIP::backup/

Here $EXIP holds the address of the server: the first line gets it from the ec2-api-tools and the second passes it to rsync.

Otherwise you have to look up the address of your instance in the web interface and substitute it for $EXIP:
rsync -auv /PATH/TO/FOLDER/ ubuntu@$

::backup has to match [backup] in the /etc/rsyncd.conf file. You will see the rsyncd.motd message and get prompted for the password in the rsyncd.scrt file. Then rsync starts the upload.
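One detail in the rsync commands above is easy to miss: the first source path has no trailing slash, the second has one, and rsync treats the two differently. The following local sketch shows the difference without any server. All paths are scratch directories created for the example.

```shell
# Local demonstration of rsync's trailing-slash rule; no server involved.
work=$(mktemp -d)
mkdir -p "$work/src/articles" "$work/dest_a" "$work/dest_b"
echo "draft" > "$work/src/articles/post.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: the directory "articles" itself is created
    # inside the destination.
    rsync -a "$work/src/articles" "$work/dest_a/"

    # Trailing slash: only the contents of "articles" are copied.
    rsync -a "$work/src/articles/" "$work/dest_b/"

    # List what arrived where.
    (cd "$work" && find dest_a dest_b -type f | sort)
else
    echo "rsync is not installed on this machine"
fi
```

The listing shows dest_a/articles/post.txt and dest_b/post.txt. The same rule applies when the destination is a daemon module such as ::backup/.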

A Script

The following script should do the daemon setup after connecting to the server via ssh and mounting the volume. Keep me posted if something does not work.

echo "motd file = /etc/rsyncd.motd
[backup]
path = /media/backup
comment = the path to the backup directory on the server
uid = ubuntu
gid = ubuntu
read only = false
auth users = ubuntu
secrets file = /etc/rsyncd.scrt" > rsyncd.conf
sudo mv rsyncd.conf /etc/
## echo needs no sudo: the files are first written to the current directory
echo "Greetings! Give me the right password! Me want's it!" > rsyncd.motd
sudo mv rsyncd.motd /etc/
echo "ubuntu:YourSecretPassword" > rsyncd.scrt
sudo mv rsyncd.scrt /etc/
sudo chmod 640 /etc/rsyncd.*
sudo chown root:root /etc/rsyncd.*
## enable daemon mode in the /etc/default/rsync file
sed 's/RSYNC_ENABLE=false/RSYNC_ENABLE=true/g' /etc/default/rsync > rsync
sudo mv rsync /etc/default/
sudo chown root:root /etc/default/rsync
sudo chmod 644 /etc/default/rsync
sudo /etc/init.d/rsync restart # start the daemon


Using ec2-api-tools

I am using ec2-api-tools with Ubuntu Lucid to connect and manage my Ubuntu Server on Amazon Web Services.

I closely followed the Ubuntu EC2 Starters Guide.

First, install the ec2-api-tools:

sudo aptitude install ec2-api-tools

This requires that one has registered with AWS and downloaded a keypair to the local computer. To use the ec2-api-tools from the shell, follow the EC2 Starters Guide to set up the private key needed to connect to one’s AWS account.

Make sure you have the following environment variables set up in your shell profile. This is accomplished by adding the following lines to your ~/.bashrc if you use bash as your shell:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/
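Before running any ec2 command, it can help to confirm that the variables are really visible in the shell. This is a sketch only: require_env is a made-up helper, and JAVA_HOME is the only name taken from the listing above. Add whatever other names your setup guide asks for.

```shell
# Hypothetical helper: report any of the named variables that are
# missing from the environment. Returns non-zero if one is missing.
require_env() {
    missing=0
    for name in "$@"; do
        # printenv only sees exported variables, which is what the
        # ec2 tools will see as well.
        if [ -z "$(printenv "$name")" ]; then
            echo "not set or not exported: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# JAVA_HOME is the variable from the listing above.
if require_env JAVA_HOME; then
    echo "environment looks complete"
fi
```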

Having installed the ec2-api-tools and set up the environment variables correctly, one can look for one of the official Ubuntu Server images published by Canonical. The owner ID of Canonical at AWS is 099720109477, so listing only the images of that owner reduces the flood of output:

ec2dim -o 099720109477

This is what I was interested in:

  1. Ubuntu Lucid 10.04 webserver
  2. 32-bit architecture
  3. Elastic Block Store Image (EBS), which can be saved as a snapshot; I want to keep my configurations, when I terminate the server.

So, to have a look at what matches those criteria:

ec2dim -o 099720109477 | grep 10.04-i386 | grep ebs | cut -f 2,3

At the time of writing the output was

ami-714ba518 099720109477/ebs/ubuntu-images/ubuntu-lucid-10.04-i386-server-20100427.1
ami-1234de7b 099720109477/ebs/ubuntu-images/ubuntu-lucid-10.04-i386-server-20100827
ami-6c06f305 099720109477/ebs/ubuntu-images/ubuntu-lucid-10.04-i386-server-20100923

The first field (e.g. ami-714ba518) is the image ID, which is needed to start an instance from that image. The list keeps growing as Canonical releases updates.

ami-6c06f305 was the latest release (2010/09/23) at the time of writing.
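Picking the newest image by eye gets tedious as the list grows. Here is a small sketch that does it with sort. The helper name latest_ami is made up, and it assumes that the image names end in a build date, as they do in the listing above.

```shell
# Hypothetical helper: read "ami-id name" lines on stdin and print the
# AMI ID whose name sorts last. Assumes the names end in a YYYYMMDD
# build date, as in the listing above.
latest_ami() {
    LC_ALL=C sort -k 2,2 | tail -n 1 | awk '{ print $1 }'
}

# The three lines from the listing above, used as sample input.
sample='ami-714ba518 099720109477/ebs/ubuntu-images/ubuntu-lucid-10.04-i386-server-20100427.1
ami-1234de7b 099720109477/ebs/ubuntu-images/ubuntu-lucid-10.04-i386-server-20100827
ami-6c06f305 099720109477/ebs/ubuntu-images/ubuntu-lucid-10.04-i386-server-20100923'

printf '%s\n' "$sample" | latest_ami
```

For the sample input this prints ami-6c06f305. With the live tools, the input would come from the pipeline shown above, i.e. the ec2dim ... | cut -f 2,3 command piped into latest_ami.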

To check if the keys are ok

Create a keypair in case you have not done so before. Note: the name ‘ec2-keypair’ is arbitrary; choose what you like:
ec2addkey ec2-keypair

I downloaded the key to some folder on the local computer from the AWS site (open in your browser and sign in).

Here the ec2-api-tools did not work for me: ‘ec2addkey ec2-keypair > ec2-keypair.pem’, as suggested in the EC2 Guide, did not produce a usable file. The problem seemed to be that the command wrote the fingerprint on top of the key, and the resulting file was then rejected.

Restrict the permissions on the key file so that no other user without superuser rights can read it (and with it access your EC2 account):
chmod 600 ec2-keypair.pem

Now open selected ports in your security setup for access with secure shell, ftp, http and whatever else you might want to set up:

For ssh (port 22)
ec2-authorize default -p 22

Open port 80 to access the apache2 server
ec2-authorize default -p 80

Open port 21 to access the ftp server
ec2-authorize default -p 21

Now we can start an instance. Remember the image-ID from above (ami-6c06f305)

ec2run ami-6c06f305 -k ec2-keypair

Note that it says just ‘ec2-keypair’ *without* ‘.pem’ extension. Important. The ec2run command without further options starts a ‘small’ instance.

Run ‘ec2din’ to get the external IP and the instance ID. You need them for connecting via secure shell and for terminating the instance:

exip=`ec2din | grep INSTANCE | cut -f 4`
inid=`ec2din | grep INSTANCE | cut -f 2`

Of course you can as well just run ec2din and remember the external IP and the instance ID.
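Both values can also be taken from a single ec2din call, which avoids querying AWS twice. The following is a sketch under assumptions: first_running_instance is a made-up helper, the sample lines are invented stand-ins for real ec2din output, and the field positions (ID in field 2, address in field 4, tab-separated) are simply those used by the cut commands above.

```shell
# Hypothetical helper: read ec2din output on stdin and print the
# instance ID and external address of the first running instance.
first_running_instance() {
    awk -F '\t' '$1 == "INSTANCE" && /running/ { print $2, $4; exit }'
}

# Made-up sample lines standing in for real ec2din output.
sample=$(printf 'RESERVATION\tr-00000000\t000000000000\tdefault\nINSTANCE\ti-00000000\tami-6c06f305\tec2-203-0-113-10.compute-1.amazonaws.com\tip-10-0-0-1.ec2.internal\trunning\tec2-keypair\n')

# Split the two values into the variables used in the text.
read -r inid exip <<EOF
$(printf '%s\n' "$sample" | first_running_instance)
EOF

echo "instance: $inid  address: $exip"
```

With the live tools the input would come from ec2din itself, i.e. ec2din piped into first_running_instance.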

Connecting to the server
ssh -i ec2-keypair.pem ubuntu@$exip

Note that you always connect as user ‘ubuntu’. If you did not initialize the ‘exip’ variable in the last step, you have to add the external IP manually, like
ssh -i ec2-keypair.pem ubuntu@

On first start one might want to add lamp-server and other desired services:
sudo tasksel install lamp-server
sudo aptitude install vsftpd ddclient

If you registered with a dynamic DNS service like DynDNS and set up ddclient correctly, you could also do
ssh -i ec2-keypair.pem
I will comment on this in another post.

Remember: in order to get apache2 up/down/restarted
sudo /etc/init.d/apache2 stop
sudo /etc/init.d/apache2 start
sudo /etc/init.d/apache2 restart

The server is terminated by
ec2kill $inid

Setup FTP on Amazon EC2

Once your Amazon EC2 instance is finally running (which is another story), you will want ftp access to upload files and documents.

I got the inspiration to use vsftpd from

  1. Open port 21 for ftp access on your running instances:
    ec2-authorize default -p 21
  2. Connect to your instance via ssh
    ip=`ec2din | grep I | cut -f17`
    ssh -i /path/to/yourkey.pem ubuntu@$ip
  3. Install vsftpd
    sudo aptitude install vsftpd
  4. Start the daemon
    sudo /etc/init.d/vsftpd start