Using rsync to efficiently back up files

Rsync is a Linux command line program for synchronizing directories and files between different computers. Rsync will maintain a copy of the remote directory on your local computer. It does this by downloading all the files once and then, on each later run, downloading only the files that have changed. This minimizes your bandwidth usage and reduces the time taken to make backups.

Rsync can connect to the remote server in a variety of ways. The default way is to use Secure Shell (SSH). SSH connections are encrypted, so this is a secure way to download backups. Using key-based SSH authentication you can automate the connection process, allowing you to run rsync from scripts and cron jobs.
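
As a rough sketch of the key-based setup, the function below creates a dedicated key pair for unattended backups. The function name, the key path, the ed25519 key type and the empty passphrase are all assumptions made for illustration. An empty passphrase is what lets cron use the key without prompting, but it also means anyone who can read the key file can log in with it.

```shell
#!/bin/bash
# Create a dedicated key pair with an empty passphrase for unattended backups.
# make_backup_key is a hypothetical helper name, not part of rsync or OpenSSH.
make_backup_key() {
    local keyfile=$1
    ssh-keygen -q -t ed25519 -N "" -f "$keyfile" -C "rsync-backup"
}

# Typical use (not run here because it needs a reachable server):
#   make_backup_key "$HOME/.ssh/backup_key"
#   ssh-copy-id -i "$HOME/.ssh/backup_key.pub" user@remotehost
#   rsync -avz -e "ssh -i $HOME/.ssh/backup_key" user@remotehost:/remote/directory/ /local/directory/
```

ssh-copy-id appends the public key to the remote account's authorized_keys file, after which rsync can log in with the private key instead of a password.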

Using Rsync is easy. A simple example to start things off:

rsync -avz user@remotehost:/remote/directory/ /local/directory/

The options specified above are:

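  • -a for archive – Copies directories recursively and preserves permissions, modification times and (where the local user is allowed to) ownership of the copied files. It also copies symbolic links as symbolic links.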
  • -v for verbose – Outputs file names as they are copied.
  • -z for compression – Compresses the data during the transfer to reduce bandwidth usage, which usually speeds up transfers over slow links.
  • user@remotehost:/remote/directory/ – The remote server from which you want to download the files. In other words this is the source of the files.
  • /local/directory/ – The local directory to which you want to copy the files. This is the destination of the files. The destination is synchronized to match the source i.e. the source is left unchanged and only the destination directory is altered.
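
Before running a sync for real, it can help to preview it. The sketch below uses rsync's -n (--dry-run) option with two temporary local directories standing in for the remote source and the local destination. The variable names and the file name are made up for the demonstration, and -z is left out because nothing crosses the network here.

```shell
#!/bin/bash
# Temporary stand-ins for the remote source and the local destination.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "hello" > "$src/a.txt"

# -n (--dry-run) lists what would be copied without touching the destination.
rsync -avn "$src/" "$dst/"

# The same command without -n performs the copy.
rsync -av "$src/" "$dst/"
ls "$dst"
```

The same -n option can be added to the remote command shown earlier to check what a sync would do before it changes anything.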

By default rsync will not delete files that are found in the destination directory but not the source. So for a true synchronization you will want to add the --delete option as well.

rsync -avz --delete user@remotehost:/remote/directory/ /local/directory/
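
Because --delete removes files, it is worth seeing what it does on throwaway data first. This is a small local demonstration, assuming two temporary directories in place of the real source and destination; the file names are invented for the example.

```shell
#!/bin/bash
# Temporary stand-ins for the source and destination directories.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "keep" > "$src/keep.txt"
echo "stale" > "$dst/stale.txt"   # exists only in the destination

# Without --delete the extra file in the destination is left alone.
rsync -a "$src/" "$dst/"
ls "$dst"

# Preview the deletion first: with --dry-run nothing is actually removed.
rsync -av --delete --dry-run "$src/" "$dst/"

# With --delete the destination is made to match the source exactly.
rsync -a --delete "$src/" "$dst/"
ls "$dst"
```

After the last command only keep.txt remains in the destination, since stale.txt has no counterpart in the source.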

Scripting Rsync

You can run Rsync at regular intervals using a cron job. Furthermore, you can email yourself the Rsync output so that you know which files were synced and whether any errors occurred. First, make a script like this, which emails the output of the rsync session to you:

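#!/bin/bash
emailaddress='user@host'
# The trailing slash on the source copies its contents rather than the directory itself.
# 2>&1 captures error messages along with the normal output so failures reach the email.
output=$(rsync -avz --delete -e "ssh -i /home/localuser/.ssh/id_rsa -p 1234" remoteuser@remotehost.com:/remote/directory/ /local/directory/ 2>&1)
# Quote the variables so the line breaks in the rsync output are preserved.
echo "$output" | mail -s "Rsync.sh: Rsync run" "$emailaddress"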

The -e switch allows you to specify the remote shell program used to access the remote server. SSH is already the default remote shell for rsync, but because we need to pass custom options to it we have to use the -e switch. You can see in the above script that I have specified an SSH private key to be used as part of the authentication process and a custom SSH port, 1234.
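
If you want the email subject to say whether the run worked, one possible approach is to check rsync's exit status, which is 0 on success and non-zero on error. The helper below is only a sketch: the function name, the variable names and the subject wording are all hypothetical.

```shell
#!/bin/bash
# Run the given command, capture its output and errors, and build a subject
# line from its exit status. Results are left in three global variables.
sync_and_summarize() {
    sync_output=$("$@" 2>&1)
    sync_status=$?
    if [ "$sync_status" -eq 0 ]; then
        sync_subject="Rsync.sh: run succeeded"
    else
        sync_subject="Rsync.sh: run FAILED (exit code $sync_status)"
    fi
}

# Typical use inside the backup script (not run here, it needs the remote server):
#   sync_and_summarize rsync -avz --delete -e "ssh -i /home/localuser/.ssh/id_rsa -p 1234" \
#       remoteuser@remotehost.com:/remote/directory/ /local/directory/
#   echo "$sync_output" | mail -s "$sync_subject" "$emailaddress"
```

The function is called directly rather than inside a pipeline or $(...), because either of those would run it in a subshell and the variables it sets would be lost.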

Once you have customized and saved the script and made it executable, you can add a cron job like this to run it once a day, at one minute past midnight:

1 0 * * * /path/to/script.sh

Rsync is an excellent tool for backing up servers. It’s better than using tar because you don’t have to download all the data every time you run it. You only download the files that have changed. As a result Rsync has become one of the standard ways of backing up a Linux server. But for incremental backups you should use rsnapshot. Rsnapshot is a Perl-based tool that automates the process of keeping incremental backups using rsync.
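
To give a feel for how incremental snapshots can be built on rsync, here is a minimal local sketch using rsync's --link-dest option. It illustrates the general idea only and is not a description of how rsnapshot itself is implemented. The directory layout (day1, day2) and file names are invented for the example.

```shell
#!/bin/bash
# Temporary stand-ins for the data being backed up and the snapshot store.
src=$(mktemp -d)
snaps=$(mktemp -d)
echo "unchanged" > "$src/same.txt"
echo "version 1" > "$src/changing.txt"

# First snapshot: a full copy.
rsync -a "$src/" "$snaps/day1/"

# The data changes before the next run (different size, so rsync notices).
echo "version 2, now longer" > "$src/changing.txt"

# Second snapshot: files identical to day1 are hard-linked instead of copied,
# so they take no extra disk space; changed files are stored as new copies.
rsync -a --link-dest="$snaps/day1" "$src/" "$snaps/day2/"
ls -li "$snaps/day1" "$snaps/day2"
```

Each snapshot directory looks like a complete backup, yet unchanged files share the same inode across snapshots, which the ls -li output makes visible.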
