Automated Incremental Backups with Rsync and Cron (Part 1)
Have you heard the one about 3-2-1 backups? It’s the idea that you should have 3 copies of your files stored on 2 different types of media, and one of the copies should be off-site. I’m not quite there yet, but I’m a little closer now. I have been using rsync, a utility for making incremental backups, for a while, but I’ve always run the script I created for it manually. I typically did this every three to four weeks, you know, when I thought about it. Yesterday I finally created a cron job to automate this process, and it is so much nicer. Read on to learn for yourself how to automate your backups on Linux. Part 1 of this article covers rsync, Part 1a covers include and exclude rules for rsync, and Part 2 covers cron.
Rsync Basics
If you already know how to use rsync, skip this part. For those of you who are just starting your journey into Linux backups, read on.
Rsync is a great utility for copying files for backups. It uses something called a delta-transfer algorithm that makes backing up speedy and minimizes the amount of data that has to be transferred. It does this by comparing the directory that you want to back up with whatever is at the destination you specify. Then it copies over only new files or the changes to existing files. It can even delete files in the destination directory that have been deleted in the source directory, which keeps your backup in sync with the source folder. Your rsync connection can also use ssh, which means the data transfer is encrypted. This should be considered necessary for an off-site backup.
The remainder of this article will explain how to set up rsync by way of examples.
Backup To A Connected Drive Or Folder
If you are merely copying your files to a local (internal or external) drive or a mounted, shared network folder, use the following syntax:
rsync -av /directory/to/copy/ /destination/directory
The options I have selected are ‘a’ and ‘v’. The ‘a’ option (archive mode) preserves dates, times, permissions, etc., and uses recursion (everything inside every directory located in /directory/to/copy/ will be copied). The ‘v’ stands for verbose, and that means that you will see what the rsync command is doing in the terminal when you call it. There are plenty of other options that you can use, such as ‘z’, which compresses your data during the transfer, but I will leave it up to you to check the man pages if you think ‘-av’ won’t quite fit the bill.
Another thing to pay attention to is the trailing slash on /directory/to/copy/. The trailing slash means that only the contents of the specified directory are copied. The directory itself is not copied. For example, say you are backing up a folder called “SaveMe” with “file1” and “file2” inside it. Using the trailing slash will result in “file1” and “file2” being copied into /destination/directory. If you also want the “SaveMe” directory to be copied, leave off the trailing slash. This will result in “SaveMe” being copied into /destination/directory, and “file1” and “file2” being copied inside of the new “SaveMe” folder.
Backup Over A Network
If you are backing up to another computer on your network, use the following syntax:
rsync -av -e ssh /directory/to/copy/ userName@hostName(or IP):destination/directory
Here we are using ssh to connect to the other computer at the specified location. The userName@hostName(or IP) should be replaced with whatever you would type in to connect to the machine over ssh (e.g., backUp@192.168.1.100). Because destination/directory has no leading slash, it is taken relative to that user’s home directory on the remote machine. Of course, this requires that you have ssh set up on both machines.
Backup Over A Network Using Rsync Daemon
Rather than use SSH to connect to a network computer, you can also set up the destination machine to run rsync as a daemon. You can find some setup instructions for doing so in Ubuntu’s Community Wiki. You may find, as I did, that some NAS boxes will default to running rsync as a daemon, which makes the setup pretty easy (I am using a Synology NAS). To connect to an rsync daemon, use the following syntax:
rsync -av /directory/to/copy/ userName@hostName(or IP)::destination/directory
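The differences between the syntax of this command and the previous one are the second colon and the absence of “-e ssh”. With a daemon, the first part of the path after the double colon is the name of a module (a shared location defined on the destination machine) rather than an ordinary directory. When running the command, you will be asked for the password of the user you specified in place of “userName” above.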
There is a big downside to connecting to an rsync daemon vs. using ssh of which you should be aware. When you connect to the daemon, none of the data you are sending is encrypted. This may be fine if you are sending information over a local network but not if your destination machine is off-site. Fortunately, you can tell rsync to use ssh even when connecting to a daemon using the following syntax.
rsync -av -e ssh /directory/to/copy/ userName@hostName(or IP)::destination/directory
If you are connecting to something like a Synology NAS which will only allow admin users to connect via SSH, but the user account you use for backups is not an admin, you are not out of luck. You can actually specify different users for the SSH and the rsync parts:
rsync -av -e "ssh -l adminName" /directory/to/copy/ userName@hostName(or IP)::destination/directory
A Note On Rsync via SSH
If you plan to use cron to automate your backups, as will be explained in Part 2 of this article, an ssh key with a passphrase will prove problematic. While your rsync job may run just fine manually, the same script will fail to connect when run in cron, because nobody is there to type the passphrase. To combat this, you can create an ssh key without a passphrase. Yes, this is less secure (now anyone who gets the key can connect to your backup server), but we can make it somewhat more secure. To do this, you will edit the authorized_keys file on your backup server. This file will be found in the .ssh directory of the user you ssh into.
nano /home/<user>/.ssh/authorized_keys
If there is more than one key in this file, find the relevant key (probably the last one if you just created a passphrase-less ssh key). Type the following at the front of the line where the key begins:
from="<IPaddressOfComputerYouConnectFrom>",no-agent-forwarding,no-port-forwarding,no-pty,no-user-rc,no-X11-forwarding
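To make the end result concrete, here is a small sketch that assembles such a line and prints it. The IP address and the public key are placeholders I made up for illustration, so substitute your own before pasting anything into authorized_keys.

```shell
#!/bin/bash
# Hypothetical values for illustration only.
client_ip="192.168.1.50"                               # assumed address of the machine that runs the backup
pubkey="ssh-ed25519 AAAAC3PLACEHOLDERKEY backup@client"   # placeholder, not a real key

# The restrictions go at the very front of the line, separated from the key by one space.
options="from=\"${client_ip}\",no-agent-forwarding,no-port-forwarding,no-pty,no-user-rc,no-X11-forwarding"
restricted_line="${options} ${pubkey}"

printf '%s\n' "$restricted_line"
```

The options are separated by commas with no spaces between them, and the whole entry must stay on a single line in the file.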
You may also restrict the commands that can be run via the ssh connection by adding command=”
Keeping In Sync
Beyond merely copying the files in the specified directory, rsync can effectively sync your /destination/directory with your /directory/to/copy/. Specifically, it will look to see if there are files in /destination/directory that no longer exist in /directory/to/copy/. If it finds such files, it deletes them. To enable this feature, add the “--delete” option to your rsync command. Here is an example:
rsync -av -e "ssh -l adminName" --delete /directory/to/copy/ userName@hostName(or IP)::destination/directory
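To see what the delete option changes, you can try it on a pair of throwaway local folders first. This is only a sketch that assumes rsync is installed; the file names keep.txt and old.txt are invented for the demo.

```shell
#!/bin/bash
# Throwaway demo of --delete using local temp directories.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dest"
touch "$demo/src/keep.txt" "$demo/src/old.txt"

# First backup: both files are copied.
rsync -a "$demo/src/" "$demo/dest"

# Remove a file from the source, then back up again without --delete.
rm "$demo/src/old.txt"
rsync -a "$demo/src/" "$demo/dest"
if [ -e "$demo/dest/old.txt" ]; then
    echo "without --delete: old.txt is still in the backup"
fi

# Same backup with --delete: the destination now mirrors the source.
rsync -a --delete "$demo/src/" "$demo/dest"
if [ ! -e "$demo/dest/old.txt" ]; then
    echo "with --delete: old.txt was removed from the backup"
fi
```

Keep in mind that a mirrored backup will also faithfully mirror your mistakes. If you delete a file by accident and the backup runs, the copy is gone too.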
Test Run
Prior to actually performing your backup, you can use the dry-run option to see exactly what will happen. To do so, you merely add the “--dry-run” option, like so:
rsync --dry-run -av -e "ssh -l adminName" --delete /directory/to/copy/ userName@hostName(or IP)::destination/directory
NOTE: Make sure to remove “--dry-run” once you are happy with how the command will function.
Creating A Script
After getting this all set up, you probably don’t want to have to remember the exact command that you crafted. Fortunately, a short script is all you need. Even if you think you can remember your entire command, creating a script will be helpful for when we automate the backup process with cron.
If you do not have a folder for your scripts, you can create a folder called ‘bin’ in your home folder (or whatever you want). Now create a new text file in your favorite text editor (e.g., nano, vim, emacs, gedit). Call it what you want (e.g., homeRsync.sh), but make sure you use the extension ‘.sh’ at the end, and save it in your scripts folder. The file will only consist of two lines. The first just tells the computer to use the bash shell to execute the code that follows. The second is your rsync command. Here is an example:
#!/bin/bash
rsync -av -e "ssh -l adminName" --delete /directory/to/copy/ userName@hostName(or IP)::destination/directory
Next we have to make it executable. In your terminal, change directories to where you have the script stored. Then enter the following command:
chmod +x scriptName.sh
Now you can run the script, like so:
./scriptName.sh
NOTE: The above assumes you are currently in the directory where the script is located.
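Since the script will eventually run unattended, you might like it to report whether the backup actually worked. Below is one possible variation, not the script I use: it wraps the rsync call in a function named run_backup (a name I made up) and passes rsync's exit status along. The demo call at the bottom uses temporary local folders so the sketch can run as-is; in a real script you would pass your own source and destination instead.

```shell
#!/bin/bash
# Sketch of a backup script that reports success or failure.
# run_backup is a hypothetical helper: first argument is the source, second the destination.
run_backup() {
    local src="$1" dest="$2"
    if rsync -a --delete "$src" "$dest"; then
        echo "backup of $src finished: $(date)"
    else
        local status=$?    # exit status of the rsync call above
        echo "backup of $src FAILED with exit code $status: $(date)" >&2
        return "$status"
    fi
}

# Demo call on throwaway folders; replace these with your real paths.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dest"
touch "$demo/src/a.txt"
run_backup "$demo/src/" "$demo/dest"
```

A non-zero exit status gives whatever runs the script something to react to, which beats discovering a month later that the backup quietly stopped working.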
You should now have a functioning rsync script. If you would really only like to back up part(s) of /directory/to/copy/, you can read about include and exclude rules in Part 1a of this article. If you are ready to use cron to automate your rsync script, move on to Part 2. If you have a question or other feedback, leave it in the comments below.