Difference between revisions of "Backup"
(→Backing up a server with Rsync: transliterate patch)
(→Backing up a server with Rsync: applying the transliterate patch)
Line 130:
  {{code|<bash>rsync -av --delete -e ssh / USER@HOST:/LOCATION --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/media/*,/lost+found}</bash>}}
− '''NOTE:''' there is a problem when trying to backup Maildirs like this if the target machine is Windows (like the ADrive service) because the file names contain colons. This has been addressed on the form of the [https://git.samba.org/?p=rsync-patches.git;a=blob;f=transliterate.diff;h=19d6393537903a1fb7d5581b8216b999fa82a450;hb=135a233d6f4d401c187117ae57fac147f2a863a4 transliterate path] which adds a '''--tr=BAD/GOOD''' option for mapping bad characters to good ones.
+ '''NOTE:''' there is a problem when trying to backup Maildirs like this if the target machine is Windows (like the ADrive service) because the file names contain colons. This has been addressed on the form of the [https://git.samba.org/?p=rsync-patches.git;a=blob;f=transliterate.diff;h=19d6393537903a1fb7d5581b8216b999fa82a450;hb=135a233d6f4d401c187117ae57fac147f2a863a4 transliterate path] which adds a '''--tr=BAD/GOOD''' option for mapping bad characters to good ones. To install the patch you need to download and unpack the latest source and the patches, then change into the source directory and do the following:
+ {{code|<bash>
+ patch -p1 <patches/transliterate.diff
+ ./configure
+ make
+ make install
+ </bash>}}
  == Troubleshooting ==
Revision as of 14:45, 12 July 2014
The Organic Design backups are created daily, compressed with 7-Zip and distributed over Secure Copy Protocol (SCP) to various other domains.
Contents
- 1 ADrive - Excellent Cloud Storage Service
- 2 Setting up automated backups over SCP
- 3 Wiki backups
- 4 Security
- 5 About LZMA compression
- 6 Using rsync over SSH
- 7 Restoring a single database from a dump of all databases
- 8 Duplicate a database
- 9 Backing up a server with Rsync
- 10 Troubleshooting
- 11 Online backup services
- 12 See also
ADrive - Excellent Cloud Storage Service
There are a lot of very good backup services around now, many offering 50GB or more of free storage space, but one thing I haven't been able to find for free is a service offering a good amount of space that allows files to be transferred using standard Linux shell commands such as rsync and scp. Last month I found ADrive, which offers an excellent 100GB service for only $2.50 per month and has many useful features such as public files, shared online editing and more. Most importantly from my perspective, they allow transfer via rsync and scp with zero configuration and an excellent data transfer rate! This means I can have my servers automatically transfer my encrypted backup files to the ADrive cloud storage on a schedule. Get started here :-)
Setting up automated backups over SCP
The backups are done by running the backup-host.pl script as root. The script expects each host in the backup network to have a dedicated user called scp and a directory called /backup, which must be writable by the scp user so that backups being transferred from remote servers can be put there. The directory is created with only the minimum level of access that this requires.
This allows the scp user to create new backup files and write to them, but it doesn't have any permission to read them.
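The original commands are not preserved on this page, so the following is only a minimal sketch of one way to get that behaviour. The helper name make_dropbox, the root ownership and the mode 730 are all assumptions rather than something taken from backup-host.pl. Note that this mode stops the group from listing the directory; it does not by itself stop a user from reading a file it created.

```shell
# Hypothetical helper, not part of backup-host.pl.
# Usage (as root): make_dropbox /backup scp
make_dropbox() {
    dir="$1"
    group="$2"
    mkdir -p "$dir" || return 1
    # Changing ownership needs root; the sketch skips it otherwise.
    if [ "$(id -u)" -eq 0 ]; then
        chown "root:$group" "$dir" || return 1
    fi
    # 730 = owner rwx, group -wx (may create files, may not list them), others nothing.
    chmod 730 "$dir"
}
```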
Each scp user has the same RSA key pair and authorized_keys file in its .ssh directory. The authorized_keys file contains a line that allows the scp user to do nothing but accept files into the /backup directory (sending is done by the root user, so only receiving needs to be allowed by the scp user).
When initially setting up a new host to be part of the backup network a manual transfer must first be done to each of the remote hosts to ensure their fingerprints are added to the known_hosts file. The target directory is just "/" because the root directory for scp users is forced to /backup in their authorized_keys file. This must be done from a root shell because the scp users don't have permission to do anything except receive backup files.
The configuration file
The configuration for the backups is done via a file called backup-host.conf in the same directory as the script. The following parameters are recognised:
$admin
The email address that important information such as low space or failed transfers should be sent to. The default is admin@organicdesign.co.nz.
$host
The name that should be included at the start of the backup files to identify the machine that they're backed up on. The default is the machine's hostname, but it's useful to be able to override this since many servers are given meaningless hostnames, or have the same hostname as other servers in the network.
$pass
The MySQL root password for backing up the MySQL databases, and/or for encrypting the backups of configuration files.
$disk
The device name, such as /dev/sda1, to check for free space. If set, low-space notifications are sent to the admin email address so that manual file pruning can be done if necessary.
$free
The number of gigabytes of free space below which a low-space notification email will be sent to the admin email address.
@conf
A list of files that should be encrypted (with the root MySQL password) as they may contain sensitive information such as passwords or private keys.
@files
A list of file/directory locations to include in weekly file backups.
@excl
A list of file/directory locations that should be excluded from the file backups. This should include locations which are very large and don't undergo change, since they'll be better manually backed up, and also any files that may contain sensitive information. Note that there's no need to include any of the locations listed in @conf as these will automatically be excluded.
@scp
A list of servers to send backups to over SCP using the scp user.
File pruning
The backup files will take up a huge amount of space as time goes on, because each year there'll be 365 database backups and 52 file-system and configuration backups, and there could be a number of hosts sending their backups to each server too. To prevent the space getting quickly consumed, the backup files are "pruned" so that there are fewer and fewer of them as they get older.
There are only half as many files after they're older than a month, a quarter over two months, and so on. After a year there is only one file per month, and only one file per year more than two years old.
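The thinning rule above can be sketched as a keep-or-prune test on a backup's date. This is an interpretation, not the script's actual logic: it assumes GNU date, treats a month as 30 days, reads "and so on" as doubling the spacing each month up to every 16th day, and picks the 1st of the month and 1 January as the surviving files. The function name keep_backup is hypothetical.

```shell
# Hypothetical pruning rule; the thresholds are an interpretation of the text above.
# Usage: keep_backup FILE_DATE TODAY   (both YYYY-MM-DD, needs GNU date)
# Returns 0 if the backup should be kept, non-zero if it can be pruned.
keep_backup() {
    file_day=$(( $(date -u -d "$1" +%s) / 86400 ))   # days since 1970-01-01
    today=$(( $(date -u -d "$2" +%s) / 86400 ))
    age=$(( today - file_day ))

    if [ "$age" -gt 730 ]; then
        # Older than two years: keep only the backup made on 1 January.
        [ "$(date -u -d "$1" +%j)" = "001" ]
    elif [ "$age" -gt 365 ]; then
        # Older than one year: keep only the backup made on the 1st of each month.
        [ "$(date -u -d "$1" +%d)" = "01" ]
    else
        # Up to a year: every 2nd day after one month, every 4th after two,
        # and so on, capped at every 16th day.
        months=$(( age / 30 ))
        if [ "$months" -gt 4 ]; then months=4; fi
        interval=1
        while [ "$months" -gt 0 ]; do
            interval=$(( interval * 2 ))
            months=$(( months - 1 ))
        done
        [ $(( file_day % interval )) -eq 0 ]
    fi
}
```

A pruning pass would call this for the date embedded in each backup file name and delete the files for which it returns non-zero.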
Another way of reducing the space is by not including any standard code-bases etc in the backups - anything that will be rebuilt by going through the documentation for the server has no reason to be backed up. Only parts of the system which are not covered by documentation such as the mathaba.net main site need to have a full file-system backup, but even this may only need to be a one-off backup with just minimal areas of the file-system being regularly backed up - such as locations to which files are uploaded etc.
Wiki backups
We use a general backup script to back up wikis on the various servers we administer, which is in our Subversion tools repository here. The script dumps databases and compresses them to 7zip, then sends them over SCP to a remote server. The script takes two parameters: the first is the filesystem path to the wiki, and the second is the SCP target location. It obtains the database connection details directly from the wiki's LocalSettings.php file, and does the exporting, compression and remote connection from Perl.
Note: we're using the latin1 character set because of issues with MySQL character-encoding; see manually backup a wiki for more detail.
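The Perl snippet itself is not preserved on this page. As a rough shell equivalent of the steps described, the sketch below reads the connection details from LocalSettings.php, dumps the database with the latin1 character set, compresses the dump with 7za and sends it with scp. The function names, the temporary file location and the file naming are assumptions, and the simple pattern match will not cope with passwords that contain quote characters.

```shell
# Hypothetical helpers, not the Organic Design script.

# Print a simple string setting such as $wgDBname from LocalSettings.php.
# Usage: wiki_setting /var/www/wiki/LocalSettings.php wgDBname
wiki_setting() {
    sed -n "s/^[[:space:]]*\\\$$2[[:space:]]*=[[:space:]]*[\"']\\([^\"']*\\)[\"'].*/\\1/p" "$1" | head -n 1
}

# Usage: backup_wiki /var/www/wiki scp@remote.example.org:/
backup_wiki() {
    conf="$1/LocalSettings.php"
    target="$2"
    db=$(wiki_setting "$conf" wgDBname)
    user=$(wiki_setting "$conf" wgDBuser)
    pass=$(wiki_setting "$conf" wgDBpassword)
    file="${TMPDIR:-/tmp}/$db-$(date +%Y-%m-%d).sql.7z"
    # 7za reads the dump from standard input (-si) and writes an LZMA 7z archive.
    mysqldump -u "$user" -p"$pass" --default-character-set=latin1 "$db" \
        | 7za a -si -t7z -m0=lzma -mx=9 "$file" > /dev/null || return 1
    scp "$file" "$target"
}
```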
Security
The backups are stored locally in the /backup directory. If a user needs to read these manually with SCP, then the mode of this directory should be 770 and the user should be added to the root group. The directory and all the backups are owned by root and in the root group.
This script connects to the remote server using a user called "scp", which needs to be set up on the remote server and added to the AllowUsers directive in /etc/ssh/sshd_config if, like us, you use that directive to restrict shell access to specified users.
Also, the script can't (and shouldn't) enter passwords, so an RSA key-pair needs to be created with ssh-keygen -t rsa (it's good practice to disable password logins completely in /etc/ssh/sshd_config). The private part goes into /root/.ssh/ on the local server being backed up (or the .ssh directory of whatever user will be running the script; we run it from the crontab as root). The public part of the key-pair goes into /home/scp/.ssh/authorized_keys on the remote server.
One more security precaution is to lock the scp user down so that it can't be used for anything except transferring files from the backup server into the /home/scp directory. To do this, we restrict the command that the user can execute, and turn off all the other SSH services that are usually available to applications using the SSH protocol. This is done by prepending a set of options to the RSA public key in /home/scp/.ssh/authorized_keys.
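The exact options used are not preserved on this page. As a sketch, an entry of that kind can be built from the standard authorized_keys options, with a forced command that puts scp into its receiving mode for one directory. The helper name restricted_key_line is hypothetical, and scp -t is the internal receive mode of the legacy scp protocol, so recent OpenSSH clients may need scp -O to talk to it.

```shell
# Hypothetical helper: print an authorized_keys entry that only allows
# receiving files into one directory.
# Usage: restricted_key_line /home/scp /root/.ssh/id_rsa.pub
restricted_key_line() {
    dir="$1"
    pubkey_file="$2"
    # command="..."  forces scp's receive mode into $dir, whatever the client asks for
    # no-*           switches off forwarding and terminal allocation
    printf 'command="scp -t %s",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty %s\n' \
        "$dir" "$(cat "$pubkey_file")"
}
```

The printed line would replace the plain public key line in the scp user's authorized_keys file.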
About LZMA compression
LZMA is an extremely good compression method which compresses our backups to about one third of the size that gzip or bzip2 achieves. I have tested it with the free 7-zip file manager from www.7-zip.org, and od-wiki-db-2006-11-20 is 268MB uncompressed, 54.9MB gzipped and only 21.7MB as a 7z! But I'm unable to get the Debian port to work due to dependency issues with low-level C libraries that I don't want to mess with.
- I've found a standalone version at http://sourceforge.net/projects/p7zip and that's compressed it to 24.8MB, not quite as small as the Windows one, but still very good.
- Using switches -t7z -m0=lzma -mx=9 has got it down to 21.1MB - slightly smaller than the Windows version :-)
Statistics
7Zip is extremely good at compressing wiki data compared to other algorithms, perhaps because it compresses the history more efficiently. Here's a size comparison between a wiki backup and a server image, which is a standard Linux file structure containing no database or web site content. The percentages show the space saved relative to the uncompressed size.
Compression | server image | wiki backup |
---|---|---|
none | 517MB | 269MB |
7z | 122MB (76%) | 21.1MB (92%) |
RAR | 140MB (72%) | 24.9MB (90%) |
Bzip2 | 176MB (66%) | 38.0MB (86%) |
Gzip | 197MB (62%) | 54.5MB (80%) |
Zip | 197MB (62%) | 54.5MB (80%) |
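The percentages in the table can be reproduced as the space saved, (1 - compressed / original) × 100. The helper below is only an illustration of that arithmetic; its name is hypothetical and it rounds to the nearest whole percent, which may differ by one from a table entry that was rounded differently.

```shell
# Usage: saving_percent ORIGINAL_MB COMPRESSED_MB
saving_percent() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.0f\n", (1 - c / o) * 100 }'
}

saving_percent 517 122    # server image as 7z, prints 76
saving_percent 269 21.1   # wiki backup as 7z, prints 92
```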
Using rsync over SSH
Sometimes it's useful to do a one-off backup of a file structure from one host to another, and since all the hosts (in our system) are guaranteed to be able to connect to each other with SSH (after adding appropriate RSA keys), using rsync over SSH is a good way to do this.
The transfer syntax is very similar to SCP, with the source given first and the destination second, so new changes can be pulled from a remote directory to a local one by giving the remote directory as the source.
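As an illustration of such a pull (the original example is not preserved here), the sketch below wraps a typical invocation in a small function. The function name, host and paths are placeholders; a trailing slash on the source means "the contents of this directory" rather than the directory itself.

```shell
# Usage: pull_changes user@remote.example.org:/var/www/ /var/www/
pull_changes() {
    # -a archive mode (permissions, times, symlinks), -v verbose,
    # -z compress in transit, -e ssh use SSH as the transport
    rsync -avz -e ssh "$1" "$2"
}
```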
After the systems are confirmed as being able to connect over SSH, you may want to lock them down so that the connection between them can only be used for rsync. The IP and command can be prepended to the key in the remote host's ~/.ssh/authorized_keys file.
For more security, the command allowed can be restricted to just that specific rsync command. This can be done by manually running the rsync command with the -e'ssh -v' option, which will output the exact command sent; that command can then be used in the remote host's authorized_keys file instead of just "rsync".
Restoring a single database from a dump of all databases
Restoring a database from a dump is pretty basic, but one issue that can crop up is when the dump was for all databases (as our daily backups are) but you want to restore just one of the databases in the dump. This can be done with a single shell command.
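The original command is not preserved on this page. One way to do it, sketched below, relies on the "-- Current Database:" comment that mysqldump writes before each database in an all-databases dump, and prints only the section for the named database. The function name is hypothetical, and the output also includes the header comment of the following database, which is harmless.

```shell
# Usage: extract_db wikidb all-databases.sql > wikidb.sql
extract_db() {
    # Print from this database's header up to the next database's header.
    sed -n "/^-- Current Database: \`$1\`/,/^-- Current Database: \`/p" "$2"
}
```

The result can then be imported as usual, for example with mysql -u root -p wikidb < wikidb.sql. The mysql client also has a --one-database option that filters statements while importing a full dump.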
Duplicate a database
- Note: There is NO space between -p and [password]
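The commands for this section are not preserved on the page, so the following is a sketch of the usual approach: create the new database, then pipe a dump of the old one straight into it. The function name and argument order are assumptions. Passing the password on the command line makes it visible to other users in the process list, so this is best kept to trusted hosts.

```shell
# Usage: duplicate_db root PASSWORD olddb newdb
duplicate_db() {
    user="$1"
    pass="$2"
    src="$3"
    dst="$4"
    # No space between -p and the password, as noted above.
    mysqladmin -u "$user" -p"$pass" create "$dst" || return 1
    mysqldump -u "$user" -p"$pass" "$src" | mysql -u "$user" -p"$pass" "$dst"
}
```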
Backing up a server with Rsync
Rsync is a very useful tool for backing up and synchronising data, and a single rsync command over SSH is a simple way to transfer everything from one server to a location on a remote server.
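The command from the wiki source for this section uses bash brace expansion for its exclude list. The sketch below passes the same options and exclusions but builds the list in a loop, so that it also runs under a plain POSIX sh; the function name is hypothetical and USER@HOST:/LOCATION stays a placeholder. Note that --delete removes files on the target that no longer exist on the source.

```shell
# Usage: backup_server USER@HOST:/LOCATION
backup_server() {
    dest="$1"
    # Build one --exclude option per pseudo or temporary filesystem.
    set --
    for dir in /dev /proc /sys /tmp /run /mnt /media; do
        set -- "$@" "--exclude=$dir/*"
    done
    rsync -av --delete -e ssh "$@" --exclude=/lost+found / "$dest"
}
```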
NOTE: there is a problem when trying to back up Maildirs like this if the target machine is Windows (like the ADrive service) because the file names contain colons. This has been addressed in the form of the transliterate patch, which adds a --tr=BAD/GOOD option for mapping bad characters to good ones. To install the patch you need to download and unpack the latest source and the patches, then change into the source directory, apply patches/transliterate.diff with patch -p1, and run ./configure, make and make install.
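The build steps can be wrapped up as follows. The function name is hypothetical, and the sketch assumes the patches were unpacked into a patches directory inside the rsync source tree, as the patch path above implies.

```shell
# Usage: install_transliterate_patch /path/to/rsync-source
install_transliterate_patch() {
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$1" || exit 1
        patch -p1 < patches/transliterate.diff &&
        ./configure &&
        make &&
        make install    # usually needs root
    )
}
```

Following the BAD/GOOD form, a transfer that maps colons to underscores would presumably pass something like --tr=':/_' to the patched rsync; the notes in the patch itself are the place to confirm the exact syntax.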
Troubleshooting
See also SQL.
Out of resources with mysqldump
The server can run out of resources when doing a backup of all the databases. If so, try reducing the table_open_cache in my.cnf to 200 or so.
Duplicate entries on import
Sometimes dumps have problems where a column that should be unique has one or more duplicate entries. The mysql client can be called with the -f option, which makes it continue importing after these occurrences, but the problem is that the SQL statement that contained the error cannot continue and it goes straight to the next line of SQL. The dumps contain a single SQL command that populates an entire table, so the row before the duplicate will be the last thing imported into that table.
To handle this scenario, you can edit the dump and remove the line in the table definition that forces the column to be unique, then import the data again and manually remove the duplicates, then alter the table to put the unique requirement back. The dumps can be very large and difficult to edit by hand, so I use Perl to make the change instead.
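The author's Perl is not preserved on this page. As a substitute, the sed sketch below handles only the first step, relaxing the unique requirement in the table definitions without opening the dump in an editor. It assumes the mysqldump layout in which a unique index appears on its own line beginning with UNIQUE KEY, and it turns that into an ordinary KEY rather than deleting the line, which avoids leaving a dangling comma in the CREATE TABLE statement. The function name is hypothetical, and it relaxes every unique index in the dump, not just the one causing the error.

```shell
# Usage: relax_unique_keys dump.sql > dump-relaxed.sql
relax_unique_keys() {
    # "  UNIQUE KEY `name` (...)" becomes "  KEY `name` (...)";
    # lines that do not start with UNIQUE KEY are passed through untouched.
    sed 's/^\([[:space:]]*\)UNIQUE KEY /\1KEY /' "$1"
}
```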
Online backup services
- Tarsnap - very cost-effective if you don't have a lot of data
- ADrive - very cost-effective if you do have a lot of data
See also
- Wikipedia backups
- Good discussion on Linux and backing up with rsync
- 10 Linux backup tools
- Sky - EMP-proof and water-proof 2010 backup