Backup

From Organic Design wiki
Revision as of 12:24, 5 February 2015 by Nad (talk | contribs) (ADrive - Excellent Cloud Storage Service)

The Organic Design backups are created daily, compressed with 7-Zip and distributed over Secure Copy Protocol (SCP) to various other domains.

ADrive - Excellent Cloud Storage Service

There are a lot of very good backup services around now, many offering 50GB or more of free storage space, but one thing I haven't been able to find for free is a service with a good amount of space that allows files to be transferred using standard Linux shell commands such as rsync and scp. Last month I found ADrive, which offers an excellent 100GB service for only $2.50 per month with many useful features such as public files, shared online editing and more. Most important from my perspective is that they allow transfer via rsync and scp with zero configuration and an excellent data transfer rate. This means I can have my servers automatically transfer my encrypted backup files to the ADrive cloud storage on a schedule. Sign up with my referrer link and then check out our rsync article for details about how to automatically sync encrypted backups to your ADrive account.

Setting up automated backups over SCP

The backups are done by running the backup-host.pl script as root. The script expects each host in the backup network to have a dedicated user called scp and a directory called /backup, which must be writable by the scp user so that backups transferred from remote servers can be put there. We can create the directory and give it the minimum level of access as follows:

<bash>mkdir /backup
chown root:scp /backup
chmod 730 /backup</bash>

This allows the scp user to create new backup files in the directory and write to them, but it has no read permission on the directory, so it cannot list the backups stored there.
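To see what mode 730 looks like in practice, here is a small sketch that applies it to a throwaway directory rather than the real /backup, so it needs no root access. It assumes GNU stat from coreutils; the function name show_backup_mode is made up for this example.

```shell
# Create a throwaway directory with mode 730 and show its permissions.
# Assumes GNU stat (coreutils): %a is the octal mode, %A the symbolic form.
show_backup_mode() {
    dir=$(mktemp -d) || return 1
    chmod 730 "$dir"
    stat -c '%a %A' "$dir"
    rmdir "$dir"
}

show_backup_mode
```

The symbolic form shows rwx for the owner, -wx for the group (create and enter, but not list) and nothing for everyone else.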

Each scp user will have the same RSA key pair and authorized_keys file in its .ssh directory. The latter contains the following line, which allows the scp user to do nothing but accept files into the /backup directory (sending is done by the root user, so only receiving needs to be allowed for the scp user).

command="scp -t /backup",no-port-forwarding,no-pty,no-agent-forwarding,no-X11-forwarding ssh-rsa AAAAB3Nza...jdIKh4jjd scp@od


When initially setting up a new host to be part of the backup network a manual transfer must first be done to each of the remote hosts to ensure their fingerprints are added to the known_hosts file. The target directory is just "/" because the root directory for scp users is forced to /backup in their authorized_keys file. This must be done from a root shell because the scp users don't have permission to do anything except receive backup files.

<bash>echo "TestFile" > foo
scp -i /home/scp/.ssh/id_rsa foo scp@host1.com:/
scp -i /home/scp/.ssh/id_rsa foo scp@host2.com:/
scp -i /home/scp/.ssh/id_rsa foo scp@host3.com:/</bash>
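With more than a few hosts, the same first transfer could be looped. The following is only a sketch: the function name first_transfer and the DRY_RUN switch are assumptions for illustration, and by default it just prints the commands instead of running them.

```shell
# Print (or run) the one-off test transfer for each remote host given as an argument.
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0 to run scp.
first_transfer() {
    key=/home/scp/.ssh/id_rsa
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scp -i $key foo scp@$host:/"
        else
            scp -i "$key" foo "scp@$host:/"
        fi
    done
}

first_transfer host1.com host2.com host3.com
```

When run for real, each scp call will still prompt to accept the remote host's fingerprint, which is the point of the exercise.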

The configuration file

The configuration for the backups is done via a file called backup-host.conf in the same directory as the script. The following parameters are recognised:

$admin

The email address that important information such as low space or failed transfers should be sent to. The default is admin@organicdesign.co.nz.

$host

The name that should be included at the start of the backup files to identify the machine that they were backed up on. The default is the machine's hostname, but it's useful to be able to override this since many servers are given meaningless hostnames, or have the same hostname as other servers in the network.

$pass

The MySQL root password, used for backing up the MySQL databases and/or for encrypting the backups of configuration files.

$disk

The device name, such as /dev/sda1, to monitor for free space. If set, a low-space notification is sent to the admin email address so that manual file pruning can be done if necessary.

$free

The number of gigabytes of free space below which a low-space notification email will be sent to the admin email address.

@conf

A list of files that should be encrypted (with the MySQL root password) as they may contain sensitive information such as passwords or private keys.

@files

A list of file/directory locations to include in weekly file backups.

@excl

A list of file/directory locations that should be excluded from the file backups. This should include locations which are very large and don't change, since they're better backed up manually, and also any files that may contain sensitive information. Note that there's no need to include any of the locations listed in @conf as these will automatically be excluded.

@scp

A list of servers to send backups to over SCP using the scp user.
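The $disk and $free parameters describe a simple threshold check. As a rough shell sketch of that idea (not the code in backup-host.pl, which is Perl), the following reads the free space with df and compares it with a threshold. The function names and the example values for disk and free are assumptions.

```shell
# Free space in whole gigabytes on the filesystem holding the given path or device.
# df -Pk prints one line per filesystem with sizes in 1024-byte blocks; column 4 is "Available".
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Succeeds (exit status 0) when the free space ($1) is below the threshold ($2).
is_low() {
    [ "$1" -lt "$2" ]
}

disk=/      # hypothetical; the example in the text is /dev/sda1
free=10     # hypothetical threshold in gigabytes
if is_low "$(free_gb "$disk")" "$free"; then
    echo "Low space on $disk: less than ${free}GB free"
fi
```

In the real script the message would go to the $admin address rather than standard output.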

File pruning

The backup files will take up a huge amount of space as time goes on, because each year there'll be 365 database backups and 52 file-system and configuration backups, and there could be a number of hosts sending their backups to each server too. To prevent the space being consumed quickly, the backup files are "pruned" so that there are fewer and fewer of them as they get older.

The number of files is halved once they are more than a month old, quartered once they are more than two months old, and so on. After a year only one file per month is kept, and after two years only one file per year.
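One way to picture the pruning rule is as a spacing, in days, between the backups that are kept at a given age. The sketch below is only one reading of the description above and is not taken from the script: the thresholds, the decision to stop the halving at a quarter until the one-year mark, and the function names are all assumptions.

```shell
# Spacing in days between kept backups for a backup that is $1 days old.
# These thresholds are an assumed interpretation of the pruning rule, not the script's own.
keep_interval() {
    age=$1
    if   [ "$age" -ge 730 ]; then echo 365   # roughly one per year
    elif [ "$age" -ge 365 ]; then echo 30    # roughly one per month
    elif [ "$age" -ge 60 ];  then echo 4     # a quarter as many
    elif [ "$age" -ge 30 ];  then echo 2     # half as many
    else echo 1                              # keep every daily backup
    fi
}

# Keep a backup when its day number (days since the epoch) is a multiple of the spacing.
# Usage: should_keep DAY_NUMBER AGE_IN_DAYS
should_keep() {
    [ $(( $1 % $(keep_interval "$2") )) -eq 0 ]
}
```

For example, `should_keep 16436 45` succeeds because a 45-day-old backup is kept every second day and 16436 is even.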

Another way of reducing the space is by not including any standard code-bases etc in the backups - anything that will be rebuilt by going through the documentation for the server has no reason to be backed up. Only parts of the system which are not covered by documentation such as the mathaba.net main site need to have a full file-system backup, but even this may only need to be a one-off backup with just minimal areas of the file-system being regularly backed up - such as locations to which files are uploaded etc.

Wiki backups

We use a general backup script, kept in our Subversion tools repository, to back up wikis on the various servers we administer. The script dumps the databases, compresses them with 7-Zip, then sends them over SCP to a remote server. It takes two parameters: the first is the filesystem path to the wiki, and the second is the SCP target location. The script obtains the database connection details directly from the wiki's LocalSettings.php file. Here's a snippet of Perl code from the script showing the exporting, compression and remote connection:

{{{1}}}

Note: we're using the latin1 character set because of issues with MySQL character encoding; see "manually backup a wiki" for more detail.
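As a rough illustration of the first step, reading the connection details from LocalSettings.php, here is a shell sketch (the script itself is Perl, and this is not its code). It assumes the settings are written in the plain form `$wgDBname = "value";`, and the function name wiki_setting is made up for this example.

```shell
# Print the value of a simple quoted setting (e.g. wgDBname) from LocalSettings.php.
# Only the plain form is handled:  $wgDBname = "value";  (single or double quotes).
# Usage: wiki_setting FILE NAME
wiki_setting() {
    sed -n "s/^[[:space:]]*\\\$$2[[:space:]]*=[[:space:]]*[\"']\(.*\)[\"'];.*/\1/p" "$1" | head -n 1
}

# Hypothetical usage (the path and archive name are placeholders; nothing is run here):
#   db=$(wiki_setting /var/www/wiki/LocalSettings.php wgDBname)
#   mysqldump --default-character-set=latin1 "$db" | 7za a -si "$db.sql.7z"
```

The commented usage leaves out the MySQL credentials, which would come from wgDBuser and wgDBpassword in the same way.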

Security

The backups are stored locally in the /backup directory. If a user needs to read these manually with SCP, then the mode of this directory should be 770 and the user should be added to the root group. The directory and all the backups are owned by root and in the root group.

<bash>addgroup fred root

chmod 770 /backup</bash>


This script connects to the remote server using a user called "scp", which needs to be set up on the remote server and added to the AllowUsers directive in /etc/ssh/sshd_config if, like us, you use that directive to restrict shell access to specified users.

Also, the script can't (and shouldn't) enter passwords, so an RSA key-pair needs to be created with ssh-keygen -t rsa (it's good practice to disable password logins completely in /etc/ssh/sshd_config). The private part goes into /root/.ssh/ on the local server being backed up (or the .ssh directory of whatever user will be running the script; we run it from the crontab as root). The public part of the key-pair goes into /home/scp/.ssh/authorized_keys on the remote server.

One more security precaution is to lock the scp user down so that it can't be used for anything except transferring files from the backup server into the /home/scp directory. To do this, we restrict the command that the user can execute and turn off all the other SSH services that are usually available to applications using the SSH protocol, by prepending the following to the RSA public key in /home/scp/.ssh/authorized_keys.

command="scp -t /home/scp",no-port-forwarding,no-pty,no-agent-forwarding,no-X11-forwarding
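Prepending the restriction by hand is error-prone because the options and the key must stay on one line. A minimal sketch of a helper that builds the line is shown here; the function name restrict_key is an assumption, and the key in the usage comment is a placeholder.

```shell
# Build a restricted authorized_keys line from a target directory and a public key line.
# Usage: restrict_key DIRECTORY "PUBLIC KEY LINE"
restrict_key() {
    printf 'command="scp -t %s",no-port-forwarding,no-pty,no-agent-forwarding,no-X11-forwarding %s\n' "$1" "$2"
}

# Hypothetical usage on the remote server (the key file path is a placeholder):
#   restrict_key /home/scp "$(cat id_rsa.pub)" >> /home/scp/.ssh/authorized_keys
```

The same helper produces the /backup variant used earlier on this page by passing /backup as the first argument.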

About LZMA compression

LZMA is an extremely good compression method which compresses our backups to a much smaller size than gzip or bzip2 (roughly 40% of the gzipped size in the test below). I have tested it with the free 7-Zip file manager from www.7-zip.org: od-wiki-db-2006-11-20 is 268MB uncompressed, 54.9MB gzipped and only 21.7MB as a 7z! But I'm unable to get the Debian port to work due to dependency issues with low-level C libraries that I don't want to mess with.

  • I've found a standalone version at http://sourceforge.net/projects/p7zip and that compressed it to 24.8MB, not quite as small as the Windows one, but still very good.
  • Using the switches -t7z -m0=lzma -mx=9 has got it down to 21.1MB, slightly smaller than the Windows version :-)

Statistics

7-Zip is extremely good at compressing wiki data compared to other algorithms, perhaps because it compresses the page history more efficiently. Here's a size comparison for a wiki backup and for a server image, which is a standard Linux file structure containing no database or web site content. The percentages show the reduction from the uncompressed size.

Compression    Server image    Wiki backup
none           517MB           269MB
7z             122MB (76%)     21.1MB (92%)
RAR            140MB (72%)     24.9MB (90%)
Bzip2          176MB (66%)     38.0MB (86%)
Gzip           197MB (62%)     54.5MB (80%)
Zip            197MB (62%)     54.5MB (80%)
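The percentages can be recomputed from the sizes as one minus the compressed size divided by the uncompressed size. A small sketch using awk for the arithmetic follows; the function name reduction is an assumption.

```shell
# Percentage size reduction, rounded to the nearest whole percent.
# Usage: reduction ORIGINAL_MB COMPRESSED_MB
reduction() {
    awk -v orig="$1" -v comp="$2" 'BEGIN { printf "%.0f\n", (1 - comp / orig) * 100 }'
}

reduction 517 122     # server image as 7z, prints 76
reduction 269 21.1    # wiki backup as 7z, prints 92
```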

Restoring a single database from a dump of all databases

Restoring a database from a dump is pretty basic, but one issue that can crop up is when the dump was for all databases (as our daily backups are) but you want to restore just one of the databases in the dump. The following shell command does this.

{{{1}}}
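One possible approach, given here as a sketch rather than as the command this page originally showed, relies on the `-- Current Database: ...` comment lines that mysqldump writes before each database in an all-databases dump. The function name extract_db and the file names in the usage comment are assumptions.

```shell
# Print only the part of an all-databases dump that belongs to one database.
# It keeps lines from the "-- Current Database: `NAME`" marker up to the next such marker.
# Usage: extract_db NAME DUMPFILE
extract_db() {
    awk -v db="$1" '
        /^-- Current Database: `/ { keep = ($0 == ("-- Current Database: `" db "`")) }
        keep { print }
    ' "$2"
}

# Hypothetical usage (file and database names are placeholders):
#   extract_db wikidb all-databases.sql > wikidb.sql
```

The global statements at the very top of the dump are not copied, so the extracted file is best checked before importing. The mysql client's --one-database option is another way to import a single database from a full dump.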

Duplicate a database

<bash>mysqldump -h [server] -u [user] -p[password] db1 | mysql -h [server] -u [user] -p[password] db2</bash>
  • Note: There is NO space between -p and [password]

Troubleshooting

See also SQL.

Out of resources with mysqldump

The server can run out of resources when doing a backup of all the databases. If this happens, try reducing table_open_cache in my.cnf to 200 or so.

Duplicate entries on import

Sometimes dumps have problems where a row with a key that should be unique has one or more duplicate entries. MySQL can be called with the -f option, which makes it continue importing after these occurrences, but the problem is that the SQL statement that contained the error is abandoned and the import goes straight to the next line of SQL. The dumps contain a single SQL command that populates an entire table, so the row before the duplicate will be the last thing imported into that table.

To handle this scenario, you can edit the dump and remove the line in the table definition that forces the column to be unique, then import the data again, manually remove the duplicates, and alter the table to put the unique requirement back. The dumps can be very large and difficult to edit, so I use Perl to remove the unique-key line with something like the following:

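<perl># Print the dump to STDOUT without the line that defines the unique key
open my $db, '<', 'dump.sql' or die "Cannot open dump.sql: $!";
while ( <$db> ) {
	print unless /UNIQUE KEY `name_title` \(`page_namespace`,`page_title`\)/;
}
close $db;</perl>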

Online backup services

  • Tarsnap - very cost-effective if you don't have a lot of data
  • ADrive - very cost-effective if you do have a lot of data
  • Dropbox - very cost-effective for space, but non-trivial connectivity
