Backup
Organic Design backups are created daily, compressed with 7-Zip, and distributed over SCP to various other domains. Eventually backups will become unnecessary because we plan to move all our information into a distributed space which will include both server and workstation data.
Changing to rsync
The amount of data we need to back up each day is getting larger, and currently we're only backing up the web structure and users' emails on a weekly basis due to the amount of traffic required. We are moving more of our users over to our IMAP folders for email and may be taking on some larger sites which will require us to maintain hundreds of gigabytes, rendering our current system unworkable.
rsync is an application for Unix systems which synchronizes files and directories from one location to another while minimizing data transfer using delta encoding when appropriate. An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. rsync can copy or display directory contents and copy files, optionally using compression and recursion.
After changing over to rsync, our backup scripts can be greatly simplified because compression will no longer be required (compressing the files would prevent rsync's delta encoding from being effective). All that will be necessary is for the SQL to be dumped to a common location within the backed-up structure, such as /var/www. We will probably back up the entire /home directory and move sizeable items such as old backups and downloaded application source into other locations such as /src and /backup.
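A minimal Python sketch of how the simplified procedure might look: dump the SQL into the backed-up structure, then mirror the directories with rsync over SSH. The dump file name, the choice of directories and the destination host are illustrative assumptions, not our actual configuration.

```python
import shlex


def dump_command(dump_path="/var/www/all-databases.sql"):
    """Build a mysqldump call that writes every database to one SQL file.

    The dump path is a hypothetical file name inside the backed-up tree.
    """
    return ["mysqldump", "--all-databases", "--result-file=" + dump_path]


def rsync_command(sources, destination):
    """Build an rsync call that mirrors `sources` to `destination` over SSH.

    -a (archive mode) recurses and preserves permissions, times and links.
    The files are sent uncompressed on disk so rsync can transfer deltas.
    """
    return ["rsync", "-a", "-e", "ssh", *sources, destination]


if __name__ == "__main__":
    # Hypothetical peer host and target path, for illustration only.
    target = "backup@peer.example.org:/backup/od/"
    for command in (dump_command(), rsync_command(["/var/www", "/home"], target)):
        # Print the commands instead of running them.
        print(shlex.join(command))
```

Passing each list to `subprocess.run(command, check=True)` would execute the steps in order; the sketch only prints them so it can be tried safely.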
Our new procedure will be more generic, allowing a group of peers to share a file structure which is mapped directly into a filesystem mount-point. The procedure is incomplete, but is being developed and discussed in the "set up a distributed file system" article.
Simple workstation backup
The following backup-workstation.pl Perl script is a simple backup solution for a local workstation which compresses a list of directory trees into a 7-Zip archive.
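As a rough illustration of the approach (not the backup-workstation.pl script itself), the Python sketch below builds a dated archive name and a 7-Zip command for a list of directory trees. It assumes the p7zip binary is installed as `7z`, reuses the switches quoted in the About LZMA section, and uses made-up directory names.

```python
from datetime import date


def archive_name(prefix, day):
    """Return a dated archive name such as 'workstation-2006-11-20.7z'."""
    return f"{prefix}-{day.isoformat()}.7z"


def sevenzip_command(archive, directories):
    """Build a 7z call that adds every directory tree to one LZMA archive."""
    # "a" adds files to an archive; the switches select the 7z container,
    # the LZMA method and the highest compression level.
    return ["7z", "a", "-t7z", "-m0=lzma", "-mx=9", archive, *directories]


if __name__ == "__main__":
    # Hypothetical directory list, for illustration only.
    trees = ["/home/user/documents", "/home/user/projects"]
    print(" ".join(sevenzip_command(archive_name("workstation", date.today()), trees)))
```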
Simple server backup
The following backup.pl Perl script is a simple solution for backing up both a directory tree and all databases in a local MySQL server, saving them to a 7-Zip compressed file named according to the date. It also logs the backup and its file size in the Server log using SimpleForms. It can be executed periodically from the crontab.
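The steps described above can be sketched in Python as follows. This is an illustration under assumptions, not the backup.pl script: the working directory, file prefix and log wording are hypothetical, and the SimpleForms logging is replaced by a plain formatted string.

```python
import posixpath
from datetime import date


def backup_commands(tree, prefix, day, workdir="/tmp"):
    """Return the archive path and the commands that would create it."""
    stamp = day.isoformat()
    dump = posixpath.join(workdir, f"{prefix}-{stamp}.sql")
    archive = posixpath.join(workdir, f"{prefix}-{stamp}.7z")
    commands = [
        # Dump every database on the local MySQL server to one SQL file.
        ["mysqldump", "--all-databases", "--result-file=" + dump],
        # Pack the directory tree and the dump into a single 7z archive.
        ["7z", "a", archive, tree, dump],
    ]
    return archive, commands


def log_entry(archive, size_bytes):
    """Format a log line giving the archive name and its size in MB."""
    size_mb = size_bytes / 1024 ** 2
    return f"{posixpath.basename(archive)} backed up ({size_mb:.1f}MB)"
```

After running the commands, `log_entry(archive, os.path.getsize(archive))` would give the line to record in the log.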
MySQL Replication
MySQL 5.0 supports replication, which allows many slave databases to be kept synchronised with a single master. Even running the slaves locally is of benefit because regular backups and distribution can be made from the slaves, so the master never needs to be locked.
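One common way to take a backup from a slave is to pause its SQL thread, dump the databases, and then resume replication. The Python sketch below only builds the commands for that pattern; the host name and dump path are hypothetical, and this is an assumption about how such a backup could be done rather than a procedure we have in place.

```python
def slave_backup_commands(slave_host, dump_path):
    """Build the commands for dumping a replication slave while it is paused."""
    host = "--host=" + slave_host
    return [
        # Stop applying replicated changes so the data stays still.
        ["mysql", host, "--execute=STOP SLAVE SQL_THREAD"],
        # Dump all databases from the slave; the master is not touched.
        ["mysqldump", host, "--all-databases", "--result-file=" + dump_path],
        # Resume applying changes; the slave then catches up with the master.
        ["mysql", host, "--execute=START SLAVE SQL_THREAD"],
    ]


if __name__ == "__main__":
    # Hypothetical local slave and dump location, for illustration only.
    for command in slave_backup_commands("127.0.0.1", "/backup/slave-dump.sql"):
        print(command)
```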
- 20 November 2006 - regarding changeover to LZMA (.t7z)
- 23 November 2006 - 7z server backups which can be unpacked the same as our template in Debian Conversion
- 20 December 2006 - backup corruption in transfer
Backup list
The following backup files are currently being managed:
- od-wiki-db
- od-server-image
- od-access-log
- peer-logs
- peerd-win32
- nad-org
- nad-kb
- zenia-org
- zenia-docs-settings
- fowin-dunkley
- fowin-luck
About LZMA
LZMA is an extremely good compression method which compresses our backups to about one third of the size of the gzip or bzip equivalents. I have tested it with the free 7-zip file manager from www.7-zip.org: od-wiki-db-2006-11-20 is 268MB uncompressed, 54.9MB gzipped and only 21.7MB as a 7z! However, I'm unable to get the Debian port to work due to dependency issues with low-level C libraries that I don't want to mess with.
- I've found a standalone version at http://sourceforge.net/projects/p7zip which compressed it to 24.8MB, not quite as small as the Windows one, but still very good.
- Using the switches -t7z -m0=lzma -mx=9 has got it down to 21.1MB, slightly smaller than the Windows version :-)
Statistics
7-Zip is extremely good at compressing wiki data compared to other algorithms, perhaps because it compresses the revision history more efficiently. Here is a size comparison which also includes a server image, a standard Linux file structure containing no database or web site content.
Compression | Server image (reduction) | Wiki backup (reduction) |
---|---|---|
none | 517MB | 269MB |
7z | 122MB (76%) | 21.1MB (92%) |
RAR | 140MB (72%) | 24.9MB (90%) |
Bzip2 | 176MB (66%) | 38.0MB (86%) |
Gzip | 197MB (62%) | 54.5MB (80%) |
Zip | 197MB (62%) | 54.5MB (80%) |
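The percentages in the table appear to be the size reduction relative to the uncompressed figure, that is, 100 × (1 − compressed / uncompressed). The short Python sketch below recomputes them from the sizes in the table; a few results may differ from the table by a point depending on whether the original figures were rounded or truncated.

```python
def reduction_percent(original_mb, compressed_mb):
    """Return the percentage by which compression shrank the data."""
    return 100.0 * (1.0 - compressed_mb / original_mb)


# Sizes in MB copied from the table: (server image, wiki backup).
UNCOMPRESSED = (517.0, 269.0)
COMPRESSED = {
    "7z": (122.0, 21.1),
    "RAR": (140.0, 24.9),
    "Bzip2": (176.0, 38.0),
    "Gzip": (197.0, 54.5),
    "Zip": (197.0, 54.5),
}

if __name__ == "__main__":
    for name, sizes in COMPRESSED.items():
        server, wiki = (
            reduction_percent(original, packed)
            for original, packed in zip(UNCOMPRESSED, sizes)
        )
        print(f"{name:6s} server image {server:.1f}%  wiki backup {wiki:.1f}%")
```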