Backup

From Organic Design wiki
The Organic Design backups are created daily, compressed with [[w:7zip|7zip]], and distributed over [[w:Secure copy|Secure Copy Protocol]] (SCP) to various other domains.
  
== Setting up automated backups over SCP ==
We use a general backup script to back up the wikis on the various servers we administer, which is in our Subversion tools repository [http://svn.organicdesign.co.nz/filedetails.php?repname=tools&path=%2Fbackup-wiki.pl here]. The script dumps the databases and compresses the dump to ''7zip'', then sends it over SCP to another domain. The dump-and-compress step can be done in Perl as follows:
{{code|<perl># Dump all databases, compress the dump with 7zip, then send it off-site over SCP
$s7z = "all-$date.sql.7z";
$sql = "$dir/all.sql";
qx( mysqldump -u DB-USER --password='DB-PASS' --default-character-set=latin1 -A >$sql );
qx( 7za a $dir/$s7z $sql );
qx( chmod 644 $dir/$s7z );
qx( scp $dir/$s7z scp@REMOTE-SERVER-ADDR: );
qx( rm $dir/$s7z );</perl>}}
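We run this from the ''crontab'' as ''root''; an entry in '''/etc/crontab''' along the following lines (the time and script path are illustrative, not our actual values) runs it daily:

{{code|<pre># /etc/crontab - run the wiki backup daily at 4:30am (time and path illustrative)
30 4 * * * root /var/www/tools/backup-wiki.pl</pre>}}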
  
This script connects to the remote server as a user called "scp", which needs to be set up on the remote server and added to the '''AllowUsers''' directive in '''/etc/ssh/sshd_config''' if, like us, you use that directive to restrict shell access to specified users.
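The relevant ''sshd_config'' line would then look something like the following (the other user name is illustrative):

{{code|<pre># /etc/ssh/sshd_config - only the listed users may log in ("fred" is illustrative)
AllowUsers fred scp</pre>}}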
  
Also, the script can't enter passwords, so an RSA key-pair needs to be created with '''ssh-keygen -t rsa'''. The private key goes into '''/root/.ssh/''' (or the ''.ssh'' directory of whatever user will be running the script - we run it from the ''crontab'' as ''root''). The public key goes into '''/home/scp/authorized_keys''' on the remote server.
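One way to create and install the key is sketched below; it assumes you generate the key as ''root'' on the backup source, and that you can still reach the remote server as a privileged user to append the public half:

{{code|<pre># generate a passphrase-less RSA key-pair for unattended use
ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
# append the public half to the scp user's authorized_keys on the remote server
cat /root/.ssh/id_rsa.pub | ssh root@REMOTE-SERVER-ADDR 'cat >> /home/scp/authorized_keys'</pre>}}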
  
One more security precaution is to lock the ''scp'' user down so that it can't be used for anything except receiving the backup files into the '''/home/scp''' directory. To do this, we restrict the command that the user can execute, and turn off all the other SSH services that are usually available to applications using the SSH protocol, by prepending the following before the RSA public key in '''/home/scp/authorized_keys'''.
{{code|<pre>command="scp -t /home/scp",no-port-forwarding,no-pty,no-agent-forwarding,no-X11-forwarding</pre>}}
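The resulting ''authorized_keys'' entry is a single line, options followed by the key; with the key material abbreviated it looks something like:

{{code|<pre>command="scp -t /home/scp",no-port-forwarding,no-pty,no-agent-forwarding,no-X11-forwarding ssh-rsa AAAA... root@backup-host</pre>}}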
== About LZMA ==
LZMA is an extremely good compression method which compresses our backups to about one third of the size of gzip or bzip2. I have tested it with the free 7-zip file manager from www.7-zip.org: od-wiki-db-2006-11-20 is 268MB uncompressed, 54.9MB gzipped and only 21.7MB as a 7z! But I'm unable to get the Debian port to work due to dependency issues with low-level C libraries that I don't want to mess with.

*I've found a standalone version at http://sourceforge.net/projects/p7zip and that's compressed it to 24.8MB - not quite as small as the Windows one, but still very good.
*Using the switches '''-t7z -m0=lzma -mx=9''' has got it down to 21.1MB - slightly smaller than the Windows version :-)

== Statistics ==

7zip is extremely good at compressing wiki data compared to other algorithms, perhaps due to compressing the history more efficiently. Here's a size comparison for compressing a server image (a standard Linux file structure containing no database or web-site content) and a wiki backup:

<table>
<tr><th>Compression</th><th>server image</th><th>wiki backup</th></tr>
<tr><td>none</td><td>517MB</td><td>269MB</td></tr>
<tr><td>7z</td><td>122MB (76%)</td><td>21.1MB (92%)</td></tr>
<tr><td>RAR</td><td>140MB (72%)</td><td>24.9MB (90%)</td></tr>
<tr><td>Bzip2</td><td>176MB (66%)</td><td>38.0MB (86%)</td></tr>
<tr><td>Gzip</td><td>197MB (62%)</td><td>54.5MB (80%)</td></tr>
<tr><td>Zip</td><td>197MB (62%)</td><td>54.5MB (80%)</td></tr>
</table>
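Those switches are passed to ''7za'' when creating an archive; for example (the file names here are illustrative):

{{code|<pre>7za a -t7z -m0=lzma -mx=9 backup.7z od-wiki-db.sql</pre>}}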
  
=== Simple snapshot backup ===

If, when a file is added to a compressed archive, the size only increased according to the differences between the added item and the patterns within the existing content, then storing regular snapshots of entire folder structures in a compressed archive would be very efficient in terms of space. To this end I've made some test archives from two files '''1.sql''' and '''2.sql''' which contain a great deal of common content, both within themselves and between each other. Here's a table of the sizes of the files in MB under various circumstances:
{|
! Type !! 1 !! 2 !! 1+2
|-
| bz2 || 887 || 1100 ||
|-
| gz || 126 || 163 ||
|-
| 7z || 34 || 45 || 79
|}
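The limitation can be demonstrated with a small ''gzip'' experiment (a sketch only: gzip's 32KB window is far smaller than 7zip's LZMA dictionary, which is what lets 7z find redundancy between large files):

```shell
# Two identical 100 KB random files stand in for 1.sql and 2.sql:
# random data is incompressible, so any saving on the second file
# could only come from the compressor "seeing" back into the first.
head -c 100000 /dev/urandom > 1.sql
cp 1.sql 2.sql

# gzip's 32 KB window cannot reach from 2.sql back into 1.sql,
# so compressing both together costs roughly the full size twice.
gzip -c 1.sql > one.gz
cat 1.sql 2.sql | gzip -c > both.gz

echo "one: $(wc -c < one.gz) bytes, both: $(wc -c < both.gz) bytes"
```

Here ''both.gz'' comes out at roughly twice the size of ''one.gz'', showing that gzip gains nothing from the duplication, whereas a compressor with a large enough dictionary could store the second file almost for free.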
  
 
== See also ==
*[http://download.wikimedia.org/enwiki Wikipedia backups]
*[http://www.perihel.at/3/index.html Good discussion on Linux and backing up with rsync]
*[http://blogs.techrepublic.com.com/10things/?p=895 10 Linux backup tools]
*[[Sky]] ''- EMP-proof and water-proof 2010 backup''
[[Category:IT Support]]

Revision as of 15:53, 29 February 2012