Disaster Recovery Scheme
The biggest error made by computer users is ignoring the importance of backing up their systems. For a home personal system that holds no real user data and is not the main vehicle for accessing the Internet, a backup of the hard drive may not be necessary. But if you have a desktop installed or more than a few ports installed, IT IS NECESSARY to make a backup of your customized system, because reinstalling everything from scratch is much more effort than restoring a backup. Professional users have an even more pressing need for backups: they are generally required to provide 24/7 availability, which means they do not have time to reinstall and configure from scratch.
The second biggest error made by computer users is not testing the restore process before it is required in an emergency caused by hard drive failure, a security compromise of the system, or plain administrative operator error. Having a pre-tested, documented restore process is essential in a professional 24/7 availability environment. Nothing is worse than trying to use your backup files and finding out they are unusable or incomplete. This is why a Disaster Recovery simulation is so necessary.
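As a small first step toward testing the restore process, the sketch below checks that a dump file exists, is not empty, and can be read by restore. It relies on the -t flag of restore, which lists the contents of a dump without extracting anything. The function name check_dump and the RESTORE_CMD override are illustrative assumptions, not part of any FreeBSD tool. A readable table of contents is only a sanity check and does not replace a full trial restore.

```shell
#!/bin/sh
# Sanity-check a dump file before you depend on it.
# RESTORE_CMD is a hypothetical override so the check can be tried on a
# machine that has no restore(8); by default the real restore is used.
RESTORE_CMD=${RESTORE_CMD:-restore}

# check_dump FILE
# Prints a one-line verdict and returns non-zero if the file is unusable.
check_dump() {
    file=$1
    # -s is true only when the file exists and has a size greater than zero
    if [ ! -s "$file" ]; then
        echo "MISSING OR EMPTY: $file"
        return 1
    fi
    # restore -t lists the contents of the dump without writing anything
    if "$RESTORE_CMD" -tf "$file" > /dev/null 2>&1; then
        echo "OK: $file"
    else
        echo "UNREADABLE: $file"
        return 1
    fi
}
```

For example, check_dump /mnt/dump0-root could be run against each dump file right after it is written.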
Two Categories of Backups
System backups cover the file systems that the operating system runs from. The base system runs out of the / file system and uses the /var file system. The /usr file system is where all the ports are installed to and run from, along with the system sources if installed.
User Data Backups
User data backups mean backing up the data created by the users. Normally the directory /usr/home/user is the location of user data files created by desktop applications, such as spreadsheets, text files, or documents. Database data may also live in a directory under /usr. This is OK on systems with only a few users and with databases used for developing applications or containing only small amounts of data. Since dump only works on whole file systems, dumping the /usr file system will back them up.
Backing up user data separately from the system has its logical benefits. In environments containing a large number of active users and production databases it is desirable to allocate separate file systems for /usr/home and /myDataBase, on the same hard drive or on a separate hard drive depending on their size. Backing up database data with the dump command should not be the only mode of database backup. Database software has its own dump and restore functions, and these should be used separately and in addition to the backups generated by the dump command.
The dump command is designed for writing a complete image of non-mounted file systems. Dump cannot select a directory tree or individual files to dump. Non-mounted means 'currently not in use'. When you are running in MUM (multi-user mode) there are services and other background processes (e.g. sendmail or postfix, cron, DHCP, apache, mysql) changing the content of files in the live file system being dumped. Trying to dump the file systems your system is running from may result in dumping files that are changing in flight, causing the dump file to contain "time skew" corruption that makes it unusable for restore. There are two ways to eliminate this problem.
In MUM (multi-user mode), which is the normal way FreeBSD runs, you give the dump command the -L flag. This tells dump that it is going to be reading a live file system and to first take a snapshot of the file system in the .snap directory of that file system. The point of the snapshot is to give a point-in-time consistent view of the file system.
The alternate method is to boot your system in SUM (single-user mode). In SUM your system is isolated from the network, all file systems are unmounted except /, which is mounted read-only, and there is only one active process in your system: the root shell you are using to dump and/or restore the file systems. In this environment you leave the -L flag off the dump command.
To enter SUM: the normal boot process pauses for 10 seconds at the FreeBSD menu. At this pause select option 4 for single-user mode. At the prompt for the shell path, just press Enter. When the system comes up you are in SUM: all file systems are unmounted except /, which is mounted read-only.
Then issue the dump command, writing the dump of each individual file system out to another hard drive cabled to the motherboard, a USB-cabled hard drive, a USB memory stick, or an old-fashioned tape drive.
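The only difference between the two methods on the dump command line is the -L flag. The sketch below makes that explicit by printing the command that would be run in each mode. The helper name dump_cmd and its mode keywords "mum" and "sum" are made up for illustration, and the function only prints the command rather than executing it.

```shell
#!/bin/sh
# dump_cmd MODE FILESYSTEM OUTPUT_FILE
# Prints the level 0 dump command for the given mode.
#   mum = multi-user mode, live file system, snapshot taken with -L
#   sum = single-user mode, file system not in use, no -L
dump_cmd() {
    mode=$1
    fs=$2
    out=$3
    case "$mode" in
        mum) flags="-0Lauf" ;;
        sum) flags="-0auf" ;;
        *)   echo "unknown mode: $mode" >&2
             return 1 ;;
    esac
    echo "dump $flags $out $fs"
}
```

Calling dump_cmd mum /var /mnt/dump0-var prints the -L form, while dump_cmd sum /var /mnt/dump0-var prints the same command without -L.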
Pros and Cons of SUM versus MUM for dump processing
MUM pros: Can target remote hosts as the depository of the backup file directly while the dump is running. No problem scheduling when to run the dump.
MUM cons: The -L flag will cause dump to hesitate while the snapshot is taken. The length of this hesitation depends on what services and/or applications are running on the file system being dumped. These services and/or applications may also pause or stop momentarily.
SUM pros: Slightly shorter elapsed running time because no snapshot is taken.
SUM cons: On busy production systems it may not be possible to schedule reboot time to do the dump. Cannot target remote hosts as the depository of the backup file directly while the dump is running.
There is no technical reason to prefer SUM dumps over MUM -L dumps. In both cases the resulting dump files are equivalent. Writing SUM dump files to a local backup file system and then, once MUM is resumed, transferring them to a remote host depository duplicates the ability MUM -L has to target a remote host directly while the dump is running. Dumping and restoring requires a running (driver) system. It does not matter whether the driver system is booted from a hard drive cabled to the motherboard, from a USB-cabled external hard drive or USB memory stick containing a fully installed system, or from a CD containing the livefs system. The livefs CD is burned from livefs.iso.
How often to dump
You can do full dumps and dumps of just what has changed since the last time a file was dumped. These are called full dumps and change dumps. The documentation refers to them as level 0 for full dump and level 1-9 as the change dumps. The man pages give a complicated scheme for managing full and change dumps. Probably most people really need only a level 0 and a level 1, maybe a level 2.
Basically the point of the change dumps is to make smaller backup files, which take less time and less media. You only make the full dump (level 0) once every week or every month, whatever your needs are. Then in between you only dump the files that have changed since the last full dump. If that change dump file gets too big as well, then you jump to the next level of change dump. So, you do a level 0, then the next day a level 1. If it is small (meaning only relatively few files have changed) then on the third day you still make a level 1. If the level 1 dump is now very big (meaning a lot of files changed) then on day 4 you go to a level 2 dump, and so on. It is probably a good idea to regularize the process of choosing levels. That is why the man page has such a complicated scheme that covers all conditions. But most people with a personal or office/department level server often need only a regular full (level 0) dump, plus a daily level 1 dump in between the full dumps. In fact, many servers are small enough that a daily level 0 dump is all that is needed.
Now, if you have a big system with lots of new files and changed files all the time, then you will have to organize your dumps in a more sophisticated manner. Generally, level 0 dumps take whatever amount of media they need to contain the whole file system. Then, for the change dumps (level 1 to 9) you hope to keep them to only one unit of media. If a change dump goes over one unit of media, then you move up a level the next time. The same goes if the change dump starts to take a lot of extra time.
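The rule of thumb above for choosing the next change dump level can be written down as a small function. This is only a sketch of that rule, not the scheme from the man page. The name next_level is made up, and the size limit stands in for "one unit of media", so the value you pass is your own assumption.

```shell
#!/bin/sh
# next_level LAST_LEVEL LAST_SIZE_MB LIMIT_MB
# Prints the dump level to use for the next change dump.
#   LAST_LEVEL   level of the most recent dump (0 to 9)
#   LAST_SIZE_MB size of that dump in megabytes
#   LIMIT_MB     size of one unit of media in megabytes
next_level() {
    last_level=$1
    last_size_mb=$2
    limit_mb=$3
    if [ "$last_level" -eq 0 ]; then
        # the day after a full dump always starts at level 1
        echo 1
    elif [ "$last_size_mb" -gt "$limit_mb" ] && [ "$last_level" -lt 9 ]; then
        # change dump outgrew one unit of media: move up a level
        echo $((last_level + 1))
    else
        # still fits (or already at level 9): stay at the same level
        echo "$last_level"
    fi
}
```

With a 700 MB limit, next_level 1 100 700 stays at level 1, while next_level 1 900 700 moves up to level 2.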
Where To Write The Dump
Your backup dumps are the single most critical item in your Disaster Recovery Scheme. Your level 1 to 9 change dumps must be associated with their companion level 0 full dump. In a large data environment with massive amounts of changed user data, it might take two 3 GB tapes to hold the full level 0 dump and a single, partly full 3 GB tape for each daily level 1 dump. A naming convention would be used in labeling the tapes to associate the dump0 tapes with their dump1 tapes. When using a remote or local large hard drive for storage of your dumps, a suitable naming convention would also suffice. The problem with using a central depository for all your dump files is that when that device fails, all your backups go with it.
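One possible naming convention for dump files on a hard drive is sketched below. Every name starts with the date of the level 0 dump the file belongs to, so a full dump and its change dumps sort together. The layout, the function names fs_tag and dump_file_name, and the dates used are all illustrative assumptions. Adapt them to whatever convention suits your site.

```shell
#!/bin/sh
# fs_tag MOUNTPOINT
# Turns a mount point into a file-name friendly tag:
#   /         -> root
#   /usr/home -> usr-home
fs_tag() {
    case "$1" in
        /) echo "root" ;;
        *) echo "$1" | sed -e 's|^/||' -e 's|/|-|g' ;;
    esac
}

# dump_file_name SET_DATE MOUNTPOINT LEVEL RUN_DATE
#   SET_DATE is the date of the level 0 dump that starts the set.
#   RUN_DATE is the date this particular dump was taken.
dump_file_name() {
    set_date=$1
    fs=$2
    level=$3
    run_date=$4
    tag=$(fs_tag "$fs")
    if [ "$level" -eq 0 ]; then
        echo "${set_date}-dump0-${tag}"
    else
        # change dumps carry the set date first, then their own date
        echo "${set_date}-dump${level}-${tag}-${run_date}"
    fi
}
```

Under this convention a Sunday full dump of /usr might be named 20240107-dump0-usr and Tuesday's change dump 20240107-dump1-usr-20240109.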
How Many Backup Copies Do You Keep
In a professional Disaster Recovery Scheme environment, separate media would be used to contain the files in a son, father, grandfather rotation. For example, the production system is frozen, meaning no new ports are being installed. The system (i.e. /, /var, /usr) gets a level 0 dump every Sunday night. Each Sunday night's dump is written to a 4 GB USB stick. On the first Sunday this is the son backup. On the second Sunday, the level 0 dump is written to a new USB stick as the son and the previous son becomes the father. On the third Sunday, the level 0 dump is written to a new USB stick as the son, the previous son becomes the father, and the previous father becomes the grandfather. On the fourth Sunday, the previous grandfather stick is overwritten as the new son and each of the others moves over one place in the rotation. The advantage of this rotation is that if, upon restoring, you find the son USB stick has a hardware problem, or what you are trying to recover is not on that version of the backup, you always have the father and grandfather versions as safeguards.
Another example. Using the same frozen system, we have /usr/home and /database as separate file systems to segregate the user data from the system data. The system hosts hundreds of login accounts for college computer programming students. We still do the Sunday night full level 0 dump to the son 32 GB USB stick, and six nightly level 1 dumps to the same son USB stick. We rotate through the 3 USB sticks as for the system dumps. The only difference is that each USB stick contains the dump0 and its companion level 1 change dumps.
Modify this rotation concept to fit your particular needs.
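The three-stick rotation described above boils down to simple arithmetic on the week number. The sketch below assumes the sticks are labeled 1, 2 and 3 and that weeks are counted from 1. The function names stick_for_week and roles_after_week are made up for illustration.

```shell
#!/bin/sh
# stick_for_week WEEK
# Prints which of the three sticks (1, 2 or 3) receives this week's dump.
stick_for_week() {
    echo $(( ($1 - 1) % 3 + 1 ))
}

# roles_after_week WEEK
# Prints which stick is son, father and grandfather once this week's
# dump has been written. Roles not yet filled are shown as "none".
roles_after_week() {
    week=$1
    son=$(stick_for_week "$week")
    father=none
    grandfather=none
    if [ "$week" -ge 2 ]; then
        father=$(stick_for_week $((week - 1)))
    fi
    if [ "$week" -ge 3 ]; then
        grandfather=$(stick_for_week $((week - 2)))
    fi
    echo "son=$son father=$father grandfather=$grandfather"
}
```

On the fourth Sunday, roles_after_week 4 reports stick 1 (the old grandfather) as the new son, with stick 3 as father and stick 2 as grandfather.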
Dump To Remote Host
Dump doesn't cross file systems. In a typical FreeBSD install, /, /var, and /usr are separate file systems. A dump of / won't get them all at once. You have to issue the dump command for each individual file system, i.e. /, /tmp, /var, and /usr. If you have a configuration where everything goes into one big /, dumping it will get all data from that partition. Slices and the MBR are outside dump's scope.
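Because each file system needs its own dump command, it can help to list them from fstab rather than from memory. The sketch below prints the mount point of every ufs entry in an fstab-format file. The function name list_ufs_mounts is made up, and the sketch assumes the standard column layout of device, mount point, and file system type.

```shell
#!/bin/sh
# list_ufs_mounts [FSTAB_FILE]
# Prints the mount point of every ufs file system listed in the given
# fstab-format file (default /etc/fstab), one per line.
# Comment lines, swap, and non-ufs entries such as cd9660 are skipped.
list_ufs_mounts() {
    awk '$1 !~ /^#/ && $3 == "ufs" { print $2 }' "${1:-/etc/fstab}"
}
```

The output can drive a loop such as: for fs in $(list_ufs_mounts); do echo "would dump $fs"; done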
Here are sample commands to dump from a host running in MUM (multi-user mode) and store the dump file on a remote host. To make the identification of the dumps easier, the first sample commands make copies of the partition label, the last boot messages, the MBR (Master Boot Record), and the fstab file. Port 777 on localhost is a pipe to the remote host with an 800 GB hard drive.
bsdlabel ad4s1 | ssh -p 777 tom@remotehost dd of=/bkup/bsdlabel_ad4s1
dmesg -a | ssh -p 777 tom@remotehost dd of=/bkup/dmesg
dd if=/dev/ad4 count=1 | ssh -p 777 tom@remotehost dd of=/bkup/MBR
cat /etc/fstab | ssh -p 777 tom@remotehost dd of=/bkup/fstab
dump -0Lan -f - / | gzip | ssh -p 777 tom@remotehost dd of=/bkup/dump0-root.gz
dump -0Lan -f - /tmp | gzip | ssh -p 777 tom@remotehost dd of=/bkup/dump0-tmp.gz
dump -0Lan -f - /var | gzip | ssh -p 777 tom@remotehost dd of=/bkup/dump0-var.gz
dump -0Lan -f - /usr | gzip | ssh -p 777 tom@remotehost dd of=/bkup/dump0-usr.gz
The following script (fbsd2dump) will use a 1 GB USB stick as the depository for the dump files. A standard FreeBSD install with postfix, mysql, apache, and php consumed only 600 MB in dump format and took only 13 minutes to dump.
#!/bin/sh
# This script will use the dump command to back up your running
# system to a USB flash stick.
# This is run as root.
#
# Change these device unit prefixes in the code below as needed
#   ad0 is the live file system
#   da0 is the target
#
# Comment or uncomment the 4 dump statements depending on
# whether you want to compress the saved dump file.
#
# Be sure to unplug your USB stick and re-plug it in to
# mount it again.

echo " "
echo " "
echo "Starting time for this live file system dump is"
date

echo " "
echo " "
echo "Prepare the target"
dd if=/dev/zero of=/dev/da0 count=4
fdisk -BI /dev/da0
bsdlabel -B -w da0s1
newfs -U /dev/da0s1a
# Stop here if the stick did not mount, otherwise the dump files
# would be written into /mnt on the live root file system.
mount /dev/da0s1a /mnt || exit 1
cd /mnt || exit 1

echo " "
echo " "
echo "Post the dump date"
date > date.of.dump
cat date.of.dump

echo " "
echo " "
echo "Post the MBR"
dd if=/dev/ad0 count=1 > MBR
#dd if=/dev/ad0 count=1 | od -c

echo " "
echo " "
echo "Collect live Slice Partition sizes and save"
bsdlabel ad0s1 > liveSPsizes
cat liveSPsizes

echo " "
echo " "
echo "Collect live file system sizes and save"
df -h > liveFSsizes
cat liveFSsizes

echo " "
echo " "
echo "Post the fstab"
# cp takes a source and a destination; no output redirection is needed
cp /etc/fstab fstab
cat /etc/fstab

echo " "
echo " "
echo "Post the script used to create this dump"
cp /root/bin/fbsd2backup fbsd2backup

echo " "
echo " "
echo "Post the script used to restore this dump"
cp /root/bin/fbsd2restore fbsd2restore

# For the compressed variants, dump writes to standard output (-f -)
# and gzip writes the compressed file.
echo " "
echo " "
echo "Dump file system 'a' / "
dump -0Lauf dump0-root /dev/ad0s1a
#dump -0Lauf - /dev/ad0s1a | gzip > dump0-root.gz

echo " "
echo " "
echo "Dump file system 'd' /var "
dump -0Lauf dump0-var /dev/ad0s1d
#dump -0Lauf - /dev/ad0s1d | gzip > dump0-var.gz

echo " "
echo " "
echo "Dump file system 'e' /tmp "
dump -0Lauf dump0-tmp /dev/ad0s1e
#dump -0Lauf - /dev/ad0s1e | gzip > dump0-tmp.gz

echo " "
echo " "
echo "Dump file system 'f' /usr "
dump -0Lauf dump0-usr /dev/ad0s1f
#dump -0Lauf - /dev/ad0s1f | gzip > dump0-usr.gz

sync
cd /root
umount /mnt

echo " "
echo " "
echo "Ending time for this live file system dump is"
date
echo " "
echo " "
echo "Script completed"
echo " "
echo " "
echo "Unplug your USB stick NOW"
Disaster Recovery Restore Script
The following script (fbsd2restore) is the companion restore script using the dump files on the USB stick created by the above fbsd2dump script.
#!/bin/sh
# This script will use the restore command to restore your running
# system from a USB flash stick containing the dumped file systems.
# This is run as root.
#
# Change these device unit prefixes in the code below as needed
#   ad0 is the target system
#   da0 is the USB flash stick
#
# This script assumes that the target of the restore is the same
# size or larger than the hard drive the saved dump files were
# created from and has only a single slice allocating all the space.
#
# Secondly, that the currently running FreeBSD system was booted
# from a fixit CD or a USB-cabled external hard drive or USB stick.
# IE: The target hard drive is not in use.
#
# Instructions to use when booting from fixit CD.
# Use df -h to show you the mounted file systems.
# Boot from fixit CD. Hit enter to accept English documentation.
# From the sysinstall main menu select fixit and then option 1 CD.
# You will get the fixit command line prompt. Enter this:
#   mount /dev/da0s1a /mnt        # mount the USB stick holding the dump files
#   cd /mnt                       # change into mount point
#   cp fbsd2restore /bin/         # copy the restore script to location in path
#   chmod 760 /bin/fbsd2restore   # give script exec permissions
#   fbsd2restore                  # exec the script
# When the script finishes in about 13 minutes enter exit to return to sysinstall.
# Exit sysinstall and the system reboots. Remove your fixit CD from the drive.
#

echo " "
echo " "
echo "Starting time for this file system restore is"
date

# make target mount point
cd /
mkdir -v target

echo " "
echo " "
echo "Prepare the target"
dd if=/mnt/MBR of=/dev/ad0 count=1
bsdlabel -B -w ad0s1
# You may want to edit the liveSPsizes file to change the
# file system allocation sizes.
# ee /mnt/liveSPsizes
bsdlabel -R ad0s1 /mnt/liveSPsizes
newfs -U /dev/ad0s1a
newfs -U /dev/ad0s1d
newfs -U /dev/ad0s1e
newfs -U /dev/ad0s1f

# The restore flag -N means
#   Do the extraction normally, but do not actually write any changes
#   to disk. This can be used to check the integrity of dump media
#   or for Disaster Recovery simulation purposes.
#
# Note the -N lines below are commented out, so the active restore
# commands write to disk. For a simulation, comment out the live
# restore line and uncomment the -N line instead.
# The gzip lines are for dump files that were saved compressed.

echo " "
echo " "
echo "Restore file system 'a' / "
mount -v /dev/ad0s1a /target
cd /target
restore -rvf /mnt/dump0-root
#restore -rNf /mnt/dump0-root
#gzip -dc /mnt/dump0-root.gz | restore -rf -

echo " "
echo " "
echo "Restore file system 'd' /var "
mount -v /dev/ad0s1d /target/var
cd /target/var
restore -rvf /mnt/dump0-var
#restore -rNf /mnt/dump0-var
#gzip -dc /mnt/dump0-var.gz | restore -rf -

echo " "
echo " "
echo "Restore file system 'e' /tmp "
mount -v /dev/ad0s1e /target/tmp
cd /target/tmp
restore -rvf /mnt/dump0-tmp
#restore -rNf /mnt/dump0-tmp
#gzip -dc /mnt/dump0-tmp.gz | restore -rf -

echo " "
echo " "
echo "Restore file system 'f' /usr "
mount -v /dev/ad0s1f /target/usr
cd /target/usr
restore -rvf /mnt/dump0-usr
#restore -rNf /mnt/dump0-usr
#gzip -dc /mnt/dump0-usr.gz | restore -rf -

sync
cd /

# Note that restore leaves a file restoresymtable in the root
# directory to pass information between incremental restore passes.
# This file should be removed when the last incremental has been restored.
rm -v /target/usr/restoresymtable
rm -v /target/tmp/restoresymtable
rm -v /target/var/restoresymtable
rm -v /target/restoresymtable

umount -v /target/usr
umount -v /target/tmp
umount -v /target/var
umount -v /target
rmdir -v target

echo " "
echo " "
echo "Ending time for this file system restore is"
date
echo " "
echo " "
echo "Script completed"