So the other day I suffered what was probably my third data loss in a year, caused by the combination of the XFS filesystem on my desktop and a power loss.
Update: I just realized I failed to mention that the data loss is my fault, not XFS’s. Had I been more diligent in my research when I first set it up, I would have known about that particular shortcoming (shared with ext4 as configured in kernel 2.6.29 and earlier): metadata can reach the disk before the file data does, so a power loss at the wrong moment can leave zero-length files. The files I lost were always ones being written to at the time, never just “random” files. Anyways…
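On the application side, the usual defense against that metadata-before-data window is to write the new contents to a temporary file, fsync it, and only then rename it over the original. The sketch below is a minimal illustration, not anything XFS- or ext4-specific. It assumes a POSIX system where a rename within one filesystem is atomic, and `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` so that after a crash it holds either
    the old or the new contents, never a zero-length file.
    Assumes POSIX rename() semantics on a single filesystem."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Force the data blocks to disk BEFORE the metadata change (the rename).
            os.fsync(f.fileno())
        # Atomically swap the new file in place of the old one.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also persist the directory entry that the rename changed.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `atomic_write("settings.conf", b"...")` costs two fsyncs per write, which is the price of never exposing a half-written or empty file after a power cut.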
I thought I had already set things up fairly well: I was using RAID 1 (i.e. two mirrored hard drives holding one drive’s worth of data), and the computer had UPS protection anyway.
The failures in my setup were notable though:
- My backup strategy depended on me having enough time to actually hook up the external drive, run the backup, unhook it, etc. Scripting it to make each run quicker would still have taken too much time to set up in the first place, so my backups were fairly out of date. Compounding this, when I finally did hook up the external drive and try to perform a backup, I found to my displeasure that it was NTFS-formatted, and therefore I couldn’t write to it. (I had the read/write ntfs-3g driver available but had forgotten about it by then, and didn’t have time to troubleshoot further.)
- A UPS only helps if your 2-year-old son doesn’t hard-reset the power on you.
- A UPS plus a disabled front power switch only saves you until your next kernel panic induced by bleeding-edge video drivers (the fact that they were open source didn’t save me here ;)
- The RAID worked: both hard drives were in the same inconsistent state, with the same files lost after the fsck… :(
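The backup script from the first point does not have to be elaborate. The following is a minimal sketch, assuming Python 3.8+ and that the external drive is already mounted at some directory; `run_backup` is a hypothetical name, and the write probe is there to fail immediately on a destination that is mounted read-only (as an NTFS partition is without ntfs-3g) rather than partway through the copy.

```python
import os
import shutil
import tempfile


def run_backup(source: str, dest_root: str) -> str:
    """Hypothetical helper: copy `source` into `dest_root/<basename of source>`."""
    if not os.path.isdir(dest_root):
        raise FileNotFoundError(f"backup destination {dest_root!r} is not mounted")
    # Probe writability up front instead of discovering it mid-backup.
    try:
        with tempfile.TemporaryFile(dir=dest_root):
            pass
    except OSError as exc:
        raise PermissionError(f"cannot write to {dest_root!r}") from exc
    target = os.path.join(dest_root, os.path.basename(os.path.normpath(source)))
    # Copy the tree, overwriting files left by earlier runs (dirs_exist_ok needs 3.8+).
    shutil.copytree(source, target, symlinks=True, dirs_exist_ok=True)
    return target
```

Note that this never deletes files that have since been removed from the source; for a true mirror, `rsync -a --delete` is the more common tool.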
The corrective action for all of this is complete, however. Last night I used ntfsresize on the external drive (since it had data I wanted to keep) and used the freed space to make a new ext3 partition for backups. From there I booted a Gentoo LiveCD and copied all of the hard drives’ data to the external drive. I then reformatted the hard drives with ext4 (with appropriate mount options to avoid the same issues I had with XFS) and copied all the data back.
There were some hiccups from my using manual mdadm commands to bring the hard drives online from the LiveCD. In the process I apparently changed some of the md device names on the hard drives, which took some more manual `mdadm --assemble` work to fix. But everything is working so far, although I’m sure there are a few packages that need to be reinstalled to account for the data loss over time.
Luckily, the major reason for my lack of free time looks like it will be rectified within the next week; more on that later.