
Linux Data Recovery

Recently I had a RAID5 array crash on me. The array was composed of 3 Western Digital 250GB disks controlled by a 3Ware 9550SX card. This array had been in continuous operation for nearly 4 years. Yet, about 12 days ago one of the drives appeared to have crashed. As luck would have it, though, the PSU was also failing in this box, so the +5V line stopped working and took another drive offline. That was the end of the array.

3Ware/LSI was a great help. They created a custom application that was able to recover the original RAID header information. After attaching a new PSU to the box, 2 of 3 drives were online and the LSI tool made the array online too (but degraded).

That was zero day and I was still hopeful. I downloaded the R-Tools Linux recovery application and created a rescue CD. Stuck the CDROM into the failed system and started the recovery process. After about 4 days, R-Tools consumed the entirety of a 500GB disk that I had attached to the system and it was not done. So I gave up on R-Tools and tried Disk Patch, but that couldn't even recognize the drive array (no driver for the 3ware card). Then I found a forensic tool from Italy called CAINE. CAINE had TestDisk built into its operating environment, which was able to recognize the partition information on the LVM volumes and was able to rewrite it successfully. But still, nothing could mount the file system.

So I downloaded Phoenix Linux Recovery. It has a fun interface and looks nice and pretty, but it did not discover any of the LVM volumes. I tried their quick scan and their deep scan. It wasn't until a couple days of support interaction that I was told it does not support LVM volumes.

I went back to R-Tools and gave it a regular expression to match on the file names that I needed. Of the 300MB of files that were all named using the same method (32-character hash code), it found 1. That scan took another 3 days.

Nearly at the point of giving up, I installed CentOS 5 on that 500GB spare drive, attached it to the motherboard, and changed the BIOS to give it boot priority (higher up on the boot device list, above the 3Ware card). With CentOS installed, I was able to run LVM and get the list of volumes on the drive array and see that its partition information was intact. So I ran e2fsck with the "-y" option on the array's volume and waited. Then I ran e2fsck about 4 more times before it finally was done fixing bad inode references and such.

Now I was able to mount the root file system, but all of the files were in "lost+found." So I did a "du" on the directory to see where I was at and spotted my original directory structure during the du process output. Control-C, change du to a du with a pipe through grep, and I found my files! Then tar, gzip, and scp, and the files were safely tucked away on more secure hardware.
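The search-and-archive step described above can be scripted. This is an added example, not part of the original post; it assumes the 32-character names are hexadecimal, and the function names, paths and sample tree are constructed for illustration.

```bash
#!/usr/bin/env bash
# Added example, not from the original post. Assumption: the 32-character
# file names are hexadecimal (MD5-style); change HASH_RE if yours differ.
HASH_RE='/[0-9a-fA-F]{32}$'

# List regular files below mount point $1 whose basename is exactly 32 hex
# characters, as ./relative paths. e2fsck reconnects orphaned directories
# under lost+found as "#<inode>", so match the basename at any depth.
list_hash_files() {
    ( cd "$1" && find . -type f | grep -E "$HASH_RE" | sort )
}

# Archive the matches from $1 into gzip'd tarball $2 and print the file
# count. The mount point is only read; the ./ prefix keeps tar from
# treating an odd file name in the list as an option.
archive_hash_files() {
    local root=$1 out=$2 list
    list=$(mktemp) || return 1
    list_hash_files "$root" > "$list"
    tar -czf "$out" -C "$root" -T "$list" || { rm -f "$list"; return 1; }
    wc -l < "$list" | tr -d ' '
    rm -f "$list"
}

# Constructed sample tree standing in for a repaired, mounted volume.
make_sample_tree() {
    local d="$1/lost+found"
    mkdir -p "$d/#131073/data/ab" "$d/#131090"
    echo one > "$d/#131073/data/ab/0123456789abcdef0123456789abcdef"
    echo two > "$d/#131073/data/ab/fedcba9876543210fedcba9876543210"
    echo txt > "$d/#131090/notes.txt"                        # wrong name
    echo cut > "$d/#131090/0123456789abcdef0123456789abcde"  # 31 chars
}

# Usage: script.sh /mnt/recovered /safe/place/files.tar.gz
if [ "$#" -eq 2 ]; then
    archive_hash_files "$1" "$2"
fi
```

Write the tarball to a different disk than the damaged array, and mount the repaired volume read-only while copying.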

I paid probably $700 for the various software products that all failed to do anything useful. The two tools that worked for me were CAINE and e2fsck, both of which are FREE. Quotes from Kroll-OnTrack had the recovery cost between $3000 and $10,000. Every service wanted an upfront $300 fee to diagnose the RAID array.

Using LVM to partition your array makes future recovery from a crash more difficult. Make sure that you attach the crashed array to a new install of your original OS type and try to discover the extent of your damage. e2fsck can run in read-only mode, which means it will report the errors on your volume, but will not make any changes. In the end, using the "-y" option will allow you to sit back and watch the magic.
