Monday, September 1, 2008

RAID rebuilding

Well, this didn't go down quite as simply as I'd hoped. However, I do believe the trouble was my own fault. As you can see in the screenshot in my previous post, it clearly states which drive has failed, but that exact name did not appear anywhere else that I saw. Disk Utility's RAID management section stated that "disk1s2" had failed. However, the RAID renames both drives to "RAID Slice", so you lose most of the distinguishing characteristics, other than the fact that Disk Utility also tells you which drive bay each drive is in, which is an invaluable feature. The problem was that even in System Profiler, nothing referred to those drives in the precise way the RAID manager did. System Profiler referred to them as "disk1" and "disk2", which made me believe that "disk1" was the same drive as "disk1s2", since they're both "disk1". I imagine the "s" stands for "slice", but both drives are listed as "s2", so I'm not entirely sure. I know some *nix, but not enough to get into RAID handling and that sort of thing.
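For what it's worth, the BSD-style names OS X uses follow the pattern "disk<N>s<M>", where N is the whole disk and M is the slice (partition) on it, so "disk1s2" would be slice 2 of "disk1". Both RAID members showing "s2" is most likely just because both drives share the same partition layout. Here is a minimal Python sketch of that naming rule; the helper names (parse_disk_id, same_physical_disk) are made up for illustration and are not part of any Apple tool.

```python
import re
from typing import NamedTuple, Optional


class DiskId(NamedTuple):
    """A BSD-style disk identifier split into its parts (hypothetical helper type)."""
    disk_number: int
    slice_number: Optional[int]  # None when the name refers to the whole disk


# Matches names such as "disk1" (whole disk) or "disk1s2" (slice 2 of disk1).
_DISK_ID_PATTERN = re.compile(r"^disk(\d+)(?:s(\d+))?$")


def parse_disk_id(name: str) -> DiskId:
    """Split an identifier like 'disk1s2' into disk and slice numbers."""
    match = _DISK_ID_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a disk identifier: {name!r}")
    disk, slice_ = match.groups()
    return DiskId(int(disk), int(slice_) if slice_ is not None else None)


def same_physical_disk(a: str, b: str) -> bool:
    """True when both identifiers point at the same whole disk."""
    return parse_disk_id(a).disk_number == parse_disk_id(b).disk_number


if __name__ == "__main__":
    # The RAID manager's name and System Profiler's name for the failed drive.
    print(parse_disk_id("disk1s2"))
    print(same_physical_disk("disk1", "disk1s2"))
    print(same_physical_disk("disk1s2", "disk2s2"))
```

Under that reading, the "disk1s2" from the RAID manager and the "disk1" from System Profiler do point at the same physical drive, and the matching "s2" on both members says nothing about which drive is which.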

In any case, I determined that it was the drive in Bay 2 (disk1) that had failed. In fact, when I selected the drive that appeared in red in the left column, Disk Utility told me directly that it was the bay 2 drive, but I of course wanted to be sure and didn't want to assume too much. So I proceeded to stick in the 500GB drive I received earlier so I could format it and copy the data from the RAID to it as a failsafe in case anything went wrong, but Disk Utility didn't seem to recognize the drive. So I rebooted the computer. Upon rebooting, Disk Utility began to automatically rebuild the array... odd, I thought a drive had failed.

Long story short, about 4-5 hours later it finished rebuilding and all seemed well. I rebooted a couple more times, copied data to/from the drive, ran a disk check, and everything seemed OK, as if the drive had magically fixed itself. However, I was still skeptical, since a drive that's "fine" shouldn't just fail like that, so I decided to replace the failed drive anyway and RMA it. I first decided to remove one drive at a time and reboot to see if the computer would run OK with either drive on its own. After the first run, it did. However, when I put both back in and rebooted, my dock had been reset to the default dock you get when you create a new account, though my data was all intact. I removed the drive from bay 3 again and rebooted, and this time it failed to log in. My home directory is located on the RAID, and OS X couldn't load my directory from that drive. So I shut down, put the drive back in and pulled out bay 2, started up, and the same thing occurred. I put in both, restarted, and it finally booted, but said the file system had failed and needed repairing, though I could operate in a degraded mode. I could use the computer, but not write any data to the RAID. I changed the location of my user directory to the 500GB drive, rebooted, and all seemed well: my desktop looked like it should.

So I opened Disk Utility and tried repartitioning the 2 RAID drives so I could rebuild the array. The drive in bay 3 reformatted just fine, but the drive in bay 2 could not be reformatted due to some file system error. I tried erasing, repartitioning, etc., and nothing worked. So I swapped it out with the new drive and rebuilt the RAID, and I am now copying data back to the RAID so I can change the location of my account login folder and be back up and running on the RAID as normal.

It seems the drive was in fact still acting up, so it will be replaced. I'm sure my constant rebooting and pulling out drives when it was probably trying to auto-rebuild the array is what caused the further corruption, but it also helped me pinpoint with absolute certainty which drive was acting up, so I'm glad in that regard. I'm also very glad I copied the drive to another location before attempting any of this, or there's a big chance I may have lost it all... NOT something I want to happen.

Rebuilding the RAID and getting it back up has been a piece of cake, however. Once this is all done, I still think I'll replace my boot drive with the 500GB one I got, as it has a much larger cache and I think my system will benefit from that. I may wait until the weekend or something, though.
