Unixronin

Monday, June 1st, 2009 02:40 pm

A ZFS RAIDZ2 pool without hot spares can survive simultaneous failure of up to two devices and still continue operating in degraded mode.

So, naturally, between when we left for the elementary school this morning at 0755 and when we got back home at about 1015, apparently THREE of the twelve 300GB SATA disks in babylon4’s main storage pool failed.

I suppose that’s what I get for trusting Maxtor disks. But they were free-to-me, so I can’t complain too loudly. Unfortunately, I have no spare disks, and can’t spare the money right now to replace them with better (not to mention new) disks, or I would have already done so.

Equally annoyingly, I hadn’t gotten backup migration from disk to tape set up yet. I’m more annoyed at the configuration work - principally Apache2 - that I’ll now have to redo than the little data that I’ve lost, which principally consists of a couple of minor edits to our recipe book, a dozen or so ISOs that I can re-download any time I want to, and a dozen or so source code tarballs.

To add insult to injury, I’d been going to work on tape migration next, after the full backup that was scheduled to run this morning completed....

Update: As noted in the comments below, a network-wide Full backup ran last night, starting at 03:10 and putting heavy, sustained load on the array. By the look of things, an already-weak disk folded under the load at 04:29:55, increasing the load on the remaining disks and setting up a cascade failure over about the next four and a half hours. The second-weakest disk succumbed to the increased load at 08:49:29, increasing the load on the remaining disks still further, then just under eight minutes later at 08:57:06, the third-weakest disk followed the first two and the entire array went down.
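
For anyone who wants to check the arithmetic on that timeline, here is a rough Python sketch. The timestamps are the ones quoted above; the function names (`parse`, `pool_state`, `timeline`) are made up for illustration, and the three-state pool model is a deliberate simplification of what `zpool status` reports, assuming a single RAIDZ2 vdev with two-disk parity and no hot spares.

```python
from datetime import datetime

# Times quoted in the update above, all on the same morning.
BACKUP_START = "03:10:00"
FAILURES = ["04:29:55", "08:49:29", "08:57:06"]
PARITY = 2  # assumption: one RAIDZ2 vdev, which tolerates two failed disks


def parse(stamp):
    """Parse an HH:MM:SS string; the date part is irrelevant here."""
    return datetime.strptime(stamp, "%H:%M:%S")


def pool_state(failed, parity=PARITY):
    """Simplified model: healthy, degraded within parity, lost beyond it."""
    if failed == 0:
        return "ONLINE"
    return "DEGRADED" if failed <= parity else "FAULTED"


def timeline(start=BACKUP_START, failures=FAILURES):
    """Return (timestamp, gap since previous event, pool state) per failure."""
    rows = []
    previous = parse(start)
    for count, stamp in enumerate(failures, start=1):
        current = parse(stamp)
        rows.append((stamp, current - previous, pool_state(count)))
        previous = current
    return rows


if __name__ == "__main__":
    for stamp, gap, state in timeline():
        print(f"{stamp}  +{gap}  {state}")
```

Under those assumptions the gaps come out to 1:19:55 from the start of the backup to the first failure, 4:19:34 to the second, and 0:07:37 to the third, with the pool degraded after the first two failures and gone after the third.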

Monday, June 1st, 2009 11:51 pm (UTC)
Three HDDs failing within a time frame of 140 minutes seems to be a rather low-probability occurrence. Could there be an environmental factor? Or is it possible that the failure of one somehow triggered the failure of the others, which would indicate a possible design flaw in the RAID controller?
Monday, June 1st, 2009 07:56 pm (UTC)
Murphy's Law of Backups

- Once you plan to do backups, that's when your hard drives will instinctively fail.


Honestly, the number of times I've had catastrophic data-death a few days before I was going to get around to doing backups is getting too large to remember them all.

So now, I never even think about doing backups, so as not to jinx it.
Monday, June 1st, 2009 08:03 pm (UTC)
Ouch! These posts are reminding me that I'm pushing my luck with my little desktop. I haven't done a backup on it in over 4 years; well, since I got it.
Monday, June 1st, 2009 08:28 pm (UTC)
I went to BestBuy.com and checked - I can get a Seagate 1T internal HDD for $119, or a Western Digital 1T internal HDD for $149. I've read some bad things about the Seagate. What do you think? Is the extra money for the Western Digital worth spending?
Monday, June 1st, 2009 09:02 pm (UTC)
Honestly, I would trust Seagate over Western Digital, as long as you avoid the recent run of buggy firmware on the 500GB and 1TB drives. Look for one of the Seagate enterprise models with the five-year warranty. They'll probably cost you $10-$20 more than the consumer models, but that's still less than the WD disk, which to the best of my knowledge you can't get with anything over a one-year warranty.