A ZFS RAIDZ2 pool without hot spares can survive the simultaneous failure of up to two devices and continue operating in degraded mode.
So, naturally, between when we left for the elementary school this morning at 0755 and when we got back home at about 1015, apparently THREE of the twelve 300GB SATA disks in babylon4's main storage pool failed.
I suppose that’s what I get for trusting Maxtor disks. But they were free-to-me, so I can’t complain too loudly. Unfortunately, I have no spare disks, and can’t spare the money right now to replace them with better (not to mention new) disks, or I would have already done so.
Equally annoyingly, I hadn't gotten backup migration from disk to tape set up yet. I'm more annoyed about the configuration work - principally Apache2 - that I'll now have to redo than about the little data I've lost, which amounts to a couple of minor edits to our recipe book, a dozen or so ISOs I can re-download any time I want, and a dozen or so source-code tarballs.
To add insult to injury, I'd been planning to work on tape migration next, after the full backup that was scheduled to run this morning completed...
Update: As noted in the comments below, a network-wide Full backup ran last night, starting at 03:10 and putting heavy, sustained load on the array. By the look of things, an already-weak disk folded under that load at 04:29:55, increasing the load on the remaining disks and setting up a cascade failure over the next four and a half hours: the second-weakest disk succumbed at 08:49:29, pushing the load on the survivors still higher, and just under eight minutes later, at 08:57:06, the third-weakest disk followed the first two and the entire array went down.
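For anyone who'd like to check their own pools before something like this happens to them: the commands below are a minimal sketch, assuming a Solaris-style system and a pool named tank with placeholder device names, not babylon4's actual layout.

    # Show only pools with problems; a RAIDZ2 vdev with one or two
    # failed disks reports DEGRADED here but keeps serving data:
    zpool status -x

    # Attach a hot spare so ZFS can pull it in automatically the
    # moment a disk drops out (pool and device names are made up):
    zpool add tank spare c2t0d0

    # Once a replacement disk is physically installed, swap it in
    # for the dead one and let the pool resilver:
    zpool replace tank c1t7d0 c2t0d0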
no subject
I've been studying the logs and found some more clues in the kernel log... it looks like c1t7d0 went down at 04:29:55 (degrading the array), then c1t6d0 croaked at 08:49:29, which left the array running with no remaining redundancy, then at 08:57:06 c1t4d0 failed and the array went down. It's probably significant that a full backup of everything on the network, including this array, to another filesystem on the same array began at 03:10, so I'm guessing those three drives were shaky to start with and just folded one after another as the load increased.
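For the curious, a rough sketch of that archaeology, assuming the standard Solaris log location (the device pattern below matches the three disks named above):

    # Pull every kernel-log line mentioning the three suspect disks;
    # /var/adm/messages is the usual Solaris location, with rotated
    # copies included via the wildcard:
    grep 'c1t[467]d0' /var/adm/messages*

    # The fault manager's error log tells the same story from the
    # ZFS/FMA side, with high-resolution timestamps:
    fmdump -e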
no subject
- Once you plan to do backups, that's when your hard drives will instinctively fail.
Honestly, the list of times I've had catastrophic data-death a few days before I was going to get around to doing backups has gotten too long to remember them all.
So now, I never even think about doing backups, so as not to jinx it.