babylon4, my main server (a dual-Xeon box with, currently, 4GB of RAM and a dozen 300GB SATA disks on two SAS controllers, running Solaris 10) just began randomly crashing for no apparent reason whatsoever. Nothing logged, no kernel panic onscreen, just ... one moment running totally normally, and the next, dead in the water with no video and not responding on the network. Power supplies show green throughout. I'm running memtest86+ on it right now; it's gone through one pass with no errors, and is ... currently 28% into pass 2.
This wouldn't be quite so annoying if it would at least give me some vague clue about what the problem is.
I've taken the opportunity to pull the one failed disk out. I'd like to get the cover off and check the CPU heatsink fans, but I need to rearrange the rack a bit to be able to do that. (UPS down 1U or even 2U, move firewall on top of the UPS, would do it.) I should try to get that done today.
The other side of the picture is that in addition to all the services it provides (including NAS), this is also my Solaris 10/ZFS learning box. It's kind of hard to learn all of the features of ZFS without a big array to learn it on. (For the record, being made up of aging drives of dubious reliability because I can't afford to replace them at the moment, it's configured as RAIDZ2 with originally two hot-spares, now down to one because one disk failed for good a week or so ago. Old-school screaming disk crash.)