For now, I'm just going to say that most of logical-today (which is to say, Thursday) has sucked wet farts out of dead pigeons. And leave it at that. I may, or may not, enlarge upon this logical-tomorrow.
January 19th, 2007
So, let's see: At the last point mentioned yesterday, about 1030, I'd just brought babylon5 back up after rather abruptly halting it by fumbling a power connector while hot-swapping in a SCSI-III disk as a replacement for one of the two mirrored boot disks, which was showing signs of impending spindle failure. During the associated troubleshooting, it appeared that the remaining CDROM drive in the machine had also failed.
Well, the good news is that once ALL the shuffling of SCSI devices was done with, it turned out the CDROM was still good. In fact, what was going on was that the internal SCSI Zip drive on the same SCSI bus, always troublesome, had failed — and failed badly enough that it was screwing up both the Plextor CDROM and the Exabyte VXA1 tape drive that it shared the second SCSI controller with. So I don't need to replace the CDROM. (Not that it isn't a little tempting to get a CD/DVD writer into the machine.)
The bad news is that about 15 minutes after I started dd'ing data from the failing mirror disk (/dev/sda, aka /dev/sd/c0b0t0u0, aka /dev/scsi/host0/bus0/target0/lun0/disc) to the new disk (/dev/sdc, aka ... well, you get the idea), the system locked up for no apparent reason whatsoever. In doing so, it apparently badly hosed the RAID superblock on /dev/sda1 ... and this caused ALL KINDS of trouble, especially since I don't have a bootable live-filesystem Linux CD around that's recent enough to grok ext3, let alone devfs. I ended up having to recover the superblock on the partition, manually unmirror the disks, fix a couple of different issues on the disks, and eventually, using a Slackware 7 live CD, get the disks into good enough shape that I was able to boot the machine off its own disks again somewhere around 1900. I still had a bunch of work to do after that before everything was working properly.
I compiled a new 2.4.34 kernel while I was at it, having discovered that one of the bizarre problems I'd been fighting is a known one: Sometimes, a Linux kernel compiled with both XFS support and metadevice support will mistakenly try to mount ext3 filesystems on md devices as XFS filesystems, fail to find the necessary structures, and die from a kernel panic. It appears at present the only way to avoid this problem is "Don't use ext3 filesystems on metadevices if you have XFS support in your kernel". So when I built the new kernel, I removed both XFS and reiserfs support, which made for a significantly smaller kernel. (A while back, I'd intended to do some performance comparisons between reiserfs, XFS and JFS, with a vague intention of switching to either XFS or JFS, and never got around to doing it. I should probably have removed the JFS support as well while I was at it.)
In the meantime, someone had managed to overflow the toilet in the downstairs bathroom, flooding the bathroom and hallway (both tiled, fortunately, but there was plenty of stuff to get wet including the mat that's about an inch or two in each direction too big for said hallway). This necessitated getting out the steam vac to clean up the flood, from which I learned that it's below freezing up in the attic (we hadn't emptied the wastewater tank last time we used it, and it was frozen solid). I now also need to obtain a replacement for the short piece of hose that connects the back of the machine to the main body, because it had apparently hardened from age and it broke while I was vacuuming up the water. (Fortunately that only affects the hand tools. Unfortunately, I don't know whether I can actually get the replacement hose, or whether I can replace it without tearing apart the entire machine to do it.)
That's not the sum of the ways in which yesterday sucked. But it's quite enough to be going on with.