Last night, vorlon, my Athlon64/Win2K gamebox, did something weird. I went to log off and found the screen was blank grey and wouldn't clear. Assuming that it had locked up somehow, I power-cycled it, left it to reboot, and didn't think anything more of it until this morning ... when it still had a blank grey screen. I tried again to reboot it, and it failed to boot.
"Houston," I thought, "we have a problem."
My first thought was that the hard disk had failed. Over the next hour and a half or so, in between thoroughly vacuuming everything in sight, I tested various hardware subsystems as best I could on a non-booting machine. I determined that the disk was OK as far as I could tell, and gradually narrowed the problem down to either the motherboard or the RAM, so I started testing the RAM one bank at a time. Unplugging bank 1, I booted the machine on just bank 0 ... and it booted successfully. So I reinstalled bank 1 just to be sure ... and it booted again. Just removing and reseating two of the four DIMMs had cleared the problem.
Now, had I been SMART, I'd have quit while I was ahead. But at this point I thought, "You know, as long as I'm doing hardware maintenance, I should check to see if there are any BIOS updates I ought to apply."
And that's where the real trouble began. The existing BIOS was version F3, and the current version for this motherboard (a GigaByte GA-K8N Ultra-SLI) is F7h. I looked at the list of things patched and fixed since F3, and decided I should update. So I downloaded the F7h BIOS image and flashed the primary BIOS with it.
It flashed without any problems, but vorlon wouldn't boot correctly any more. Now, it was getting halfway through Windows' boot process and spontaneously rebooting itself, consistently at the same point every time. It seemed as though it was booting faster, but that didn't help matters if it couldn't boot all the way.
I tried several times, unsuccessfully, to configure it out of the spontaneous-reboot cycle, then sighed and decided to put it back the way it had been. So I wrestled with it for half an hour or so until I could get it to boot from the backup BIOS image (which was even older, version F1). I tried using the BIOS utility to reflash the primary BIOS with the known-good F3 version off a diskette, but discovered that the one diskette I have is no good, and the utility wouldn't install the BIOS image because it was failing its checksum. So I booted back into Windows (still on the F1 BIOS) and went back into the @BIOS utility to reflash the primary BIOS to F3 from the hard disk.
Flashing failed part-way through. Uh-oh. Now I was really not happy. I tried it a couple more times. It consistently failed at the same point. Just in case my on-disk F3 BIOS image had somehow become corrupted, I tried downloading BIOS version F5 (which had most of the updates I cared about) and tried flashing that. No joy; it was still failing at the same point.
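One way to rule a corrupted on-disk image in or out is to compare it byte-for-byte (via a hash) against a freshly downloaded copy of the same version. The following is only a minimal sketch in Python, not anything the flash utilities themselves provide; the function names and any file names passed to them are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_image(path_a, path_b):
    """True if the two files have identical contents (compared by SHA-256)."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Calling same_image() on the old on-disk F3 image and a fresh F3 download (both paths being whatever you saved them as) would show whether the stored copy had changed. If the two match and flashing still fails at the same point, the image is probably not the culprit.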
"OK," I thought, "I'll reboot the machine into DOS off the backup BIOS and try flashing the primary BIOS with the CLI tool instead of using @BIOS." Problem: No usable floppy disks to be found. So I built a bootable CD with the CLI flash tool and all of the BIOS images I had available on it, and rebooted.
And the machine wouldn't boot. At all. If the primary BIOS is corrupted, it's supposed to fail over and boot from the backup BIOS. But it didn't. I spent the next couple of hours trying to find a way to force it to boot from the secondary BIOS, with no success. I can get as far as telling it to load the BIOS setup utility, but the utility's just ... not there.
So at this point, I have a working, if antiquated, secondary BIOS, but I can't get to it, and the machine won't boot at all from the primary BIOS. For all practical purposes, it's completely bricked, unless I can get an answer out of GigaByte on how to force it to use the secondary BIOS when the primary BIOS image is corrupt.
I note as an afterthought that the usefulness of a second, backup BIOS is considerably reduced if there's no way to force the board right from the start of boot to use the secondary BIOS if the primary is bad. If the two EEPROMs were socketed, as a last resort I could just physically swap them, but they're surface-mounted. If I figure out a way to get out of this, I'm going to change the way I use the dual BIOS: I'm going to keep the PRIMARY as the known good BIOS, install new updates to the secondary BIOS first, then boot from the secondary, until I know the update is good. If I'd known it wouldn't fail over to the secondary on a sufficiently-corrupted BIOS, I'd have done that this time.