Unixronin

Friday, January 9th, 2009 08:23 pm

Last night, vorlon, my Athlon64/Win2K gamebox, did something weird.  I went to log off it and found the screen was blank grey, and wouldn't clear.  Assuming that it had locked up somehow, I power-cycled it, left it to reboot, and didn't think anything more of it until this morning ... when it still had a blank grey screen.  I tried again to reboot it, and it failed to boot.

"Houston," I thought, "we have a problem."

My first thought was that the hard disk had failed.  Over the next hour and a half or so, in between thoroughly vacuuming everything in sight, I tested various hardware subsystems as best I could on a non-booting machine.  I determined that the disk was OK as far as I could tell, and gradually narrowed the problem down to either the motherboard or the RAM, so I started testing the RAM one bank at a time.  Unplugging bank 1, I booted the machine on just bank 0 ... and it booted successfully.  So I reinstalled bank 1 just to be sure ... and it booted again.  Just removing and reseating two of the four DIMMs had cleared the problem.

Now, had I been SMART, I'd have quit while I was ahead.  But at this point I thought, "You know, as long as I'm doing hardware maintenance, I should check to see if there's any BIOS updates that look like I should apply them."

And that's where the real trouble began.  The existing BIOS was version F3, and the current version for this motherboard (a GigaByte GA-K8N Ultra-SLI) is F7h.  I looked at the list of things patched and fixed since F3, and decided I should update.  So I downloaded the F7h BIOS image and flashed the primary BIOS with it.

It flashed without any problems, but vorlon wouldn't boot correctly any more.  Now, it was getting halfway through Windows' boot process and spontaneously rebooting itself, consistently at the same point every time.  It seemed as though it was booting faster, but that didn't help matters if it couldn't boot all the way.

I tried several times, unsuccessfully, to configure it out of the spontaneous-reboot cycle; sighed, and decided to put it back the way it had been. So I wrestled with it for half an hour or so until I could get it to boot from the backup BIOS image (which was even older, version F1).  I tried using the BIOS utility to reflash the primary BIOS with the known-good F3 version off a diskette, but discovered that the one diskette I have is no good, and the utility wouldn't install the BIOS image because it was failing its checksum.  So I booted back into Windows (still on the F1 BIOS) and went back into the @BIOS utility to reflash the primary BIOS back to F3 from disk.

Flashing failed part-way through.  Uh-oh.  Now I was really not happy.  I tried it a couple more times.  It consistently failed at the same point.  Just in case my on-disk F3 BIOS image had somehow become corrupted, I tried downloading BIOS version F5 (which had most of the updates I cared about) and tried flashing that.  No joy; it was still failing at the same point.

"OK," I thought, "I'll reboot the machine into DOS off the backup BIOS and try flashing the primary BIOS with the CLI tool instead of using @BIOS."  Problem:  No usable floppy disks to be found.  So I built a bootable CD with the CLI flash tool and all of the BIOS images I had available on it, and rebooted.

And the machine wouldn't boot.  At all.  If the primary BIOS is corrupted, it's supposed to fail over and boot from the backup BIOS.  But it wasn't happening.  I spent the next couple of hours trying unsuccessfully to find a way to force it to boot from the secondary BIOS, but it just wasn't happening.  I can get as far as telling it to load the BIOS setup utility, but the utility's just ... not there.

So at this point, I have a working, if antiquated, secondary BIOS, but I can't get to it, and the machine won't boot at all from the primary BIOS.  To all practical purposes, it's completely bricked, unless I can get an answer out of GigaByte for how I can force it to use the secondary BIOS when the primary BIOS image is corrupt.

I note as an afterthought that the usefulness of a second, backup BIOS is considerably reduced if there's no way to force the board right from the start of boot to use the secondary BIOS if the primary is bad.  If the two EEPROMs were socketed, as a last resort I could just physically swap them, but they're surface-mounted.  If I figure out a way to get out of this, I'm going to change the way I use the dual BIOS:  I'm going to keep the PRIMARY as the known good BIOS, install new updates to the secondary BIOS first, then boot from the secondary, until I know the update is good.  If I'd known it wouldn't fail over to the secondary on a sufficiently-corrupted BIOS, I'd have done that this time.

Saturday, January 10th, 2009 02:42 am (UTC)
At one point I thought I had bricked my motherboard with a BIOS upgrade. What fixed it was a complete BIOS reset, using the jumper that most motherboards have for this purpose. This was an Asus, though, so YMMV.
Saturday, January 10th, 2009 03:13 am (UTC)
Yes, I tried clearing the CMOS via the jumper. It didn't appear to work. If I knew a way to reset the BIOS completely back to the original BIOS it shipped with, I'd try that.
Saturday, January 10th, 2009 03:31 am (UTC)
Sometimes there's a magic keyboard combination you can use to get it to boot off a BIOS that's located on a floppy drive or a CDROM or a thumb drive?
Saturday, January 10th, 2009 02:58 am (UTC)
*k Gigabyte. They won't fix my board; I'll let them become Abit. On the old boards there was a jumper, I think. It's been a while.
Saturday, January 10th, 2009 03:20 am (UTC)
There is a "Clear CMOS" jumper, and I've used it. If I knew a way to reset all the way back to the original as-shipped BIOS, that would solve the problem, but I don't know a way to do it.
Saturday, January 10th, 2009 07:08 am (UTC)
There's no way, if it won't boot, that does not involve how they got the BIOS on there in the first place. That's a JTAG device or soldering on programmed chips. See if you can get warranty coverage even if it's out of warranty, because their backup system sucks.

Saturday, January 10th, 2009 04:58 am (UTC)
That is a very useful insight. Thanks for posting it! I am sorry you learned it the hard way. On some of the older MB's there was a way to clear BIOS in addition to a way to clear CMOS. Perhaps a jumper that isn't installed, but still has pads on the MB?
Saturday, January 10th, 2009 05:21 pm (UTC)
I'm waiting for a response from technical support to tell me whether there's any way to clear a failed BIOS load, or to force failover to the secondary BIOS.
Monday, January 12th, 2009 11:08 pm (UTC)
Oh well, lesson learned, I guess. At this point I would probably hush-hush what really occurred, report the motherboard as dead to my wife and look into getting a new one.
Monday, January 12th, 2009 11:48 pm (UTC)
Oh, I'd love to rebuild the machine around a new state-of-the-art motherboard and multiple-core processor, sure, if the money for it was there. But as it is, I'll be lucky if I can scrape a compatible replacement motherboard out of the budget.

If I don't hear good news from GigaByte support, I have a fallback plan with reasonable chances of success to destructively remove the existing flash BIOS chips and install a new pre-programmed set. That'll cost me about $20. If that fails, well, I know where I can get a factory-reconditioned replacement board with a 30-day warranty for $80.

If I end up having to buy a brand-new board, and new CPU and RAM for it, I'm going to be out probably four or five hundred that we just don't have to spare right now.
Sunday, January 18th, 2009 06:01 pm (UTC)
Ahem.