Unixronin

Saturday, October 2nd, 2004 05:28 pm

So, I've been meaning for some time to bring minbar, my Sun Ultra30 NFS server, up to date with the latest set of recommended patches.  Let me see, I guess it was Wednesday that I received the CD from rbos[1] bearing the latest Solaris 9 recommended and alert patch clusters and Java2 SDKs for Linux, Solaris and Windows (thanks, Rob).  On Thursday, I unmounted all NFS shares, stopped all services running on minbar, and started applying the patches.

The first warning was when 112233-12, a jumbo patch, hung part-way.  Ceased to respond, stopped generating new subprocesses, no activity, nada.  Completely inert.  I eventually had to kill -9 the install-cluster process.  Restarted the patch cluster, and on the second try, 112233-12 went through fine.

A couple of hours later, I had a fully patched Ultra30.  I rebooted it .... and the trajectory of the fecal matter intersected the locus of the rotating vanes.  It couldn't fsck the root metadevice (minbar runs off mirrored boot disks).  Neither could it fsck the individual submirrors after detaching the second mirror.  It couldn't remount either mirror individually, rw, in order to enable me to perform any repairs.  I could metasync the / mirror with no problems, I just couldn't do anything else with it.  I ended up doing a lot of rebooting single-user from CD.  It looked as though the updated UFS driver was choking, so I tried first backing out and then re-applying 113073-14, which is a UFS filesystem patch.  No joy.

Insert multiple iterations of booting from CD, mounting the submirrors, editing files on the submirrors, booting from either submirror, and frustration when it still refused to work.  I knew the actual files on both submirrors were good, because it'd boot single-user from either one up until the point where it tried to remount / rw, so I tried newfsing each submirror and copying the contents of the other across in turn, to make certain the filesystems themselves weren't somehow subtly corrupted.  No joy.  So I back out 113073-14 again and reboot again.  Still no joy.  I'm pretty certain at this point that the root metadevices have gotten frelled somehow.  But I can't fix them because, I've realized, the system is still trying to boot from /dev/md/dsk/d10 every time.

Along about 0830 Friday, I finally remembered that there's one other place the metaroot command makes changes:  /etc/system.  The md section in /etc/system warns, "This section is generated automatically, never edit it by hand."  Tough titties, buddy, I'm fresh out of choices on that one -- I can't get the machine booted from its own disks and those disks mounted rw to rerun metaroot.  So I do another cd-boot cycle, fsck both root slices, mount both root slices, hand-edit the rootdev entry in /etc/system, reboot single-user.
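
For anyone facing the same hand edit, here is a minimal sketch in Python of what neutralizing the rootdev entry could look like.  It assumes the entry is a single line beginning with "rootdev:" and that commenting it out with "*" (the /etc/system comment character) lets the kernel fall back to the boot device.  The sample text and the function name disable_md_rootdev are illustrative, not copied from minbar, and this is not necessarily the exact edit that was made.  It also only touches /etc/system text; /etc/vfstab would presumably need to agree with it.

```python
def disable_md_rootdev(system_text: str) -> str:
    """Return /etc/system text with any active rootdev line commented out.

    Assumption: lines starting with '*' are comments in /etc/system, so
    prefixing the rootdev line with '* ' disables it without deleting it.
    """
    out = []
    for line in system_text.splitlines():
        if line.lstrip().startswith("rootdev:"):
            out.append("* " + line)   # keep the old value visible for later
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Illustrative sample only; the real md section on a given machine will differ.
SAMPLE = """\
* (md section written by metaroot; contents here are illustrative)
rootdev:/pseudo/md@0:0,10,blk
* (end of md section)
"""

if __name__ == "__main__":
    print(disable_md_rootdev(SAMPLE), end="")
```

Commenting the line out rather than deleting it keeps the original value on hand for when metaroot gets rerun against a rebuilt mirror.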

Bam, it boots, and / is mounted read-write.  I'm in business.  NOW the machine's finally in a state where I can work on it.  I blow away metadevices d10 through d12 (the mirror and its two submirrors), recreate new metadevices, remirror the boot disk, reboot, watch everything fsck perfectly, reattach the second submirror, watch the mirror happily resync, and all is happy again.  Until I try to re-apply 113073-14, which panics the kernel.

So right now, everything's up and running, except I'm missing a patch or two.  I figure after the full backup I'm doing right now completes, I'll take the machine down to single-user again, metaroot it back to the raw disks again, break the mirror again, boot off the alternate disk, and try re-applying 113073-14 on the primary disk.  Then if that goes well, I'll re-establish the mirror and bring everything back up.

Wheee.

Update:

Taking the machine all the way down to single-user and unmirroring everything did, indeed, allow me to successfully re-apply 113073-14.  Everything's now back up and remirrored again, and it just this minute got done resyncing for the second time.


[1]  At 8 kilobits on a good day with a 12-hour connection timeout, it was completely unrealistic to attempt to download them ourselves.  It would have consumed our entire available bandwidth, 24 hours a day, for over a week.
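
As a rough sanity check on that estimate, here is a small Python sketch of the arithmetic.  The payload size is an assumption (650 MB, roughly one CD's worth, since the actual size of the patch clusters and SDKs isn't given here), "8 kilobits" is read as 8 kilobits per second, and decimal units are used throughout.  The function name transfer_days is made up for the example.

```python
def transfer_days(payload_megabytes: float, link_kbit_per_s: float) -> float:
    """Days needed to move a payload over a link running flat out."""
    bits = payload_megabytes * 1_000_000 * 8        # decimal megabytes to bits
    seconds = bits / (link_kbit_per_s * 1_000)      # kilobits/s to bits/s
    return seconds / 86_400                         # seconds per day


if __name__ == "__main__":
    # Assumed payload: 650 MB, about one CD's worth.
    print(f"{transfer_days(650, 8):.1f} days")      # prints "7.5 days"
```

That ignores protocol overhead and the restarts forced by the 12-hour connection timeout, so the real figure would only be worse.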

Saturday, October 2nd, 2004 03:16 pm (UTC)
Argle bargle. Debian, anyone?
Saturday, October 2nd, 2004 03:44 pm (UTC)
Debian makes my brain hurt .... too many levels of virtualization on everything. I find it impossible to figure out where the hell to actually configure anything, because every damn configuration setting I can find is a pointer to some other setting someplace else. It is NOT old-time-cli-hacker friendly at all.

Besides, there are two reasons minbar runs Solaris:
(a) historically speaking, Linux has not run well on Sun hardware and has had poor Sun device support;
(b) minbar is primarily an NFS server, and frankly, the Linux nfsd has always been ass. You can have performance (knfsd) or configurability (userspace nfsd); pick only one.
Saturday, October 2nd, 2004 05:05 pm (UTC)
BSD...

(dons nomex undergarments in preparation for flamefest...)

:-)
Saturday, October 2nd, 2004 05:16 pm (UTC)
Yup, I have a Sparc LX running OpenBSD that's intended to be used as a firewall, if I can ever get it to talk properly to a hme interface or get a second sbus le to put into it.
Saturday, October 2nd, 2004 05:40 pm (UTC)
Mmm... OBSD's my favorite flavor (or should that be favourite flavour?)... I suppose FBSD next and NBSD last, unless I'm installing on something wildly exotic, in which case NBSD gets the nod.
Saturday, October 2nd, 2004 05:49 pm (UTC)
Yeah, I'm a big fan myself. "No remote root holes in the default installation ever." Ideal for firewalls. Dialup, too, since OBSD pppd supports dial-on-demand. (I really need to get back to working on that, actually. Except right now our UPS has no spare capacity even for the LX.)

I just wish the hme driver worked right. It seems to be usable at 10 megabits, but ONLY at 10 megabits. It's a constant source of surprise to me that no-one's fixed it.
Saturday, October 2nd, 2004 08:27 pm (UTC)
Yep, 'twas my firewall for several years back in SF... Also had weirdness with hw: had it on an Apple Quadra 800, and had weirdness with NIC drivers.