...Shortly before 0100, Wen (who, we hadn't realized, had grown tall enough to do so) put the blue stool next to the back table in the office, climbed up on it, reached up, and pushed the button labelled '0' on the front panel of the UPS. The result was a loud CLICK, sudden darkness, dead silence apart from the sound of many hard drives spinning down all at once, and great consternation and dismay for both Cymru and me.
The blue stool has now been confiscated. Over the next 45 minutes or so, we brought the machines back up and repaired the ones that needed it. Llioness wouldn't finish booting because fsck, for reasons that are still unknown, was dynamically linked to libgcc_1.so, which was not available. Fortunately, we had a set of Slackware 9.1 CDs around, courtesy of our friend Snack, and llioness's problems were solved by the simple expedient of installing the Slackware 9.1 e2fsprogs package. A quick series of manual fscks, a reboot, and llioness was happy. Babylon5 started up fine, but a few services failed to start, which took a little troubleshooting. Since I had the CDs there anyway, I then decided to rebuild babylon5's core utilities from the Slackware 9.1 sources (why install i486-optimized binaries when I could compile my own Athlon-optimized ones?). As I did this, I also made sure that everything I consider crucial was statically linked. This went fine until, watching coreutils' make install, I realized it wasn't doing what I'd intended. It was by now way too late at night and we were both dog-tired, so I figured I'd just do it over, and ran a make uninstall.
print (defined $HOMER_SIMPSON ? "DOH!" : "Oops.");
Fortunately, I have Bacula running, doing nightly incremental backups, so I was able to just restore /bin and then do it over. With babylon5 stable again, we went to bed at last. This morning, after finding and fixing a minor glitch (something in the coreutils make install set $HOSTNAME to '-s', causing auth to break all over the place in assorted ways), I finished the job: I installed the jfs and xfs tools, upgraded the reiserfs tools and util-linux, and installed a few other essentials without further incident. I figure this'll hold me until I bite the bullet and do a full upgrade to Slackware 9.1.
There is an ancient Chinese mixed benison and curse: "May you live in interesting times." It doesn't specify "with children," but maybe it should.
no subject
Fortunately, it was an RS/6000 running AIX and therefore JFS, and it had been idle at the time. With all the journals already synced to disk, the box had no idea it had been downed so unceremoniously. You might want to look into ext3 for the same reason...
BTW, friended you and Cymru...
-- the Bot :)
Journalling filesystems
Well, it wasn't really a journalling-filesystems issue. All our machines are on journalling filesystems already: ext3 [1] on the Linux boxen, ufs on the Solaris boxen.
The problem was that llioness had somehow ended up with fsck binaries dynamically linked to a library they shouldn't have been linked to, one that lived on a non-root filesystem. As a result, the box couldn't fsck anything until I installed new fsck binaries. Babylon5, meanwhile, turned out to have a couple of non-essential daemons incorrectly linked to an outdated libcrypto (0.9.6.something) that I didn't know existed, because it was in the wrong location.
But yeah, I've had this experience in a work environment too, at STI. One of the platform managers brought his five-year-old to work, took her into the machine room with him, and left her unsupervised behind him. Whereupon she spotted the big happy friendly glowing red switch on the front panel of the server UPS, and proceeded to flip it.
This happened twice, about 25 minutes apart; the NetWare server had only just finished recovering and remounting its filesystems from the FIRST time. Needless to say, after the second incident I politely informed $MANAGER that henceforth his child process would not be permitted to run with machine-room privileges, supervised or not.
[1] However, I'm considering migrating at least babylon5 to either jfs or xfs for filesystem performance reasons. I couldn't at this point tell you why I installed the updated reiserfs tools as well, except to speculate that it was way too late at night, because compared to jfs and xfs, quite frankly, reiserfs sucks performance-wise, and on certain common operations it really hammers [2] the CPU. Since I have no intention of ever actually using reiserfs, I have no rational reason for installing the tools.
[2] I don't consider it reasonable for a common filesystem operation, in and of itself, to consume 90% of the CPU on a dual Athlon XP 2400+ machine, which is the hardware the benchmarks I saw were run on.
no subject