unixronin | Entries tagged with hardware


LOMS Incident

Saturday, March 3rd, 2012 07:22 pm

One of the more drastic things that can happen in a nuclear power plant is what's known as a LOCA, a loss-of-coolant accident. I like to refer analogously to some of the more drastic things that can happen to electronic equipment as a LOMS incident — Loss Of Magic Smoke.

This is particularly relevant today because that's what just happened to my main uninterruptible power supply, an APC SU3000RM3U which powers both of my workstations as well as my server/network rack.

As close as I can figure out, the capacitor just to the left of that power blade failed first. (You can see another matching one about an inch behind its mortal remains.) This was the first muffled bang I heard, and everything lost power at that exact moment, which is probably why nothing powered by the UPS was damaged when that row of power transistors went into runaway and blew up in a ripple about eight seconds later, creating a second, much louder and sharper bang. Not only did they vaporize their power leads, they blew several holes in the top layer of the main board. This was the point at which the magic smoke began to drift out from the front of the UPS.

The UPS, of course, is toast. And it was only last year I managed to locate a matching expansion battery chassis for it...

Current Mood: annoyed
Current Location: Gilford, New Hampshire
Current Music: VNV Nation :: FuturePerfect :: Airships (2002, 08:18)
Crossposts: http://unixronin.livejournal.com/841682.html

Tags:

hardware

Hardwarestat

Saturday, December 11th, 2010 07:13 pm

Well, let's see.

I've had one infant-mortality on the new babylon5, the Gigabyte Radeon HD4350 video card, which ceased to output video on DVI after two weeks of life. I replaced that with an XFX HD5570 card that was on sale, and will be sending the Gigabyte card off for repair under warranty (then probably keep it as a spare, lacking another PCI-Express machine to put it in). The replacement card also displays the symptom of screen image downscaled to about 95% when using the HDMI connection to the Asus 27" LCD monitor, which means that's probably a glitch in the monitor that I'll have to bother Asus about, after the 28" Hanns-G comes back from its third round of warranty repair (this time to repair the permanently-stuck-on red column driver at about column 600). I'll still probably put it back on babylon5 after it comes back, because I've discovered that I really miss that extra 120 pixel rows of screen real-estate. (The Asus is 1920x1080; the Hanns-G is pretty close to the only 1920x1200 monitor left in existence in the 27"-28" size range.)

Annoyingly, after I found tall-enough (and cheap!) VESA monitor mounts at Monoprice, I realized that while the Asus has 100x100 VESA mounting points the same as almost all of the mounting arms on the market, the Hanns-G has only 100x200 mounting points. Fortunately, I found an adapter plate online this morning, which saves me from having to make one, to do a decent job of which I'd need access to a machine shop.

I have reasonable confidence at this point that I've gotten everything I need off excalibur, the old babylon5, and can safely wipe it and shut it down. Not sure yet what I'll do with the old hardware. It's a single-core AthlonXP 2400+ on an Asus motherboard which is maxed out at 3GB of RAM (the new machine, also on an Asus board, is only half populated at 8GB), and on which I've already replaced all the power smoothing capacitors after they leaked. It also has an Intel-MegaRAID-based LSI Logic SATA-RAID controller, because that was the only way I could get it to boot off SATA after the last of its SCSI disks failed.

(New babylon5 has mirrored SATA-2 SSDs, and they are truly stunningly fast. How do you like the sound of 2.17 wall-clock seconds to create a 1GB file?)
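For anyone who wants to compare their own disks, here is a minimal Python sketch of that kind of measurement. It is not how the 2.17-second figure was obtained (the post doesn't say); the function names are made up for illustration, and "1GB" is assumed to mean 2**30 bytes.

```python
import os
import tempfile
import time


def time_file_creation(size_bytes, chunk_bytes=1 << 20):
    """Write size_bytes of zeros to a temp file; return elapsed wall-clock seconds."""
    chunk = b"\0" * chunk_bytes
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                n = min(remaining, chunk_bytes)
                f.write(chunk[:n])
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # wait until the data has actually been handed to the device
        elapsed = time.perf_counter() - start
        written = os.path.getsize(path)
    finally:
        os.remove(path)
    assert written == size_bytes
    return elapsed


def throughput_mib_per_s(size_bytes, seconds):
    """Average write rate in MiB/s."""
    return size_bytes / (1 << 20) / seconds


# The figure quoted above, assuming "1GB" means 2**30 bytes:
quoted = throughput_mib_per_s(1 << 30, 2.17)
print(f"1 GiB in 2.17 s is about {quoted:.0f} MiB/s")
```

Calling `time_file_creation(1 << 30)` repeats the 1GB test on the local machine. Bear in mind that a filesystem which compresses or skips runs of zeros will flatter the result.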

Also on the hardware front, I just replaced the HP R3000XR UPS we've been running on for the past two years or so with an APC SU3000RMXL. I was already sick and tired of the R3000XR's bizarre battery charge behavior¹, and when it stopped responding on its serial monitoring port, that was the final straw. Now I just need to find my APC SmartUPS serial cables and compile apcupsd on babylon4.

Still in the queue: two older Windows boxes² that have undemanding jobs to be replaced by low-power dual-core Atom boxes for the Pirate and Wen, and Valkyrie's desktop-case Dell GXP150 to be replaced (the plan is for her to get the current vorlon after it gets upgraded to maximum RAM, an Athlon64 X2 processor, and a reasonably recent video card). On the infrastructure side, the main server needs a lot more RAM and an entire new rack of disks (and could use more CPU, but that's a separate problem), and the DB server also needs more RAM. Then we'll be in pretty good shape again except for wireless and laptops. Especially if I can replace the missing rackmount ear for my gigabit switch without having to pay Dell's prices for it (they want as much for a replacement rackmount kit as it would cost me to buy a whole new gigabit switch).

It appears we're even going to be able to get rid of our balky Color Laserjet 4500DN, which seems to have given up the ghost during the couple of weeks it just spent out in the unheated deckhouse. We received an inquiry about whether we'd be interested in donating it for parts. Yes, certainly. Can I interest you in a 3KVA HP UPS as well...?

[1] Left to itself, it would fully charge its battery pack to 100%, hold it there for 48 hours almost exactly, crash-drain it to just below 50%, gradually drain it further down to 40% charge over a period of about twelve hours, then maintain it at around 40% charge for about 32 days give or take a few hours. Then it would charge it back to 100% and do the same thing all over again. Plus, it took herculean efforts and about ten minutes reading and re-reading of the operators' manual every time to get the damned thing to power down. HPaq = UPS FAIL.

[2] One is a Pentium-III, I think; the other a single-core AthlonXP 1500+ even older than excalibur, which — worse yet — is running at only three-quarter speed because it has PC100 RAM instead of the PC133 it should have, but is so old and slow it's not worth fixing.

Current Location: Gilford, New Hampshire
Current Music: Quiet Riot :: Metal Health (Bang Your Head) :: Metal Health (Bang Your Head) (05:18)
Crossposts: http://unixronin.livejournal.com/807792.html

Tags:

geekdom,
hardware

Getting more eyes on the problem

Wednesday, December 8th, 2010 05:40 pm

Friends,

I need assistance finding something very specific. Having failed so far, I'm trying to get more eyes (and more existing knowledge of what's available) to work on the problem.

What I need to be able to do is to mount two 2.5" solid-state disks into the 3.5" hard disk cage in a Thermaltake Element G case. What I need in order to accomplish this is, of course, two 2.5"-to-3.5" SSD/HDD mounting adapters.

That part is easy. Heck, the SSDs I bought shipped with mounting adapters. But there's a catch.

You see, the Thermaltake case uses a vibration isolation HDD mounting system that requires the use of Thermaltake's own provided mounting screws, which are shouldered screws with an integral rubber bushing. Of course, since they're designed for mounting 3.5" hard disks, they naturally and quite reasonably have the standard, relatively coarse #6-32 HDD mounting screw thread. They wouldn't work if they didn't.

The catch is that without exception, every 2.5"—3.5" adapter I have so far found, including the ones that ship with the Mushkin SSDs, is threaded for fine-thread M3-0.5 screws. Which means the Thermaltake HDD mounting screws won't fit. Which means the adapters won't work in the Thermaltake case. (See this page¹ for more information on the screw sizes. The third photo clearly illustrates the problem.)

So, here's what I need: A pair of 2.5"—3.5" SSD/HDD adapters that mount into the case using #6-32 screws.

Can anyone point me at such a thing?

Update:

I forgot to mention above that in this case, you do not screw hard disks into the drive cage, you screw rubber vibration-isolation bushings (hence the special shouldered screws that must be used) onto the drives — or, in this case, the adapters — and slide them into rails in the drive cage.

After bouncing ideas around a little, I think I've come to the conclusion that however much I would have preferred to find some adapters that actually used the correct size and pitch threads in the first place, the best solution to this problem is going to be to drill the M3-0.5 holes in the Mushkin 3.5" adapters out to 9/64" and buy a package of #6-32 Nylok nuts.
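The numbers behind that decision are easy to check. The sketch below uses the nominal thread dimensions from the standards (M3 is 3.0 mm major diameter with 0.5 mm pitch; #6-32 is 0.138" major diameter with 32 threads per inch). These are not measurements of the actual adapters or screws.

```python
MM_PER_INCH = 25.4

# Nominal thread dimensions (standard values, not measured from these parts).
M3_MAJOR_MM = 3.0
M3_PITCH_MM = 0.5
NO6_MAJOR_MM = 0.138 * MM_PER_INCH   # #6 screw major diameter
NO6_PITCH_MM = MM_PER_INCH / 32      # 32 threads per inch
DRILL_MM = 9 / 64 * MM_PER_INCH      # the 9/64" drill mentioned above

print(f"M3-0.5 : major {M3_MAJOR_MM:.3f} mm, pitch {M3_PITCH_MM:.3f} mm")
print(f"#6-32  : major {NO6_MAJOR_MM:.3f} mm, pitch {NO6_PITCH_MM:.3f} mm")
print(f"9/64 in: {DRILL_MM:.3f} mm, "
      f"clearance over #6 of {DRILL_MM - NO6_MAJOR_MM:.3f} mm")
```

The #6 screw is about half a millimetre fatter than the M3 hole and its thread is more than half again as coarse, which is why forcing it in ruins both parts. A 9/64" hole clears the #6 thread by well under a tenth of a millimetre, so the screw passes through and the nut does the holding.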

[1] Yes, I know that page says a 6-32 screw can be made to fit into an M3-0.5 screw hole. And a square peg can be made to fit into a round hole, if you have a big enough hammer and the mechanical aptitude of a caveman, but neither the peg nor the hole will ever be good for much again.

Current Location: Gilford, New Hampshire
Current Music: Enya :: A Day Without Rain :: Deora Ar Mo Chroi (02:48)
Crossposts: http://unixronin.livejournal.com/807409.html

Tags:

hardware

Frist prost!

Saturday, November 27th, 2010 01:39 pm

...From the new machine (temporarily named excalibur) that's going to be the replacement for babylon5, that is (and will take over its IP address and hostname). I assembled the machine Thursday evening; by yesterday evening I had essentially everything set up on it except for sound. I STILL don't have sound working; I suspect it's a case of "this specific HD-audio chipset¹ isn't really properly supported yet."

The old hardware was a single AthlonXP 2400+ core (2GHz) with 3GB of RAM and, most recently, mirrored 80GB SATA disks on an LSI Logic MegaRAID controller, in an Antec case old enough that I've forgotten the model number. The new hardware runs a Thuban-core PhenomII x6 3.2GHz with 8GB of RAM (at present) and mirrored 60GB Mushkin SSDs.

[1] Apparently neither of them are, actually. The machine seems to have two, the second of which I wasn't expecting:

00:07.0 Audio device: nVidia Corporation MCP72XE/MCP72P/MCP78U/MCP78S High Definition Audio (rev a1)
        Subsystem: ASUSTeK Computer Inc. Device 83c5
        Kernel driver in use: HDA Intel
        Kernel modules: snd-hda-intel
04:00.1 Audio device: ATI Technologies Inc R700 Audio Device [Radeon HD 4000 Series]
        Subsystem: Giga-byte Technology Device aa38
        Kernel driver in use: HDA Intel
        Kernel modules: snd-hda-intel

The second is apparently built into the ~~sound~~ VIDEO card. (Bad fingers!) So far, I can't get either one working.

Current Location: Gilford, New Hampshire
Crossposts: http://unixronin.livejournal.com/804792.html

Tags:

hardware

Case remanded to higher court, overturned on appeal

Tuesday, November 23rd, 2010 01:55 pm

...So to speak. I had to ask the NewEgg tech to escalate to his supervisor, but my efforts won me approval to return the Antec case for store credit, which is fine by me, since I still have AT LEAST two more computers and two more monitors to replace.

Current Location: Gilford, New Hampshire
Current Music: ABBA :: Thank You For The Music :: She's My Kind of Girl (02:45)
Crossposts: http://unixronin.livejournal.com/803303.html

Tags:

hardware

The continuing UPS saga

Thursday, September 30th, 2010 09:57 am

I've been continuing to monitor the behavior of my UPS using the UPSwatch tool I wrote. It is quite consistently, every 48 hours, going into a mode in which, although still on line power, its output voltage drops from 125V to 115V and it suddenly begins discharging its battery pack. The battery charge plummets at first, then the decline gradually slows, dropping to just over 40% over a period of about five hours before leveling out. It did it again at about 1900 yesterday, had almost completely leveled out at somewhere around 44% by midnight, and now seems to be holding steady at 43.1% charge. It will apparently remain at this charge level indefinitely until I interrupt line power for a few seconds, forcing it to switch over to battery. Then, when I reconnect the power, it will switch back to line and say, "Oh, gee, I guess I should recharge this battery, huh?" and charge the battery back to 100% charge, typically within 20 to 30 minutes. Then, 48 hours later, it'll go through the whole rigmarole again. You can see an example here.
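The pattern described above (battery charge falling while the UPS is still on line power) is simple to flag automatically. The following Python sketch is not the UPSwatch code; the `Sample` record, the function name, and the sample log are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    hours: float       # time since the start of the log, in hours
    on_line: bool      # True while the UPS reports it is on line power
    charge_pct: float  # reported battery charge, 0-100


def discharging_on_line(samples, min_drop_pct=5.0):
    """Return (start_hours, end_hours, drop_pct) for each run of samples in
    which the charge kept falling even though the UPS stayed on line power."""
    spans = []
    start = None
    for prev, cur in zip(samples, samples[1:]):
        falling = prev.on_line and cur.on_line and cur.charge_pct < prev.charge_pct
        if falling and start is None:
            start = prev                      # a suspicious run begins here
        elif not falling and start is not None:
            drop = start.charge_pct - prev.charge_pct
            if drop >= min_drop_pct:
                spans.append((start.hours, prev.hours, drop))
            start = None
    if start is not None:                     # run still in progress at end of log
        drop = start.charge_pct - samples[-1].charge_pct
        if drop >= min_drop_pct:
            spans.append((start.hours, samples[-1].hours, drop))
    return spans


# Made-up readings shaped like the behavior described above.
log = [
    Sample(0.0, True, 100.0), Sample(1.0, True, 100.0), Sample(2.0, True, 80.0),
    Sample(3.0, True, 60.0), Sample(4.0, True, 48.0), Sample(5.0, True, 44.0),
    Sample(6.0, True, 44.0), Sample(7.0, False, 43.0), Sample(8.0, True, 70.0),
    Sample(9.0, True, 100.0),
]
spans = discharging_on_line(log)
print(spans)
```

A genuine power outage shows up as `on_line=False` and is deliberately ignored, so only the unexplained discharges are reported.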

This time, I'm just going to let it sit for a couple of days before I bump the power, just to see what happens. Needless to say, this is getting pretty old. It's looking fairly clear that the UPS has a fault somewhere in its firmware.

Granted, it does have old firmware. I need to see if I can get it reflashed to the latest firmware (which can be done only from Windows) and see if that solves the problem.

(Update: The gap in the data is where I had upsd stopped while I hunted for the correct-type-and-polarity serial cable I could have SWORN I had in order to reflash the UPS with new firmware ... no such luck.)

Current Location: Gilford, New Hampshire
Current Music: Midnight Oil :: 10,9,8,7,6,5,4,3,2,1 :: Scream in Blue (06:18)
Crossposts: http://unixronin.livejournal.com/795282.html

Tags:

hardware

Arrays and RAIDZ and spares, oh my!

Sunday, September 26th, 2010 01:26 pm

So ... when I first got my current main server, babylon4 (which was new to me, but not by any stretch new hardware — including its disks), I set it up with Solaris 10u5 x86. I installed a mirrored pair of 2.5" 80GB SATA laptop drives in the single bay intended for a boot drive, and configured the main array of twelve 300GB SATA disks as a ZFS RAIDZ2 pool. RAIDZ2, with two parity disks, should be able to survive any two drive failures and continue operating in degraded mode.

So, about two months later, we had an overnight cascade failure of three drives, and the array went down.

I reconfigured the remaining nine drives as an eight-drive RAIDZ2 plus a single hot spare, and restored all the data. Over the next few months, one more drive failed; the hot spare was automatically pulled in as a replacement, just as it should. When I got my hands on replacements for the (by now) four failed drives, I added them in as hot spares. We've had no further failures.

Recently something hosed the boot archive and took the system down. All the zpools were intact, so we didn't lose any user data, but I ended up reinstalling with Solaris 10u8. Then, not long after, Sun ... er ... Oracle released Solaris 10u9 as a developer release. Completely unsupported, sure, but I can't spare the money Oracle wants for a support contract anyway, so what's the difference? So I live-upgraded the machine to u9, and upgraded all the zpools to ZFS version 22. But, ZFS 22 now supports RAIDZ3, an even-higher-reliability format for large disk pools, using three independent parity stripes.

So, yesterday I made one last differential backup, then blew away the RAIDZ2 zpool, reconfigured the array as an eleven-drive RAIDZ3 plus a single hot spare, and restored all the data overnight. RAIDZ3 and a hot spare may be a little paranoid ... but I just increased the size of the working set by 600GB, and it should be able to survive up to four drive failures, as long as it finishes rebuilding the first failed drive on the hot spare before the fourth drive fails.
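The 600GB figure falls straight out of the drive counts. Here is a rough Python sketch of the arithmetic, assuming nominal 300GB drives and ignoring ZFS metadata overhead and the GB-versus-GiB difference; the helper names are invented for illustration.

```python
def raidz_usable_gb(drives, parity, drive_gb):
    """Approximate usable space: every drive except the parity drives holds data."""
    if drives <= parity:
        raise ValueError("need more drives than parity devices")
    return (drives - parity) * drive_gb


def worst_case_failures(parity, hot_spares):
    """Failures survivable if each hot spare finishes rebuilding before the
    next failure arrives; otherwise only `parity` failures are safe."""
    return parity + hot_spares


old = raidz_usable_gb(drives=8, parity=2, drive_gb=300)    # eight-drive RAIDZ2
new = raidz_usable_gb(drives=11, parity=3, drive_gb=300)   # eleven-drive RAIDZ3
gain = new - old

print(f"RAIDZ2 x8 : {old} GB usable")
print(f"RAIDZ3 x11: {new} GB usable (+{gain} GB)")
print(f"survivable failures with one hot spare: {worst_case_failures(3, 1)}")
```

Three of the four spare disks move into the array, but one of those three goes to the extra parity stripe, which leaves two more data disks' worth of space.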

As an incidental side note, I note that the statfs() call in a 32-bit Linux kernel overflows when called on a filesystem with more than 2TB of free space...
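As a rough illustration of why a 2TB boundary is suspicious: a count of 512-byte units stored in a 32-bit unsigned field runs out at exactly 2 TiB. The post does not say which field or unit size was involved, so the 512-byte unit and the field width below are assumptions, and the helper names are made up.

```python
TIB = 1 << 40
BLOCK_SIZE = 512   # assumed unit size; not stated in the post
FIELD_BITS = 32    # assumed width of the block-count field


def free_blocks(free_bytes, block_size=BLOCK_SIZE):
    """Number of whole blocks needed to express free_bytes."""
    return free_bytes // block_size


def fits(value, bits=FIELD_BITS):
    """True if value can be stored in an unsigned field of the given width."""
    return 0 <= value < (1 << bits)


def wrapped(value, bits=FIELD_BITS):
    """What an unsigned field of the given width holds after silent truncation."""
    return value & ((1 << bits) - 1)


for tib in (1, 2, 3):
    blocks = free_blocks(tib * TIB)
    seen_bytes = wrapped(blocks) * BLOCK_SIZE
    print(f"{tib} TiB free: {blocks} blocks, fits={fits(blocks)}, "
          f"truncated value reads as {seen_bytes / TIB:g} TiB")
```

Whether the caller gets a wrapped number or an error back depends on the interface; the sketch only shows that the true value no longer fits.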

Current Location: Gilford, New Hampshire
Current Music: Bonnie Tyler :: Faster Than the Speed of Night :: Total Eclipse of the Heart (07:02)
Crossposts: http://unixronin.livejournal.com/792614.html

Tags:

hardware

Keyboards: Do Not Buy (V7 Comfort Keyboard)

Thursday, September 23rd, 2010 06:31 pm

I'd hoped this keyboard might make an inexpensive replacement for my buggy Microsoft Natural Keyboard 4000 and its astoundingly fast-wearing key caps (which aren't a defect, because Microsoft has cunningly redefined keys wearing completely blank within a few months of use as "normal wear").

No such luck. "Comfort" and "ergonomic" on this keyboard are bad jokes. Cheap construction, appallingly poor key feel, almost nonexistent height adjusters, and so flat that it actually feels dished in the middle. It probably comes as no surprise that, just like the MS Natural 4000, it's made in China.

So is this one, of course, the only other Natural-style wired ergonomic keyboard I could find that's not made by Microsoft (since all the ergonomic keyboards Microsoft currently sells are, frankly, garbage). The hard part, of course, is finding one that ISN'T made in China.

If only Microsoft would start selling the Natural Keyboard Pro again... that was the best keyboard I've ever used. Dell even sold black ones with their name on them. But they haven't been made in at least ten years (manufacturing cost was too high, apparently, because they were decently made), and are pure unobtainium now. You might occasionally come across a refurbished one selling for almost as-new price.

Sometimes I regret ever getting used to this style of ergonomic keyboard. But then I remember the wrist pain I used to get from using traditional straight keyboards... it gets pretty hard to write code when it hurts to type. I suppose I can try the Adesso PCK-208; of course, it's Chinese too, but looks as though it may be a lot closer to the Natural Pro model. I just have to pray I never need tech support, because when I tried to ask Adesso about how the key caps are marked, the answers I got back were completely incomprehensible. All I can manage to recall is something incoherent¹ about "not possible laser".

[1] If you'll pardon the pun...

Current Location: Gilford, New Hampshire
Current Music: VNV Nation :: Empires :: Standing (05:40)
Crossposts: http://unixronin.livejournal.com/791172.html

Tags:

hardware,
psa

General Stuff Update

Friday, September 17th, 2010 07:09 pm

If you have an HP R3000XR uninterruptible power supply, you can, technically speaking, hot-swap out a failed battery pack. However, when you pull out the old battery pack, the UPS will automatically go into bypass mode, and may not come out of it after you put the new battery pack back in, so you'll probably have to power down the UPS to reset it anyway. DAMHIK.

I'm actually still not entirely sure this UPS is behaving correctly. After resetting the UPS a few days ago, the battery pack stayed between 99% and 100% for several days, was just over 98% the last time I looked before today, had dropped to 55% charge by early afternoon today, and is barely above 50% now. I suspect our UPS has a fault; it appears to, after any reset, charge the batteries fully, keep them fully charged for a day or two, then start slowly draining the battery pack at a rate of one to two percent per hour. I'm writing an RRDtool-based UPS monitoring daemon right now to track the battery charge level.
babylon4 appears entirely stable so far on Solaris 10 u9, even though u9 is officially an unsupported developer-only release. (Hey ... since I can't afford the price of an Oracle service contract, ALL Solaris releases are unsupported for me. What's the difference? At least this way I'm running on fully patched-up-to-date code.) I upgraded the zpools from v15 to v22 this morning and deleted the old s10x_u8wos_08a boot environment. At some point before the next full backup set (due the first Monday of October), I plan to reconfigure the main array from its current configuration of RAIDZ2 with four hot spares¹ to RAIDZ3 with one hot spare, then restore all the data to the array. This will add roughly 600GB to the available working set.
In the course of trying to clear emissions system errors to get it to pass inspection, we've replaced the gas cap and the fuel tank vent valve (a $175 dealer-only part) on the Baby Benz. It's still throwing error codes. No further news yet. Fortunately, Eisbär, the Volvo XC70, is all set and happy and registered and inspected and ... stuff.

This update's ended up ... rather less general than I intended, because I've first gotten thoroughly sidetracked, then spent most of the last four or five hours trying to figure out what's going on with the UPS.

(Hmmmmmm. I wonder if I can kick the UPS back into charging by disconnecting its line power for a moment...)

[1] I normally wouldn't use as many hot spares as this, of course. But the array was last reconfigured as RAIDZ2 across eight disks plus one hot-spare, after an early-morning three-disk cascade failure a couple of months after I brought the system up. When I later replaced the three failed disks, I didn't feel like reconfiguring the array again, and didn't have complete confidence in the remaining disks anyway, so I simply dropped all three new disks into the hot-spare pool. The array has been stable ever since that first rebuild, though, so I have good confidence by now that the three disks that failed were the only three weak disks.

Current Location: Gilford, New Hampshire
Current Music: The Moody Blues :: Legend Of A Band - Greatest Hits :: Your Wildest Dreams (04:51)
Crossposts: http://unixronin.livejournal.com/789862.html

Tags:

hardware

Trials, tribulations, and reinstalls

Tuesday, September 7th, 2010 09:28 pm

Sometime about 0440 Sunday morning, my main server, babylon4, went down hard and fast. I still haven't been able to reconstruct a single thing about what actually happened, but it left the machine down, with the boot archive corrupt and the boot blocks completely gone. The last thing logged before it went down (about half an hour before) was Apache2 recording some probably-acne-ridden git trying in vain to probe for phpMyAdmin holes. (It's not installed. Neither is PHP.) Just to add a weird touch, whatever happened apparently sent a break over the serial console line to epsilon3 and halted it too.

I didn't discover this until I got up on Sunday. I fairly quickly discovered that all of the ZFS filesystems and their data were completely intact; the system just couldn't be booted. I spent essentially all of Sunday trying various different ways to repair the boot blocks and boot archive, not one of them successful. I managed once to boot it by hand using the grub on a Solaris 10 install DVD and the failsafe miniroot from my Solaris installation, but that wasn't any help because ZFS on-disk had been patched to a newer version than originally installed and the ZFS patch didn't patch the ZFS drivers in the miniroot.

Well, I never got around to live-upgrading the machine to Solaris 10 u8 10/09, anyway. So on Sunday I backed up the user-data filesystems in the root ZFS pool over to the main array pool using ZFS snapshots, blew away the root pool, and reinstalled 10/09 from scratch, then on Monday morning set about reinstalling third-party packages and reconfiguring the Solaris ones the way I wanted them. I had a few minor fights with smf, the Solaris Service Management Facility, but after it saw reason and agreed to do things my way, I had most of it all set up and running again Monday night, and finished setting up the last group of services today. Thanks to the ZFS snapshot gambit, I didn't lose a single file or configuration setting.

Of course, being a prudent sort, right now I have a fresh set of full backups running.

Current Location: Gilford, New Hampshire
Current Music: Mike Oldfield :: Tubular Bells :: Tubular Bells, Part One (25:49)
Crossposts: http://unixronin.livejournal.com/786225.html

Tags:

geekdom,
hardware

sigh Here we go again

Tuesday, September 7th, 2010 07:19 am

... And the saga of the Hanns-G 28" monitor continues. When I shut it down last night, it was fine. When powered on this morning, there is a bright red vertical line the entire height of the screen at about 30% screen width from the left side. It won't go away.

Fortunately, this monitor came with a three-year warranty, because not long after it turned two years old it's begun throwing one fault after another. I am becoming less and less impressed with Hanns-G with each iteration of this, and at this point I am virtually certain never to buy another Hannspree product.

It's still under warranty. I suppose I have to deal with YET ANOTHER return for repair under warranty... this will be its third trip back to Hannspree.

Granted, it's still very inexpensive for a monitor this size. But at this rate, shipping for warranty returns is going to eat up the difference fast.

Current Mood: annoyed
Current Location: Gilford, New Hampshire
Current Music: VNV Nation :: Judgement :: Carry You (06:12)
Crossposts: http://unixronin.livejournal.com/785424.html

Tags:

fail,
hardware

Illuminating the elephant

Monday, August 23rd, 2010 11:48 pm

There's an elephant in the middle of the living room that the lighting industry really isn't talking about. Put simply, it's this: Compact fluorescent lamps and Edison screw sockets do not get along. And the problem is heat dissipation.

You see, the Edison screw base works tolerably well with incandescent bulbs, but then, it was designed for incandescent bulbs. A CFL is a lot more energy efficient than an incandescent bulb, yes, and it produces a lot less waste heat. But it's where that heat is produced that is important.

You see, essentially all of the power dissipation in an incandescent bulb occurs at the filament. That filament is connected to the base only by two very fine wires. As a result, virtually all of the waste heat produced by the bulb is emitted by radiation through the envelope. Very little of it is conducted back to the socket.

A CFL, on the other hand, while it produces much less total waste heat than an incandescent bulb, produces a large proportion of it in its base ballast. And there's nowhere for that heat to escape to except by conduction straight into the socket. The socket isn't designed to dissipate that much heat, so the center contact (which is usually nothing more than a piece of stamped beryllium copper if you're lucky, brass if you're not) overheats, loses what little spring temper it ever had, and collapses flat against the bottom of the socket. Then the socket stops working, because you can no longer screw an Edison base bulb far enough into the socket to make a decent electrical contact with the base.

Now, as it happens, the Edison screw base is on its deathbed anyway. It's being replaced by the GU24 twist lock base, a much better design that is not nearly as dependent upon spring tension to make the base contact. The Energy Star 4.0 specification for residential lighting (PDF) forbids the use of the Edison screw socket, requiring the GU24 base instead. Unfortunately, it's arguably being done for the wrong reason: not because it's a mechanically better design (although we're fortunate that it is), but in order to ensure that it will be impossible to put an incandescent bulb into an Energy Star 4.0 compliant light fixture.

There is, however, a staggeringly huge installed base of Edison screw light sockets in the US. Billions of them. And over the next few decades, they're all going to have to be replaced — because when they stop making incandescent light bulbs (if memory serves, California has already outlawed them), and all the holdouts are forced to switch to CFLs, a lot of Edison screw socket light fixtures are going to overheat and fail.

I suppose it's going to be a good time to be an electrician.

Current Location: Gilford, New Hampshire
Current Music: Ozzy Osbourne :: The Ultimate Sin :: Shot In The Dark (04:23)
Crossposts: http://unixronin.livejournal.com/781977.html

Tags:

hardware

Fun with SCSI

Sunday, August 22nd, 2010 11:49 pm

So, I was recently gifted with a new-to-me Sun V210. Nice fast machine with onboard U160 SCSI and dual 1GHz UltraSPARC IIIi processors. I promptly installed Solaris 10 on it (after finally finding a kludged set of connectors and adapters that actually gave me a working connection to the ALOM serial console) and set about putting it to work. My plan for it is to connect my LTO-2 drive to it, create an iSCSI target for the drive, then connect to the drive via iSCSI on my main server, giving me a drive logically connected directly to the server while actually being, say, at the other end of the house. Having the tape drive and my disk backup pool managed by the same Bacula storage daemon on the same machine would give me some useful new abilities, but the most important goal is to get the tape drive connected to the main server by a fast enough connection to keep it streaming.

Setting up a point-to-point gigabit connection between the new machine, epsilon3, and the main server, babylon4, was the work of moments. The first obstacle I ran into was that the V210 couldn't see the tape drive. Well, that is, not the V210 per se; a probe-scsi-all from OpenBoot saw the drive just fine. But once booted, Solaris 10 simply would not find the drive nor create device nodes for it. When I finally found the cause for this one, it was a real facepalm.

You see, SCSI comes in a number of flavors, but the distinction that matters to us right now is wide vs. narrow SCSI. Narrow SCSI uses a 50-pin connector with an eight-bit-wide data bus, and uses three-bit device IDs, allowing a device on a narrow SCSI bus to have a SCSI ID from 0 to 7. (By convention, ID 7 is the host controller, leaving 0-6 for other devices.) Wide SCSI uses a 68-pin connector, with a 16-bit data bus and four-bit device IDs, allowing SCSI IDs up to 15. So, only IDs 0-6 are available for narrow SCSI devices; wide SCSI devices can also use 0-6, but can use 8-15 as well, which narrow SCSI devices cannot.
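The ID ranges follow directly from the number of ID bits. A tiny Python sketch of the paragraph above (the names are invented for illustration):

```python
HOST_ID = 7  # by convention, the host controller takes ID 7


def target_ids(id_bits):
    """SCSI IDs available to devices on a bus with id_bits-wide device IDs."""
    return [i for i in range(1 << id_bits) if i != HOST_ID]


narrow = target_ids(3)                        # three-bit IDs
wide = target_ids(4)                          # four-bit IDs
wide_only = sorted(set(wide) - set(narrow))   # IDs a narrow device can never use

print("narrow:", narrow)
print("wide:  ", wide)
print("wide-only:", wide_only)
```

ID 8 is the first entry in the wide-only list.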

Now, here's the thing: My tape drive (being a wide SCSI device) was configured with SCSI ID 8, the first ID usable only by wide devices. But it turns out that, despite the fact that no hardware manufacturer in the world has shipped a new narrow-SCSI-only device in probably at least the last ten years, Solaris 10 ships with all of the wide-SCSI targets in /kernel/drv/st.conf, the SCSI tape driver configuration file, disabled by default. They're fully enabled and supported in the driver; they're just commented out in the configuration file. So the driver was perfectly capable of talking to the tape drive; it had simply been configured by default not to even look for a tape drive on a wide-SCSI-only ID. Yet ALL modern high-speed tape drives — LTO, DLT, AIT — with SCSI interfaces are wide-SCSI devices.
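A quick way to spot this on another machine is to scan the configuration file for commented-out tape targets on wide-only IDs. The Python sketch below is only an illustration: the sample text is a made-up excerpt in the general style of st.conf target entries, not a copy of the shipped file, and the function name is hypothetical.

```python
import re

# Made-up excerpt for illustration; the real file has more entries and comments.
SAMPLE_ST_CONF = '''\
name="st" class="scsi" target=0 lun=0;
name="st" class="scsi" target=6 lun=0;
#name="st" class="scsi" target=8 lun=0;
#name="st" class="scsi" target=9 lun=0;
'''

# Optional leading '#', then an st entry with a numeric target.
ENTRY = re.compile(r'^\s*(#?)\s*name="st"\s.*\btarget=(\d+)\b')


def commented_out_targets(conf_text, min_id=8):
    """Return target IDs >= min_id whose st entries are commented out."""
    found = []
    for line in conf_text.splitlines():
        m = ENTRY.match(line)
        if m and m.group(1) == "#" and int(m.group(2)) >= min_id:
            found.append(int(m.group(2)))
    return found


disabled = commented_out_targets(SAMPLE_ST_CONF)
print("disabled wide-SCSI tape targets:", disabled)
```

To check a real system, read the contents of /kernel/drv/st.conf and pass them to `commented_out_targets` in place of the sample string.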

Go figure.

Anyway, once I found that and fixed it, one more reset boot and I had a full set of device nodes in /dev/rmt, and from there on the drive Just Worked. After some reconfiguring of Bacula, it now dumps a full set of client machine full backups (130GB of data) to a single LTO-1 tape in 90 minutes, and for the first time since I got it, my LTO-2 drive is being fed data fast enough to actually keep it streaming. (And I haven't actually seen yet how fast this setup can dump data to LTO-2 media, on which it should be able to approach twice the transfer rate.)
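For reference, the average rate implied by those figures works out as below, assuming decimal units (1GB = 1000MB). The function name is made up for illustration.

```python
def avg_rate_mb_per_s(gigabytes, minutes):
    """Average transfer rate in MB/s, using decimal units (1 GB = 1000 MB)."""
    return gigabytes * 1000 / (minutes * 60)


rate = avg_rate_mb_per_s(130, 90)
print(f"130 GB in 90 minutes is about {rate:.1f} MB/s on average")
```

This is an average over the whole job, counting the data as sent by the clients, so it includes any time Bacula spent waiting on them.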

I still haven't solved the problem of how to create an iSCSI target from a tape device, but that's a battle for another day. Unfortunately, I'm beginning to suspect there simply isn't any way to do it using the iSCSI tools in Solaris 10 — although Solaris 10 can create an iSCSI target that emulates a tape device using a disk backing store. What on earth the point of that is, I must confess, is quite beyond me, as I find myself completely mystified as to why anyone would want to do such a thing.

Current Location: Gilford, New Hampshire
Current Music: Meat Loaf :: Dead Ringer :: Everything Is Permitted (04:40)
Crossposts: http://unixronin.livejournal.com/781580.html

Tags:

hardware

So, just for the record ...

Saturday, July 24th, 2010 04:02 pm

... Trying to start smartd (part of the smartmontools package) on a system with an Intel MegaRAID-based hardware RAID controller is BAD. As in, it will oops your kernel and knock your disk controller offline.

DAMHIK.

Current Location: Gilford, New Hampshire
Current Music: Def Leppard :: YEAH! :: Waterloo Sunset (The Kinks) (03:38)
Crossposts: http://unixronin.livejournal.com/775724.html

Tags:

hardware

The continuing saga ...

Sunday, July 11th, 2010 09:30 pm

... of the M5309 laptop continues, here in the Department of Redundancy Department. Even after replacing the heatsink compound in the M5309, it still has continuous thermal problems in this weather. The cooling design is totally inadequate. Although it's no longer overheating its CPU into emergency thermal shutdown, about 15-20 minutes of running at full CPU speed will overheat the machine as a whole to the point of getting random memory errors every few minutes. The only way to make it run stably has been to remove all the access panels on the bottom of the case and set up a fan blowing directly into the case. With that done, it's most of the way to having a functional X desktop installed, but it's going to be moot if it can't run for longer than twenty minutes in summer without overheating itself. At this point, I'm mainly just continuing to work on the install for the sake of achieving victory.

Current Location: Gilford, New Hampshire
Current Music: Shriekback :: Oil and Gold :: Hammerheads (04:17)
Crossposts: http://unixronin.livejournal.com/774944.html

Tags:

hardware

Hacking

Saturday, July 10th, 2010 05:35 pm

Finally, things have progressed from bang-head-on-wall to ... well, progress.

First of all, I've solved PART of the issues with the Mayhem G3 laptop. I'd previously checked the fans and heatsinks, and they looked fine ... from the outside. Yesterday, at the suggestion of a friend, I partially disassembled the laptop in order to replace the heatsink thermal compound. (A good call; what was there was mostly gone. It's now been replaced with a fresh layer of Arctic Alumina.) And, so long as I had the heatsink assembly out anyway, I disassembled it for a thorough cleaning.

The Mayhem's CPU heatsink is basically a fan-pressurized plenum assembly with fine copper-fin heat exchangers at either end, connected via heat pipes to the CPU. The inside surfaces of both heat exchangers were totally blocked by a layer of dust so thick and dense that it had formed what amounted to a 1/8" layer of solid felt. NO WONDER the previous owner had reported that it had begun "crashing" at increasingly-short random intervals; it had no cooling airflow whatsoever, and was going into emergency thermal shutdown to avoid destroying its CPU.

With its heat exchangers cleaned and new thermal compound, the Mayhem is now stable running Windows XP Pro, and has been up for two days without a hiccup; I even stress-tested it playing Halo for about twenty minutes, which was driving it into thermal shutdown within about two minutes prior to the cleanup. (I will note, however: If you value your sanity, don't ever try to play Halo, or probably any FPS, with a trackpad. Just don't.) So, one problem licked; I know the hardware is sound. Next, I get to re-tackle the problem of getting it to boot under Linux with ACPI enabled. (Which may now be a solved problem: it is entirely possible that what was happening was that it was getting as far as loading the ACPI code, detecting a thermal overtemperature condition, and immediately halting.) With a 100GB hard disk (well... about 94 real GB), I have it partitioned 45GB and 45GB with a roughly 4GB swap partition, and I'll be leaving the XP installation in place and installing Gentoo dual-boot.

Given this success, I went on to also disassemble the M5309 laptop, which I'd been trying to reinstall for the Dread Pirate Bignum (who has decided to name it Post-Dated Check Loan) until it became unable to successfully compile anything, and replace its heatsink thermal compound as well. (Which turned out to also be a good call; it was that horrible grey waxy stuff, and very little of it was still between the heatsink and the die.) I then set out to try a new clean install, which not only exhibited no further problems with failed compilations, but successfully recompiled gcc on the very first try — always a good stresstest of a system. But it still wouldn't build glibc.

And this was where things REALLY got interesting.

( Beware; here be deep geekery. )

Now, with that all figured out, Post-Dated Check Loan is happily compiling away, installing a complete clean system from scratch. So Pirate is going to have her laptop after all in another day or two. (We should probably buy her a new battery at some point though; the one we got with the laptop is down to 29% of its original 4400mAh capacity.)

Then, assuming I can sort out the processor speed control issue on the Mayhem laptop, I just need to source a screen for it somewhere. (It requires a Quanta QD15LT01 15.4" WXGA active-matrix TFT LCD. I can find one for about $60 used and asserted to be in good condition, about $77 for a "100% compatible" knockoff with a high-glare gloss surface, or $95 for the genuine article. Saving $18 for a knockoff screen that I already know will have a glare problem just doesn't seem like a good idea. But whichever I buy, it's going to have to wait on budget.)

Current Location: Gilford, New Hampshire
Current Music: VNV Nation :: Darkangel :: Darkangel (azrael) (08:00)
Crossposts: http://unixronin.livejournal.com/774219.html

Tags:

geekdom,
hardware

MORE POWER, IGOR!!!!

Saturday, June 5th, 2010 04:19 pm

"Yeth, Mathter!"

As previously mentioned, I've been fighting a memory error problem for some time that at first looked like a bad RAM module. babylon5 has run perfectly stably for about eight years with 3x 256MB DDR333 RAM modules, but with the three 1GB DDR333 modules I was recently given, it developed severe stability issues, including gcc internal compiler errors (gcc works memory subsystems very hard, and is an excellent canary for memory problems), kernel "oopses", and even kernel panics. Any two of the modules would work together fine in any two slots, but add the third module, and things went to hell.

After scratching my head over this for some time, I did some calculations and came up with a working theory. Although in theory my existing 420W power supply should be adequate for the load, I came to suspect that (a) being a cheap, no-name power supply, it was not actually delivering its full rated 420W output, and (b) as a result, it was allowing the 3.3v rail to sag when the system was under heavy load and demanding peak power.

Well, I didn't have a spare power supply, so I bought one, a normally-$70 650W power supply from NewEgg on sale for $40 after rebates. I swapped the new power supply in last night, and after first making sure everything worked properly with 2GB RAM and the new supply, added the third RAM module back in. babylon5 is now most of the way through recompiling BRL-CAD (again ... I've been having trouble with it), and gcc hasn't so much as hiccupped, although system load factor has climbed into double digits with up to 13 gcc processes running at a time.

So, it looks as though my theory of insufficient power on the 3.3v rail was correct, and the problem is now solved.

(The other good news, of course, is that I now have a spare power supply. The bad news is that it appears to be putting out about 350W, about 17% below its rated output, at best...)
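
The 17% figure is just the shortfall relative to the rated output. A tiny sketch of that arithmetic, using the 420W rating and the approximate 350W figure above (the function name is only for illustration):

```python
def percent_below_rated(rated_watts, delivered_watts):
    """Shortfall of delivered output versus rated output, as a percentage of rated."""
    return (rated_watts - delivered_watts) / rated_watts * 100

print(f"{percent_below_rated(420, 350):.1f}%")  # 16.7%
```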

Current Location: Gilford, New Hampshire
Current Music: Pat Benatar :: Greatest Hits :: One Love (04:31)

Tags:

hardware

BAH!

Friday, June 4th, 2010 08:45 pm

About two years ago now, I bought a Hanns.G HG281DP 28" LCD monitor. It came with a three-year warranty.

About three or four months ago, it started developing an intermittent problem.

About four weeks ago, I got an RMA for it under warranty, and about three weeks ago I got a spare CRT monitor freed up, packed up the LCD carefully in its original packaging, complete with the screen protector sheet carefully taped in place with masking tape that releases cleanly without leaving residue, and shipped it off for repair.

It came back today, in a plain brown carton, wrapped in about thirty yards of bubble pack. Original packaging missing. Screen protector sheet missing. The end of the bubble pack was taped to the face of the monitor with packing tape, which I had considerable difficulty removing and which left a wad of adhesive residue that it has taken twenty minutes of scrubbing with a shop towel and 99% isopropanol to remove.

The intermittent fault I sent it in for does indeed appear to have been fixed. It's been replaced with a fault that's present all the time. Everything is heavily tinted green. White is greenish. Grey is greenish. Yellow is yellow-green. Even black is green-tinted.

They've issued a new RMA, to include a prepaid shipping tag and an order to exchange for a tested replacement unit upon receipt ... but it'll be gone for another couple of weeks of using my spare monitor, during which it'll be hard to get anything done. Subsequently, I've found that I can get bright areas of the screen mostly adjusted to normal, but dark areas are still noticeably green-tinted. What I need to decide is whether this is acceptably correct. I hate to have my big monitor gone for another couple of weeks, but they did send it back to me not working properly, and they are supposedly covering the return shipping.

(Oh, speaking of adjustment, the firmware on the new mainboard they installed does not offer the typical set of numeric color temperature settings. Instead, it offers vague "warm", "cool", and "nature" settings, whatever the hell that's supposed to mean, plus "user".)

To be fair, Hannspree customer service was very apologetic about the tape, and explained that they don't keep the packaging monitors arrive back in because it's usually pretty beaten up by the time they receive it. Still, if they'd said that up front in the packing instructions — "DON'T use your original monitor packaging if you want to keep it, because we will not save it" — I'd have used a different box. Instead, they explicitly tell you to use your original monitor packaging, if you still have it. Then they throw it away.

I am, in general, not pleased.

Current Location: Gilford, New Hampshire
Current Music: ABBA :: Thank You For The Music :: Eagle (05:49)

Tags:

hardware

"Should have taken photos"

Monday, May 31st, 2010 02:10 pm

I just repaired the broken shifter lock button on our Volvo. The button was badly cracked when we bought the car (a used Volvo XC70 with 98,000 miles on it, for $7000), and broke into three pieces during the first winter we had the car. I pulled the pieces out, tried various different adhesives without success, and eventually welded the parts back together and reinstalled it. That lasted until last summer or fall, when it broke again along one of the original cracks. I took the pieces out again, put them on my workbench intending to try another repair, and there they got buried.

Recently, having come up with a plan I thought would work, I dug them out again and had another go. Yesterday, I broke off all the old weld filler plastic, re-welded the break and the other old crack from both sides to hold it together while I worked on it, sanded all the affected outer surfaces level and slightly rough, then heat-molded a sheet of textured black 3/16" ABS onto it to fit the compound curvature of the front face perfectly. I then mixed up a batch of two-part epoxy, bedded the button into the new ABS face and epoxied it there, cut a piece of glass mat to fit inside it, and epoxied that into place inside the button to make an internal fiberglass reinforcing layer. Then, after all of the epoxy cured, I trimmed and sanded the ABS faceplate to the edges of the button and buffed all the exposed edges.

I just went out and installed the repaired lock button (now easily three times its original mass) back into the shifter. (The dealership, by the way, won't even attempt to replace the button; they'll replace the entire shifter.) With the dubious benefit of unwanted practice, it only took me about ten minutes to get it in this time, using only a pair of needle-nose pliers and (believe it or not) a dental pick. It's working perfectly, and looks as though it's always been there.

Functionality restored, cosmetics restored, roughly $400 dealer parts-and-labor bill saved.

WIN.

Current Location: Gilford, New Hampshire
Current Music: Saga :: Behaviour :: Misbehaviour (04:03)

Tags:

hardware

"Normal wear"

Tuesday, April 6th, 2010 12:14 pm

I believe I've previously commented on issues with the Microsoft Natural Ergonomic Keyboard 4000. You can go look for the prior post if you want to; I'm not going to. Instead, I'll briefly summarize the issues:

The MS Natural 4000 is frequently not recognized at boot time as a keyboard, even by motherboards that are aware of USB keyboards. This causes "no keyboard connected" warnings, and can cause problems getting into the BIOS setup if you have to.
The keyboard will sometimes spontaneously begin auto-repeating a key you didn't even press. (n and t are common.) It can be quite difficult to get it to stop. Needless to say, the wrong key autorepeating at the wrong moment could be catastrophic.
There's the whole silly F-lock business. By itself, this would just be silly wasted functionality, but when the keyboard forgets its F-lock state, it defaults to the alternate key meanings.
And then there's the key cap life issue. A Natural 4000's key cap legends — which appear to be just printed decals, and exceptionally thin ones at that — begin to wear away within a few months of normal use, and by six months many of them will be completely blank.

My first Natural Keyboard 4000 was replaced under warranty by Microsoft at five months' age. (It carries a three year warranty.) Within four months of replacement, the replacement had the same problem. I'm guessing Microsoft replaced a LOT of Natural Keyboard 4000s with worn-blank key caps, and decided it was costing them too much money.

You see, they've now solved the problem with a brilliantly simple strategy. What's more, this solution automatically extends to all existing keyboards already sold! Astounding!

The solution?

They have simply defined the Natural Keyboard 4000's absurdly short key cap life to be "normal wear". Thus it is no longer covered under the warranty.

I suppose this is to be expected from Microsoft. It's not like they don't already have a history of making their shoddy products become their customers' problem. But this is more blatant than usual; they have basically just declared an endemic problem with one of their products to be not a problem, in order to avoid having to honor their warranty on the product.

Maybe, just maybe, Microsoft, before warranting this keyboard for three years, you should have done some actual wear testing on it to see whether or not it would actually stand up to three years of use. Or one. Or even six months. But now that you've put it out on the market with a three-year warranty, the least you could do is suck it up and honor your warranty. Or, god forbid, have your hardware OEM actually fix the problem.

But it's easier to just let your name become still more tarnished, isn't it?

Current Mood: disgusted
Current Location: Gilford, New Hampshire
Current Music: Yes :: YesStory :: Time And A Word (04:31)

Tags:

hardware

Habemus plus vis computatoris quam Deus

Further ramblings of a Unix ronin
