Recently I was given 3GB of DDR333 memory for babylon5, my Linux workstation, which has been getting by with 768MB for the past eight years or so. There's no disputing it was much zippier with the new memory, except that ... well ... stability issues appeared. The machine started locking up or crashing periodically for no apparent reason. Compiles started randomly failing. I rebooted the machine and ran memtest86+ on it for half a day, but memtest86+ didn't find anything wrong.
Then I found that I couldn't recompile BRLCAD, which had stopped working. That was a major annoyance, because I'd been intending to use it to create a 3D model for the planned rebuild of my desk. (It's a really nice steel-framed desk that would have been around $1200 new if I'd paid full price for it, which I didn't; I paid $200. But the current top, made of several pieces of 1" particle board that don't actually fit together all that well, has been gradually deteriorating over the fifteen years I've owned it and is starting to get into fairly bad shape.) Then I tried to update gcc from 4.3.4 to 4.4.3, to see if it would compile BRLCAD, and discovered that I couldn't bootstrap gcc. The build kept dying with internal compiler errors.
Now, gcc has always been a sensitive indicator of memory problems. It's one of the best memory subsystem stress-testers you'll ever find, because it really bangs hard on the memory. It won't specifically locate problems, but it'll show up the existence of memory problems that normal diagnostics won't find because they don't stress the memory subsystem enough.
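For anyone who wants to use a compile job as a rough memory canary in the same spirit, here is a minimal sketch in Python. It is not what I actually ran; the function names are made up for illustration, and it assumes the command you give it is deterministic (same exit status and same standard output every time on healthy hardware, which is roughly true of a non-parallel build from a clean tree).

```python
import hashlib
import subprocess


def repeat_and_compare(cmd, runs=5):
    """Run the same command several times.

    Returns (failures, digests): how many runs exited non-zero, and the
    set of distinct SHA-256 digests of standard output across all runs.
    """
    failures = 0
    digests = set()
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failures += 1
        digests.add(hashlib.sha256(result.stdout).hexdigest())
    return failures, digests


def looks_flaky(failures, digests, runs):
    # A job that fails every time is probably a genuine build problem.
    # A job that fails only sometimes, or whose output differs between
    # runs, is behaving non-deterministically, which is the symptom
    # described above.
    return 0 < failures < runs or len(digests) > 1
```

You would call it with something like `repeat_and_compare(["make", "-C", "/path/to/clean/tree"], runs=5)` (the path is a placeholder). Like gcc itself, this only tells you that something is unstable; it can't tell you which module is at fault.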
So I shut down babylon5, pulled all the memory, and examined it closely. I made the interesting discovery that the three modules are not precisely identical. All three are Crucial 1GB DDR333 DIMMs, but two are part number CT12864Z335.Y16TY, while the third is part number CT12864Z335.K16TY.
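I found the mismatch by eyeballing the labels, but in principle the same check can be done without opening the case, assuming the board reports module part numbers through SMBIOS (many older boards don't, or report placeholder strings). The sketch below parses the text that `dmidecode --type memory` prints and groups slots by the `Part Number:` field; the function name is my own invention for illustration.

```python
from collections import defaultdict


def group_by_part_number(dmidecode_text):
    """Map each reported part number to the slots (Locator values) using it."""
    groups = defaultdict(list)
    locator = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()
        # "Bank Locator:" lines start with "Bank", so they are not matched here.
        if line.startswith("Locator:"):
            locator = line.split(":", 1)[1].strip()
        elif line.startswith("Part Number:"):
            part = line.split(":", 1)[1].strip()
            groups[part].append(locator)
    return dict(groups)
```

Feed it the captured output of `dmidecode --type memory` (which needs root). More than one key in the result means the installed modules aren't all the same part, which is what I found by hand.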
I don't know (and haven't been able to find out) what the difference is, but acting on a hunch, I put only the matching two modules back in and fired back up. Today, gcc-4.4.3 bootstrapped on the first try without a hitch. Shortly after, BRLCAD recompiled on the first try using the newly-built gcc-4.4.3. So now, I guess I get to contact Crucial and see if I can get the third memory module replaced under its lifetime warranty.
Unfortunately, BRLCAD still won't start.
(I have at least figured out that the problem appears to be DRI-related, though. BRLCAD is dying with the error "mged: ../common/texmem.c:936: get_max_size: Assertion `log2_size != 0' failed", if that's familiar to anyone. driconf also dies on startup, with the same error, implying it's something that both BRLCAD and driconf call — possibly something in MesaGL. Any suggestions for fixing it would be welcomed.)
(Update: I've traced it as far as glxinfo/glxgears. It looks to be an OpenGL/DRI problem with x11-drivers/xf86-video-mga. I really need to get a newer video card into this machine, something with a more current, better-supported chipset and DVI or HDMI out.)
no subject
I don't think it's the same, as the symptoms seem totally different, but it is certainly in the same area, so it could possibly be related.