Unixronin

Thursday, December 2nd, 2004 01:24 am

I appear to have just confirmed I have a kernel memory leak on this machine.

Before reboot:  496M of physical RAM out of 512M in use for processes, 300M of swap out of 512M used, total of around 850M of RAM usage and only 50M of it buffers and cache.  After reboot:  295M of physical RAM used for processes, 210M used for buffers and cache, zero swap used.  Same processes running, same number of browser windows open to the same pages, same four processes are the top memory users, they're still the only ones using over 1M of RAM, and they're using about the same amount as they were before rebooting.

So somewhere, the kernel had leaked half a gig of RAM....
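
As a rough check on that "half a gig" figure, the arithmetic can be sketched in shell. This is only back-of-the-envelope: `leak_estimate` is a made-up helper name, the numbers are the ones quoted above (in MB), and it assumes the process set really was the same before and after the reboot.

```shell
# Rough estimate (in MB) of memory that was in use before the reboot
# but is not accounted for by the same processes afterwards.
# leak_estimate is a hypothetical helper, not a standard tool.
#   $1 = RAM used by processes before the reboot
#   $2 = swap used before the reboot
#   $3 = RAM used by processes after the reboot (swap use was zero)
leak_estimate() {
    echo $(( $1 + $2 - $3 ))
}

leak_estimate 496 300 295   # prints 501
```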

Wednesday, December 1st, 2004 11:26 pm (UTC)
Ain't modern technology grand?
Thursday, December 2nd, 2004 03:58 pm (UTC)
It wasn't the kernel. If you still had browser windows open you've got plenty of other culprits.

The browser could be caching stuff you'd already visited. Your xterms could each have their own history. Lots of places for the processes to be different. And that's if you don't run some servers like apache, mysql, postgres, sendmail, postfix, ....

[Even if you already know some or all of this, other people might want the detail, so I'm including it. No offense intended.]

X or an X application was more likely the problem, so you could probably have fixed it by killing and restarting X (Ctrl-Alt-Backspace) instead of rebooting.

If you want to make a better before/after comparison you probably want to shut down as much as you can before you collect your stats.

To demonstrate it wasn't the kernel, I'd recommend something like this:

Switch to a text console (Ctrl-Alt-F1), log in as root, switch to single user mode with `init 1`.

Collect some stats: `(free ; echo '---' ; lsmod ; echo '---' ; ps auxww) > /tmp/before`.

Reboot.

Repeat the procedure after the machine finishes rebooting:

Switch to a text console (Ctrl-Alt-F1), log in as root, switch to single user mode with `init 1`.

Collect some stats: `(free ; echo '---' ; lsmod ; echo '---' ; ps auxww) > /tmp/after`.

Compare before and after. They ought to be much closer than the numbers you showed.

If there are kernel modules listed in before that aren't in after, unload those next time you want to do another comparison.
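
A minimal sketch of that comparison, assuming the two snapshot files were written exactly as in the steps above (`free` output, a `---` line, `lsmod` output, a `---` line, then `ps auxww`). The function names are made up for illustration, and it assumes `free` prints a `Mem:` line with "used" in the third column, in kB by default.

```shell
# Print the "used" column of the Mem: line from a saved snapshot.
mem_used() {
    awk '/^Mem:/ { print $3; exit }' "$1"
}

# Print how much more memory was in use in snapshot $1 than in snapshot $2.
mem_diff() {
    echo $(( $(mem_used "$1") - $(mem_used "$2") ))
}

# Print the module names from the lsmod section of a snapshot, i.e. the
# lines between the first and second '---' separators, minus the header.
modules() {
    awk '/^---$/ { n++; next } n == 1 && $1 != "Module" { print $1 }' "$1"
}

# Print modules that appear in snapshot $1 but not in snapshot $2.
modules_only_in_first() {
    for m in $(modules "$1"); do
        modules "$2" | grep -qxF -- "$m" || echo "$m"
    done
}

# Example, using the file names from the steps above:
#   mem_diff /tmp/before /tmp/after
#   modules_only_in_first /tmp/before /tmp/after
```

A plain `diff /tmp/before /tmp/after` shows everything at once; the helpers just pull out the two numbers and the module list that matter most here. If your system clears /tmp at boot, save the "before" file somewhere that survives a reboot.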
Thursday, December 2nd, 2004 04:51 pm (UTC)
Um, excuse me, but on what diagnostics from my machine are you basing your flat assertion that "It wasn't the kernel"? Give me credit for doing some checking and diagnostics -- I didn't start using *nix yesterday, you know. Allocated memory had been gradually and monotonically increasing over a period of a couple of months until the machine couldn't open an xterm without swapping, and was consuming swap at the rate of about 30MB a week. It was NOT a browser, it was NOT my X server, nor was it Apache, mysqld, Bacula, sshd, Samba, BIND, xdm, xfm, xft.... I'd previously already tried restarting all of these, one by one. Those are the OBVIOUS things to try first (not to mention that most of them were barely even on the map in terms of memory usage). You think I'd have rebooted the box if there was anything left to try except the kernel?

Sheesh.
Thursday, December 2nd, 2004 06:49 pm (UTC)
How about a library leak? That would seem a bit more likely, as library code is generally a lot less audited than the kernel.

One would hope there are not stupid bad kernel memory leaks everywhere. Stupid bad glibc leaks, that I will buy for a dollar. :-)
Thursday, December 2nd, 2004 07:32 pm (UTC)
Mmm, point, there is that possibility. I hadn't considered that. In that case the memory would be freed when the last application using that library is closed, right?

And yeah, one would hope there were not bad memory leaks all over, but it wouldn't be the first time the Linux kernel has had a memory leak.
Then again, I DID have to update some Gnome2 libraries a while back to support the latest Firefox releases, and I cannot honestly claim Gnome is my favorite piece of software. So you might well be onto something there.

Any diagnostic suggestions for library leaks?

(Frankly, this whole box is overdue for a total OS reinstall with a more current version. I suspect that'll clean up a LOT of assorted cruft.)
Friday, December 3rd, 2004 01:50 am (UTC)
Sorry if I pissed you off.

Your best bet to rule the kernel out is going to be dropping to single user mode.

You might even do a full lsof before/after and compare those if you want more detail.

It could be a kernel module, but even that isn't nearly as likely as having a problem in something else.
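
For the lsof comparison, one rough approach is to save a full listing each time and boil it down to a count of open files per command, so the two runs are easy to eyeball side by side. This is just a sketch: `open_files_by_command` is a made-up name, the file paths in the example are placeholders, and it assumes the usual lsof layout with a header line and the command name in the first column.

```shell
# Count open files per command name in a saved lsof listing,
# busiest command first. The header line is skipped.
open_files_by_command() {
    awk 'NR > 1 { count[$1]++ } END { for (c in count) print count[c], c }' "$1" |
        sort -rn
}

# Example (hypothetical paths):
#   lsof > /root/lsof.before
#   ... reboot, get back to the same state ...
#   lsof > /root/lsof.after
#   open_files_by_command /root/lsof.before | head
#   open_files_by_command /root/lsof.after  | head
```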
Friday, December 3rd, 2004 08:06 am (UTC)
Yeah, sorry I went off on you a bit. But frankly, I found it insulting that you assumed that I hadn't done even trivial troubleshooting of userspace apps as obvious as a web browser simply because I hadn't posted my full troubleshooting process.
Friday, December 3rd, 2004 02:07 am (UTC)
You think I'd have rebooted the box if there was anything left to try except the kernel?

You said "same number of browser windows open to the same pages" which strongly suggests a lack of due diligence before claiming a kernel memory leak, because you were still running gods know how many user processes.

Now if you said you'd dropped to single user mode, unloaded all the kernel modules you could, unmounted all the partitions you could get away with, only had 6 processes running on the whole box, checked lsof, and you still had a problem, that would be different.

I'm betting you can get your memory back without rebooting, so yeah, I think you rebooted when there were things left to try.

I didn't start using *nix yesterday, you know.

Don't sweat it, we're all still learning, except those of us who choose to stop doing so.

These might be helpful for you:


Free Debugging Source Code and Libraries,
Memory Leak Detection, Resource Leak Detection, Heap Checkers
(http://www.thefreecountry.com/sourcecode/debugging.shtml)
Friday, December 3rd, 2004 08:19 am (UTC)
You said "same number of browser windows open to the same pages" which strongly suggests a lack of due diligence before claiming a kernel memory leak, because you were still running gods know how many user processes.

Yeah, that was poor wording on my part. I should probably have added "as were running before I started trying to troubleshoot the leak". I'd started all my userspace environment back up again to try and make as fair as possible a comparison.

You are in fact right, I hadn't exhausted all possibilities; although I'd shut down and restarted pretty much everything running outside the kernel, I hadn't done so to all of them SIMULTANEOUSLY by dropping to single-user. (I basically don't keep any non-essential kernel modules loaded except for my sound drivers, and I'd already ascertained that the sound subsystem is not the problem.) GP's suggestion that it was a library leaking memory hadn't occurred to me, and after looking a little further, this is now looking like a very real possibility. Furthermore, I suspect there's a high likelihood it's one of the Gnome2 libraries that I had to update a few weeks ago in order to support software dependencies for the latest Firefox releases.

This machine is, frankly, overdue for a totally clean reinstall of a current version of the OS. Last time I tried to do this, about a year ago, was with Slackware 9, which turned out to be so badly broken it wouldn't work. I suspect a clean new OS image will clean up a number of nagging problems.