Unixronin
Thursday, May 8th, 2008 10:58 am

Mainly directed at folks such as [livejournal.com profile] robbat2 and [livejournal.com profile] zaitcev:

What would it take to add the ability into a *nix kernel to kill a process that's in uninterruptible sleep due to, say, a hardware fault?  Say I have an external device, and it hangs up, and I can reset the device, but the daemon that was controlling it is still in uninterruptible sleep.  I really should be able to tell the kernel, "Look, I don't care that the process isn't responding, I SAID KILL IT, RIGHT NOW", and have the kernel do it.  Never mind trying to play nice with the process; just forcibly terminate its execution and free all of its allocated memory, ports, sockets, etc.  Sort of a kill -9 on steroids.  SIGTERM, SIGKILL, SIG_EXTREME_PREJUDICE.  If the process is non-responsive and in uninterruptible sleep, that's most likely the reason WHY I want it terminated, without having to rely on the target process doing anything in response to a signal.  (OK, for that reason it wouldn't really be a signal; it'd be more of a kernel-instigated purge-process facility.  And per the point raised by [livejournal.com profile] ithildae, it should probably require UID 0 to call it.)

Would there be any downsides to doing so?  Reasons why the capability would be a bad idea?

Being able to do that could save a reboot, or at least let you postpone a reboot until you're done with whatever you were working on.  It's always seemed odd to me that even a SIGKILL isn't acted on until the target process wakes up and re-enters a killable state; the kernel never simply reclaims the process outright.
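(For the curious: on Linux you can at least see which processes are stuck this way by reading the state field out of /proc. A minimal sketch, assuming Linux's /proc layout; the function name is mine, not any real tool's:)

```python
import os

def uninterruptible_pids():
    """Return (pid, comm) pairs for processes in uninterruptible sleep.

    On Linux, the field after the parenthesized command name in
    /proc/<pid>/stat is the process state; 'D' means the task is
    blocked inside the kernel and will not act on any signal --
    including SIGKILL -- until it wakes up.
    """
    stuck = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # The command name may contain spaces, so split on the
        # parentheses that surround it rather than on whitespace.
        comm = data[data.index("(") + 1 : data.rindex(")")]
        state = data[data.rindex(")") + 2]
        if state == "D":
            stuck.append((int(entry), comm))
    return stuck
```

A `kill -9` against any PID this returns will just sit pending until the kernel wakes the process, which is exactly the frustration described above.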

Thursday, May 8th, 2008 05:05 pm (UTC)
Without the control structures within the process, can the kernel determine what memory, ports, and sockets are being used by the process? It has been a while, but I thought that the answer was 'no'.

It is really kind of strange, but we can't really stop a process under *nix, all we can do is ask it to commit suicide, with varying levels of insistence.

With that capability, could you kill the scheduler? The memory manager? or some other, vital, kernel function? Would it be easy to violently bring the system to a sudden, crashing stop? I know that the *nix philosophy is to allow sysadmins enough rope to swing, (#rm -R .) but killing the kernel seems too easy.
Thursday, May 8th, 2008 05:17 pm (UTC)
Well, I suppose one could argue that if you make it require UID0, then if you have the privileges to use it, you have the privileges to kill the system in any of a hundred other ways anyway.
Thursday, May 8th, 2008 08:13 pm (UTC)
Yeah, enough rope to swing...
Thursday, May 8th, 2008 05:56 pm (UTC)
When UNIX does not permit you to kill a process, it is typically because that process has initiated DMA (http://en.wikipedia.org/wiki/Direct_memory_access) I/O, and a piece of RAM allocated to that process has been marked as either the source or the recipient of the I/O, ergo, that RAM must remain resident, and must remain part of that process (permissions, etc) until the I/O completes.

Failure to ensure this would be bad, e.g. the device completes the I/O ... but into a piece of RAM that belongs to another process. Did it smash something important? You might not find out until much later, if at all.
Thursday, May 8th, 2008 06:12 pm (UTC)
Yeah, I can see how that would be bad. However, in a case where the device locked up and you had to reset it, but the controlling process is still hung ...

I think it'd be a useful capability to have, with the provisos that (a) you should only use it if you're sure you know what you're doing, and (b) you're going to reboot the machine as soon as you safely can anyway, but it would really be a pain in the ass to have to do it right now. But let's face it, (a) applies to a LOT of things that you have to be root to do. A significant part of the fu of root is knowing when, how, and under what circumstances it can be made relatively safe to do those potentially highly dangerous things.
Thursday, May 8th, 2008 07:44 pm (UTC)
With that caveat, why not just let the process merrily sit there doing nothing until you get around to scheduling a reboot?
Thursday, May 8th, 2008 08:03 pm (UTC)
Well, you probably can, as long as it doesn't have some resource locked that you need access to in order to finish what you were doing before you can reboot. Say, for example, the device that you just had to reset.
Friday, May 9th, 2008 02:31 am (UTC)
How is this handled with hot-swappable devices?
Friday, May 9th, 2008 05:11 am (UTC)
By quiescing the device before you remove it. Fail to do so, and you're likely to lose.

In some cases, e.g. network devices, this is OK, because the software is prepared to handle error (networks lose packets all the time). However, there is no operating system in the world that likes having its filesystem (or pieces thereof) go away unexpectedly. Even networked filesystem disappearance tends to cause varying levels of heartburn.
Friday, May 9th, 2008 07:27 am (UTC)
What about software RAID and "cloud" filesystems?

(sorry if these questions are annoying ...)
Friday, May 9th, 2008 09:34 am (UTC)
I don't know what is meant by a "cloud filesystem" - I am unfamiliar with the term. It is likely to be a neologism for something I know, but without a definition I can't speak to it.

RAID (http://en.wikipedia.org/wiki/RAID), depending upon which level is used (e.g. RAID 1, 5), can give you a level of indirection between the filesystem and a failed disk, which means that the disk can likely be replaced while filesystem I/O is buffered somewhere else (and you have a UPS (http://en.wikipedia.org/wiki/Uninterruptible_power_supply) protecting everything), and the new disk can be "rebuilt" after it is in place and spun up. Of course, you have to tell the RAID subsystem that you're doing this.

However, that doesn't save you if the RAID subsystem itself seizes for some reason, or if you're using a RAID level that fails catastrophically when a single disk fails (e.g. RAID 0). That merely puts you back in the same place you'd be if you just had one disk (well, worse off actually: with multiple disks in a RAID 0, you have MTBF (http://en.wikipedia.org/wiki/MTBF) issues to worry about, since the array fails if any one disk does).
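To put numbers on that: under the usual textbook assumption of independent disk failures with a constant failure rate, the RAID 0 array fails when any one disk fails, so the per-disk failure rates add and the array's MTBF divides by the disk count. A toy calculation (the function is illustrative, not from any RAID tool):

```python
def raid0_mtbf(single_disk_mtbf_hours, n_disks):
    """Rough MTBF of an n-disk RAID 0 (striped) array.

    Assumes independent, exponentially distributed disk failures.
    The array dies when ANY disk dies, so failure rates add up and
    the array MTBF is the single-disk MTBF divided by n.
    """
    return single_disk_mtbf_hours / n_disks

# Four 1,000,000-hour disks striped together:
print(raid0_mtbf(1_000_000, 4))  # 250000.0
```

So a four-disk stripe of million-hour disks behaves, on paper, like a single 250,000-hour disk with none of the redundancy.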

In the end, I/O either has to complete, or abort, and hopefully that middle indeterminate state is as short as possible. If there is a failure, hopefully the subsystem you're doing I/O to will have the decency to tell you, and the software that's using it has some way to recover. However, a lot of software is written with, shall we say, restricted assumptions (e.g. "all disks are local and filesystem writes never fail"), and if one works in software long enough, one sees a lot of code that presumes an operation will always succeed, and thus doesn't check for the error condition.
Steinbach's Guideline for Systems Programming: Never test for an error condition you don't know how to handle.
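As a small user-space illustration of the "writes never fail" assumption: os.write() can legally write fewer bytes than it was given, and code that ignores its return value drops the tail of the buffer silently. A sketch (write_all is a made-up helper name, not a standard call):

```python
import errno
import os

def write_all(fd, data):
    """Write all of `data` to fd, checking for short writes and errors.

    os.write() returns the number of bytes actually written, which
    may be less than len(data); naive code that discards the return
    value loses whatever wasn't accepted on the first call.
    """
    view = memoryview(data)
    while len(view):
        n = os.write(fd, view)  # raises OSError on a real I/O failure
        if n == 0:
            raise OSError(errno.EIO, "device accepted no data")
        view = view[n:]
```

The point is not this particular loop but the habit: every I/O call has a failure and a partial-success case, and both have to go somewhere.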

In UNIX kernel hacking (excuse me, "systems programming"), this is known as a panic (http://en.wikipedia.org/wiki/Kernel_panic).

If you really want to see perplexed faces and heads exploding, watch what happens when a networked application using lots of RPC (http://en.wikipedia.org/wiki/Remote_procedure_call)s that was only tested in a LAN (http://en.wikipedia.org/wiki/LAN) gets deployed to a WAN (http://en.wikipedia.org/wiki/Wide_area_network), and the performance goes to shit. This is also why you really don't want to use NFS (http://en.wikipedia.org/wiki/Network_File_System_%28protocol%29) in a WAN. Network latency is a real bitch.

Alas, the Speed of Light is not just a good idea, It's The Law.
Friday, May 9th, 2008 12:58 pm (UTC)
Thank you for your detailed reply!

It seems that you offer a few different lessons here based on failed assumptions: the subsystem has to tell you something didn't work, the driver has to handle something not working, and the data must be recoverable if it keeps not working. And this is before considering performance issues.

As for cloud filesystems, they are completely virtual filesystems. A cloud service presents you with a single filesystem available over the network, and it takes care of atomicity and fail-over for you across a cluster of heterogeneous storage hardware. GoogleFS (http://en.wikipedia.org/wiki/Googlefs) is supposed to be one such example.
Friday, May 9th, 2008 10:45 am (UTC)
Beyond "Better, we hope"? :)

I was actually discussing this out-of-band with someone who doesn't have a LJ account, and she pointed out that every well-written hardware driver should have a watchdog that times I/O operations and cleans them up if they fail, allowing a hung process to wake up from I/O wait and acknowledge (and, hopefully, gracefully handle) the failure. But the problem is that many drivers are not well-written and do not gracefully handle failed I/Os.
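In user space the same watchdog idea can be approximated by putting a timeout on the blocking call itself, so a dead device produces an error the caller can handle instead of an eternal I/O wait. A minimal Python sketch (read_with_timeout is an invented name, not any driver's API):

```python
import os
import selectors

def read_with_timeout(fd, nbytes, timeout):
    """Read up to nbytes from fd, but give up after `timeout` seconds.

    A user-space analogue of a driver watchdog: instead of blocking
    forever on a device that will never answer, bound the wait and
    surface the failure so the caller can recover (reset the device,
    retry, report the error upstream...).
    """
    sel = selectors.DefaultSelector()
    sel.register(fd, selectors.EVENT_READ)
    try:
        if not sel.select(timeout):  # empty list means nothing became ready
            raise TimeoutError(f"no data within {timeout}s")
        return os.read(fd, nbytes)
    finally:
        sel.close()
```

Real drivers do the equivalent inside the kernel, which is exactly why a missing or sloppy watchdog leaves processes parked in uninterruptible sleep.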
Friday, May 9th, 2008 12:30 pm (UTC)
sierra_nevada also mentions that the device might not tell you that something didn't work, so a timeout is a last-resort sanity check.
Thursday, May 8th, 2008 10:21 pm (UTC)
kill -IN_THE_FACE
Friday, May 16th, 2008 01:24 am (UTC)

If qemu matures to the point that it can either fully emulate the real hardware or provide some sort of bridge to it, AND it does its own hardware handling properly, you can just close the window. ;)

It does work now, but the emulated PC is ultra-vanilla. I use it just to have a sandbox for Windows.

It would be pretty cool to have a rack of S-MOS credit-card-sized 486s, one processor per task, with the power, reset, etc. controlled by another processor. I'd never heard of them before, but they have some gathering dust at work. Credit-card-sized 486 w/16 megs, video, serial, and some sort of bus connection. They can boot DOS with the right breakout. ;)

Yes, I lust for them, but there is almost ZERO information online about them, and ours were just part of another system we sold which is now obsolete.

http://www.thefreelibrary.com/S-MOS+Systems+announces+'486+credit-card+sized+computer.-a015970434