[Twisted-Python] linux kernel 2.6.11-rc broke twisted process pipes

Andrea Arcangeli andrea at cpushare.com
Mon Feb 28 17:37:23 MST 2005


BTW, I'm sending as andrea at cpushare.com but this email is really meant
to be from andrea at suse.de, the lists are set so that I can't post unless
I'm subscribed and to avoid unnecessary email load, I'm avoiding to
subscribe two of my accounts to the same list.

On Mon, Feb 28, 2005 at 05:52:05PM -0500, James Y Knight wrote:
> I agree this code is somewhat suboptimal. However, I do not agree that 
> it works only by luck.

The thing is that readable and writeable (in select(2) terms) could be
returned from linux 2.6.9 and all previous kernels immediatly without
sleeping, even if there was no disconnect event. You only needed the
write buffer empty for it to return POLLIN|POLLOUT without sleeping.
That's what I mean with "by luck". It wasn't twisted mistake but a linux
mistake apparently.

That check of "r and w" definitely could be true even if the other side
of the pipe didn't disconnect in past linux.

> In response to your main question of "why is it checking for 
> readability at all", there is a good answer: to get the disconnect 
> event without trying to write data. Select doesn't have a "err" fdset, 

Ok, btw even linux always returns the fd in both readable and writeable
when the other side of the pipe disconnects. This because we raise a
POLLERR and this is the mask to check when a fd is readable/writeable

#define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR)
#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)

So when POLLERR is raised, the fd is returned readable and writeable
from select.

Current kernel code is this:

static unsigned int
pipe_poll(struct file *filp, poll_table *wait)
{
        unsigned int mask;
        struct inode *inode = filp->f_dentry->d_inode;
        struct pipe_inode_info *info = inode->i_pipe;
        int nrbufs;

        poll_wait(filp, PIPE_WAIT(*inode), wait);

        /* Reading only -- no need for acquiring the semaphore.  */
        nrbufs = info->nrbufs;
        mask = 0;
        if (filp->f_mode & FMODE_READ) {
                mask = (nrbufs > 0) ? POLLIN | POLLRDNORM : 0;
                if (!PIPE_WRITERS(*inode) && filp->f_version != PIPE_WCOUNTER(*inode))
                        mask |= POLLHUP;
        }

        if (filp->f_mode & FMODE_WRITE) {
                mask |= (nrbufs < PIPE_BUFFERS) ? POLLOUT | POLLWRNORM : 0;
                if (!PIPE_READERS(*inode))
                        mask |= POLLERR;
        }

        return mask;
}

With this new kernel code, a write pipe will _never_ return readable,
unless it's disconnected, and it will match the the current twisted code
as far as I can tell.

As you can see we never set POLLOUT for a write-pipe (opened in WRONLY
mode), and in turn the only way to end up in the POLLIN_SET is that
POLLERR is set.

> so you have to select for either readability or writability. You can't 
> select for writability when you have no data to write, or else you'll 
> be continuously waking up. On all UNIX OSes that I know of, write pipes 
> show up as readable in select when the other side has closed, for just 
> this reason. I don't know if it's documented in any specs, but as far 
> as I can tell, it's universally true.
> Breaking this would be a Bad Thing, for I suspect more apps than just 
> Twisted.

Linux never returned any information about the status of the other side
of the pipe via the readable information, because readable was always
returned true, since POLLIN was set unconditionally by the pipe_poll
code. So we're not going to break anything in that sense, it was already
broken and infact we're fixing it right now ;).

POLLIN was providing absolutely zero information because it was always
set unconditionally for both readers and writers. That was wrong and
that's why I changed it and this seem to make the doRead code valid for
the first time in linux.

> Of course, if it were that simple, doRead would just be implemented as 
> "return CONNECTION_LOST". From my testing, that'd even work on BSDs. 
> However, linux adds its own little wrinkle to the problem.
> On linux, the observed behavior seems to be that only one write can be 
> outstanding at a time -- if there is data in the buffer, writable will 
> be false and readable will be true. Otherwise, the inverse. This is 

It's never the inverse since pollin was always forced to be set, both
for readers and writers, so a write fd was always returned as readable
unconditionally. 

But you're perfectly right that if there is data in the buffer
writeable was false and readable was true. (readable provides no info in
linux)

> quite silly...but it's what happens. As far as I can tell, at no time 
> are both true, except when the pipe is disconnected. On BSD, a write 
> pipe is simply never readable until the reader is closed, which is a 
> lot more sensible.
> Also note, that bit of code is unnecessary if you're using pollreactor 
> rather than selectreactor. That is, I think it should be fine with 

This is great news. However note that the 2.6.11-rc5 official kernel
broke my app with the pollreactor too. So something else must have been
broken for the pollreactor too making the same assumption that doRead
did, and it's working fine again with my pipe patch infact.

> twisted if the pipe is just POLLHUP when it is disconnected rather than 
> POLLHUP|POLLIN|POLLOUT.

It's POLLERR not POLLHUP. POLLHUP is set only when the "writer" side
disconnected and you're listening to a reader fd. POLLERR is instead
returned when the "reader" disconnected and you're listening to a
writer fd.

In select terms it means when the reader disconnects the fd will be both
readable and writeable. And when the writer disconnects the fd will be
reported only as readable.

In poll terms it means with linux _only_ POLLERR will be set when the
read disconnects, and POLLIN will never ever be set on a WRONLY pipe.

So with the new fixes it seems linux does the right thing, and we at
leats mimic the select behaviour that twisted expects, I hope we mimic
the poll behaviour that twisted expects too.

I found the openbsd implementation of pipe_poll here:

	http://fxr.watson.org/fxr/source/kern/sys_pipe.c?v=OPENBSD#L661

671         if (events & (POLLIN | POLLRDNORM)) {
672                 if ((rpipe->pipe_buffer.cnt > 0) ||
673                     (rpipe->pipe_state & PIPE_EOF))
674                         revents |= events & (POLLIN | POLLRDNORM);
675         }

The way I read it, even openbsd will report the fd as readable
regardless if the channel is disconnected, if the buffer has something
into it.

Anyway linux won't do that anymore, a write fd will be now reported as
readable only if the reader has disconnected, and so you're safe to
assume the reader disconnected if selects returns readable && writeable.
So I think we should be fine now and no changes are required in twisted
at least for the select reactor side.

Please double check the pollreactor side too, we'll never return POLLIN
anymore for a WRONLY pipe fd.

I'm going to update a semi-productive system running twisted servers
using processes too, with the new pipe_poll code too to see what happens
(that's the good thing of not being fully productive yet, so I can
experiment a bit more ;).




More information about the Twisted-Python mailing list