[Twisted-Python] Re: Reentrant reactor iteration

Martin Geisler mg at daimi.au.dk
Sat Mar 7 14:58:01 MST 2009


Jean-Paul Calderone <exarkun at divmod.com> writes:

> On Sat, 07 Mar 2009 19:38:46 +0100, Martin Geisler <mg at daimi.au.dk> wrote:
>
>> We have overloaded the arithmetic operators in our library, so people
>> will expect to be able to write
>>
>>  # xs and ys are big lists of our objects
>>  dot_product
>>  for (x, y) in zip(xs, ys):
>>    dot_product += x * y
>>
>> Here the multiplications involves network traffic and return
>> Deferreds. We would like the network traffic for the first
>> multiplication to begin immediately, *before* the remaining
>> multiplications are done.
>>
>> Doing all the multiplications up front makes the code block the
>> reactor and uses an awful lot of RAM. If we let each multiplication
>> trigger the sending of its data immediately, and if we process
>> incoming messages along the way, memory can be reclaimed for the
>> earlier multiplications and the above calculation should run in
>> constant memory.
>
> Hm.  I would bet the constant would be pretty high, though.  Things will
> only balance out once the network is keeping up with the local for loop.
> Actually, I'm not sure why it would be constant at all.  Won't the local
> for loop always run much faster than any network operations can happen?

Yeah, you're right. Our results are very biased by only testing on a
local area network so far...

Three parties execute the same code, and each multiplication is done
when we have heard from the two others. With a fast network we can move
data around as fast as we can do the local operations needed (some
additions and multiplications of numbers with 65 bits or more).

> In this case, memory usage will go towards K(local) * set size -
> K(remote) * set size, or just O(set size); that is to say, linear.

Right, and that hints that the real goal is to throttle the
multiplications (like you point out below).

>>Sending and processing data in a more even flow makes our benchmark
>>results better and more consistent from one run to the next.
>
> It seems like what you'll really benefit from most is pipelining with a
> maximum pipeline depth that's not too big (so as to conserve memory) but
> not too small (so as to avoid wasting time on network round trip time).

Yes, that is exactly what we want!

>>Right -- we might be able to use these techniques. I haven't looked at
>>coiterate yet. With inlineCallbacks I guess the code would look
>>something like this:
>>
>>  # xs and ys are big lists of our objects
>>  dot_product
>>  for (x, y) in zip(xs, ys):
>>    dot_product += (yield x * y)
>>
>>which is not so bad, expect that it destroys the nice illusion that x
>>and y behave like normal integers even though the multiplication
>>involves network traffic.
>
> Perhaps with coiterate this might look something like...
>
>    def op(xs, ys):
>        dot_product = 0
>        for (x, y) in zip(xs, ys):
>            dot_product += x * y
>            yield
>
>        yield dot_product
>
>    d = coiterate(op(xs, ys))

Cool, thanks for the example! That really helps in understanding the
alternatives...

> This is flawed, but maybe it can be fixed. What's good about it is
> that it doesn't try to drive the reactor re-entrantly. :) It also
> avoids the yield in the += and * operations, which somewhat preserves
> your illusion of normalcy (I'm not saying I like that illusion, but
> I'll leave that aside for now).

Hehe :-) I'm also not sure what to think of the illusion, but it just
came very naturally with Twisted and the Deferreds. On the other hand it
breaks down all the time since one still needs to directly add callbacks
to do certain things...

> What's bad about it is that it still lets the local loop run arbitrarily
> far ahead of the results from the network, giving you unbounded memory
> usage.  To fix that, it should yield a Deferred every once in a while.
> The reason I leave it flawed here is that it's a little tricky to figure
> out which Deferred to yield.  What would be great would be to yield the
> Deferred for an operation which preceeds the most recently executed
> operation by some number of operations.  The number of operations by
> which it preceeds the most recent will be your pipeline depth (roughly).
> The effect this has on coiterate is to cause your local loop to cease to
> execute until that Deferred has fired.  By making it a Deferred for an
> operation some time /before/ the most recent operation, you keep the
> network busy and avoid wasting time on round trip times.  Once the Deferred
> fires, your loop gets run a few more times which has the effect of putting
> more work into the pipeline - until you've got enough extra work to keep
> things from stalling again, and then coiterate puts you to sleep for a
> while again.

Thanks a lot for the info! I think I'll forward it to the VIFF mailing
list and discuss a bit further there.

>>Are there other general problems with having a re-entrant reactor?
>
> One general problem is simply that the reactor is not written with this
> in mind.  So with the current implementation, even including the patch
> Marcel contributed, it doesn't work, period.  [...]

Hmm, lots of good reasons... :-/ Marcel also mentioned that his hack
made some unit tests fail, but I don't know yet if that was Twisted or
VIFF tests. Not that it matters much -- this change should of course be
invisible to existing code.

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
URL: </pipermail/twisted-python/attachments/20090307/1f6459cf/attachment.sig>


More information about the Twisted-Python mailing list