[Twisted-Python] Re: Reentrant reactor iteration

Martin Geisler mg at daimi.au.dk
Sat Mar 7 11:38:46 MST 2009


Jean-Paul Calderone <exarkun at divmod.com> writes:

Hi,

Thanks for the answer. I'm also with the VIFF project and I would like
to explain a bit more about the background for the hack by Marcel.

> On Fri, 27 Feb 2009 15:26:43 +0100, Marcel Keller <mkeller at cs.au.dk> wrote:
>>Hi,
>>
>> I am working on the VIFF project (viff.dk) which uses Twisted. I
>> found out that our code is sometimes inefficient because we are
>> generating many deferreds (maybe about 10000) in a callback. While
>> doing that, no network communication is performed. Therefore, I
>> investigated the possibility of adding a function to the reactor
>> which is called every iteration and from which the iteration could
>> be called safely. Then, we could generate all deferreds in that
>> function and activate the reactor from time to time. See the attached
>> patch for details.
>
> So you're doing a ton of work all at once now and you want to split up
> that ton of work into smaller pieces and do it a little at a time?

Sort of. We have overloaded the arithmetic operators in our library, so
people will expect to be able to write

  # xs and ys are big lists of our objects
  dot_product = 0
  for (x, y) in zip(xs, ys):
    dot_product += x * y

Here the multiplications involve network traffic and return Deferreds.
We would like the network traffic for the first multiplication to begin
immediately, *before* the remaining multiplications are done.

Doing all the multiplications up front makes the code block the reactor
and uses an awful lot of RAM. If we let each multiplication trigger the
sending of its data immediately, and if we process incoming messages
along the way, memory can be reclaimed for the earlier multiplications
and the above calculation should run in constant memory.

Sending and processing data in a more even flow makes our benchmark
results better and more consistent from one run to the next.

> If that's the case, then you don't need to modify the reactor, you
> just need to split up the work your code is doing. There are a lot of
> techniques for doing this. coiterate and inlineCallbacks are two
> solutions which are closest to "cookie cutter" (ie, you have the least
> flexibility in deciding how to use them).

Right -- we might be able to use these techniques. I haven't looked at
coiterate yet. With inlineCallbacks I guess the code would look
something like this:

  # xs and ys are big lists of our objects
  dot_product = 0
  for (x, y) in zip(xs, ys):
    dot_product += (yield x * y)

which is not so bad, except that it destroys the nice illusion that x
and y behave like normal integers even though the multiplication
involves network traffic.

> You have a very long, steep, uphill battle to convince me that adding
> support for re-entrant iteration is a good idea.

One problem I can think of is the memory usage associated with a very
deep recursion. Since there is no such thing as tail call optimization
in Python, each level in the recursion will hold onto any local
variables even though they might not be needed any more.
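
The effect is easy to see in plain Python, without Twisted. The sketch below
uses a hypothetical Payload class as a stand-in for a large local value and
counts, at the deepest level of the recursion, how many of those locals are
still alive.

```python
import weakref


class Payload(object):
    """Stands in for a large local value (hypothetical, for illustration)."""


def recurse(depth, refs):
    big = Payload()  # local that is not needed after the next line
    refs.append(weakref.ref(big))
    if depth > 0:
        # A tail call, but Python still keeps this frame (and `big`) alive.
        return recurse(depth - 1, refs)
    # Deepest level: count how many of the locals are still alive.
    return sum(1 for ref in refs if ref() is not None)


def live_payloads_at_bottom(depth):
    return recurse(depth, [])
```

On CPython, live_payloads_at_bottom(50) reports 51 live objects: every frame
on the stack, including the deepest one, still holds its own local.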

Are there other general problems with having a re-entrant reactor?

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.