[Twisted-Python] Flow, generators, coroutines etc.
Phil Mayers
p.mayers at imperial.ac.uk
Sun May 28 06:39:54 MDT 2006
Dominic Fox wrote:
> I understand that twisted.flow is no longer maintained, and is not
> widely considered to represent a good way of writing twisted code.
> However, I haven't been able to find any explanation of why this
> approach (using generators to simulate co-operative multitasking)
> seems to have been abandoned.
Well, flow in particular had a very funny model. It did not seem
straightforward to me at all.
>
> Is it simply the case that most people writing twisted code didn't
> find it very useful? Or are there more specific arguments against
> doing things that way?
I believe the general consensus with regard to generator-based
microthreads is that attempting to hide the fact that you are doing
asynchronous work behind a language trick is a Bad Thing(tm), in much the
same way that RESTians believe layering RPC semantics over HTTP (or any
WAN technology) is a bad thing, and for many of the same reasons.
In addition, though the concurrency issues are VASTLY reduced compared
to pre-emptive threading, they do still exist. I've been caught out by them.
Pre-Python 2.5, getting data back into the generator requires either a
global or a magic stack-traversing function, which is of course nasty.
Finally, you have to work very hard pre-Python 2.5 to make the uthreads
correctly handle all error cases.
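For what it's worth, PEP 342 (Python 2.5) added exactly the missing piece. A
minimal sketch, in modern syntax with hypothetical names, of pushing a result
back into a suspended generator with send():

```python
def worker():
    # Yield a "request"; the driver later resumes us with the result.
    result = yield "fetch-something"
    # With PEP 342 the value lands straight in a local variable --
    # no global, no stack traversal.
    yield "got: %s" % result

gen = worker()
request = next(gen)        # run to the first yield -> "fetch-something"
reply = gen.send("data")   # resume the generator, delivering the value
```

This is the mechanism later uthread-style helpers (inlineCallbacks and the
like) are built on.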
>
> I'm trying to make the case at work for using twisted for networking
> things (in spite of my preference for lightweight threads plus
> sensible concurrency primitives, if Python is the target platform then
> twisted is probably the best way to go).
I tend to think that lightweight threads really require first-class
support from the VM, such as exists in Erlang, in order for them to be
truly useful.
Were Erlang not such an ugly language I'd seriously consider switching
to it. The more restricted model leads to all kinds of magic VM scaling
goodness, and the "right" (in my current opinion, at least) way of writing
such code is enshrined in the very architecture.
Had Stackless not been shot down (and for no particularly good reasons -
"we'd have to port it to each platform" seems to have been the gist of
it, plus some people's rather disappointing and recurrent fear of the new)
then I suspect Twisted would not exist in its current form.
w.r.t. "sensible" concurrency primitives, I've heard an Erlang expert
relate: "Oh, we have two concurrency primitives. 'read' and 'write'", a
position I support wholeheartedly.
All that said, the BBC Kamaelia project uses a generator-based
consumer/producer pipeline as its underlying primitive, and seems to get
along fine with it. But a consumer/producer component is not the same as
a lightweight thread of course - broadly a c/p will only ever interact
with its input, local variables, output and library calls. A thread
might be expected to interact with other threads and shared data, from
whence all difficulty springs.
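A minimal sketch of that producer/consumer shape (not Kamaelia's actual API,
just an illustration): each component touches only its input, its locals, and
its output.

```python
def producer(items):
    # Source component: interacts only with its input and its output.
    for item in items:
        yield item

def upper(source):
    # Filter component: consumes upstream values, produces transformed ones.
    for item in source:
        yield item.upper()

def consume(source):
    # Terminal consumer: drains the pipeline.
    return list(source)

result = consume(upper(producer(["a", "b", "c"])))
```

Because no component reaches outside its own pipeline, there is no shared
data for the components to trip over.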
It's also worth pointing out that Google uses a massively distributed
c/p implementation called map-reduce to do much of their big work. A
well-isolated generator-based c/p would be trivially parallel in much
the same way.
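To illustrate why isolation buys parallelism, here is a toy map/reduce over
independent chunks (a sketch only, nothing to do with Google's actual
implementation): each mapper sees only its own input, so the mapping step
could be farmed out to separate processes or machines unchanged.

```python
from functools import reduce

def mapper(chunk):
    # Each mapper sees only its own chunk -- no shared state, so the
    # mapping step is trivially parallel.
    return sum(len(word) for word in chunk)

chunks = [["one", "two"], ["three", "four"], ["five"]]
mapped = [mapper(c) for c in chunks]        # the parallelisable step
total = reduce(lambda a, b: a + b, mapped)  # the serial combining step
```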
>
> If I can show some full-threaded code next to some co-operative
> multitasking code that a) has much the same sort of control flow, but
> b) scales much better, and doesn't have to worry about subtle
> concurrency issues, then I think it should go fairly well. If I have
> to explain about how the event-driven programming model works at the
> same time, it might not go *so* well...
I wrote a generator-based uthread thing over the top of twisted ages
ago, on more or less the same rationale - other people would eventually
be expected to write code for the system, and they would balk at or be
unable to handle writing "true" async code.
This has not in fact been the case. Several of my colleagues have picked
up the deferred/callback programming model with little difficulty.
Given that a similar system appears in MochiKit and web programmers
appear to be able to pick it up and run with it in a *JavaScript* VM,
perhaps we're underestimating people.
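For anyone unfamiliar with the model, here is a toy result-chaining object in
the Deferred style (hypothetical names, and omitting the errback half that
real Deferreds carry): callbacks are queued until a result arrives, then each
one's return value feeds the next.

```python
class Callback(object):
    # A toy Deferred-like object; real Deferreds also carry an
    # errback chain for error handling.
    def __init__(self):
        self.callbacks = []
        self.fired = False
        self.result = None

    def add_callback(self, fn):
        # Callbacks added after firing run immediately on the stored result.
        if self.fired:
            self.result = fn(self.result)
        else:
            self.callbacks.append(fn)
        return self

    def fire(self, value):
        # Deliver the result; each callback's return feeds the next.
        self.result = value
        self.fired = True
        for fn in self.callbacks:
            self.result = fn(self.result)

d = Callback()
d.add_callback(lambda v: v * 2).add_callback(lambda v: v + 1)
d.fire(20)                                # 20 -> 40 -> 41
late = d.add_callback(lambda v: v - 1)    # fires at once: 41 -> 40
```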
I will say three things:
1. The name "Deferred" is tremendously unfortunate. I cannot begin to
imagine why they weren't just called "Callback". The name seems to
confuse people into thinking it does something it does not.
2. The generator/uthread trick makes for tidier code because you can
keep state in local variables. Using callbacks requires you to pass a
state object around and then prefix everything with "state.varname" or
worse "state['varname']", and the (frankly annoying) extra typing
obviously leads to more scope for bugs as well as lower performance,
since local variable access is much faster.
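The contrast looks roughly like this (hypothetical names, and a synchronous
driver standing in for the reactor, for brevity):

```python
# Callback style: every step must thread an explicit state object through.
def step_one(state):
    state['total'] = 0
    return step_two(state, [1, 2, 3])

def step_two(state, values):
    state['total'] += sum(values)
    return state['total']

# Generator style: state lives in ordinary local variables across yields.
def steps():
    total = 0
    values = yield            # resumed with the data by whoever drives us
    total += sum(values)
    yield total

g = steps()
next(g)                       # advance to the first yield
result = g.send([1, 2, 3])    # the generator keeps "total" as a plain local
```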
Sadly, the "with" keyword just got used for something else (something
else the language already had, in fact. Oh well.)
3. I believe Deferreds as they currently exist are not very fast, and
that's on top of the high cost of Python function calls. Frequently we
are told most processes are I/O bound. That is very definitely not the
case in my setup - I am SNMP polling 1200 devices every 5 minutes,
sending an average of ~300-500 PDUs to each. With a bit of tuning, the
CPU spins at about 90% usermode and 10% io/system.