[Twisted-Python] Flow, generators, coroutines etc.
Phil Mayers
p.mayers at imperial.ac.uk
Sun May 28 06:39:54 MDT 2006
Dominic Fox wrote:
> I understand that twisted.flow is no longer maintained, and is not
> widely considered to represent a good way of writing twisted code.
> However, I haven't been able to find any explanation of why this
> approach (using generators to simulate co-operative multitasking)
> seems to have been abandoned.
Well, flow in particular had a very funny model. It did not seem
straightforward to me at all.
>
> Is it simply the case that most people writing twisted code didn't
> find it very useful? Or are there more specific arguments against
> doing things that way?
I believe the general consensus with regard to generator-based
microthreads is that attempting to hide the fact that you are doing
asynchronous work behind a language trick is a Bad Thing(tm), in much the
same way that RESTians believe layering RPC semantics over HTTP (or any
WAN technology) is a bad thing, and for many of the same reasons.
In addition, though the concurrency issues are VASTLY reduced compared
to pre-emptive threading, they do still exist. I've been caught out by them.
Pre-Python 2.5, getting data back into the generator requires either a
global or a magic stack-traversing function, which is of course nasty.
Finally, you have to work very hard pre-Python 2.5 to make the uthreads
correctly handle all error cases.
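For what it's worth, PEP 342 (Python 2.5) added exactly the missing piece. A
minimal sketch, in modern syntax with hypothetical names, of pushing a result
back into a suspended generator with send():

```python
def worker():
    # Yield a "request"; the driver later resumes us with the result.
    result = yield "fetch-something"
    # With PEP 342 the value lands straight in a local variable --
    # no global, no stack traversal.
    yield "got: %s" % result

gen = worker()
request = next(gen)        # run to the first yield -> "fetch-something"
reply = gen.send("data")   # resume the generator, delivering the value
```

This is the mechanism later uthread-style helpers (inlineCallbacks and the
like) are built on.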
>
> I'm trying to make the case at work for using twisted for networking
> things (in spite of my preference for lightweight threads plus
> sensible concurrency primitives, if Python is the target platform then
> twisted is probably the best way to go).
I tend to think that lightweight threads really require first-class
support from the VM, such as exists in Erlang, in order for them to be
truly useful.
Were Erlang not such an ugly language I'd seriously consider switching
to it. The more restricted model leads to all kinds of magic VM scaling
goodness, and the "right" (in my current opinion, at least) way of writing
such code is enshrined in the very architecture.
Had Stackless not been shot down (and for no particularly good reasons -
"we'd have to port it to each platform" seems to have been the gist of
it, plus some people's rather disappointing and recurrent fear of the new)
then I suspect Twisted would not exist in its current form.
w.r.t. "sensible" concurrency primitives, I've heard an Erlang expert
relate: "Oh, we have two concurrency primitives. 'read' and 'write'", a
position I support wholeheartedly.
All that said, the BBC Kamaelia project uses a generator-based
consumer/producer pipeline as its underlying primitive, and seems to get
along fine with it. But a consumer/producer component is not the same as
a lightweight thread of course - broadly a c/p will only ever interact
with its input, local variables, output and library calls. A thread
might be expected to interact with other threads and shared data, from
whence all difficulty springs.
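A minimal sketch of that producer/consumer shape (not Kamaelia's actual API,
just an illustration): each component touches only its input, its locals, and
its output.

```python
def producer(items):
    # Source component: interacts only with its input and its output.
    for item in items:
        yield item

def upper(source):
    # Filter component: consumes upstream values, produces transformed ones.
    for item in source:
        yield item.upper()

def consume(source):
    # Terminal consumer: drains the pipeline.
    return list(source)

result = consume(upper(producer(["a", "b", "c"])))
```

Because no component reaches outside its own pipeline, there is no shared
data for the components to trip over.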
It's also worth pointing out that Google uses a massively distributed
c/p implementation called map-reduce to do much of their big work. A
well-isolated generator-based c/p would be trivially parallel in much
the same way.
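To illustrate why isolation buys parallelism, here is a toy map/reduce over
independent chunks (a sketch only, nothing to do with Google's actual
implementation): each mapper sees only its own input, so the mapping step
could be farmed out to separate processes or machines unchanged.

```python
from functools import reduce

def mapper(chunk):
    # Each mapper sees only its own chunk -- no shared state, so the
    # mapping step is trivially parallel.
    return sum(len(word) for word in chunk)

chunks = [["one", "two"], ["three", "four"], ["five"]]
mapped = [mapper(c) for c in chunks]        # the parallelisable step
total = reduce(lambda a, b: a + b, mapped)  # the serial combining step
```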
>
> If I can show some full-threaded code next to some co-operative
> multitasking code that a) has much the same sort of control flow, but
> b) scales much better, and doesn't have to worry about subtle
> concurrency issues, then I think it should go fairly well. If I have
> to explain about how the event-driven programming model works at the
> same time, it might not go *so* well...
I wrote a generator-based uthread thing over the top of twisted ages
ago, on more or less the same rationale - other people would eventually
be expected to write code for the system, and they would balk at or be
unable to handle writing "true" async code.
This has not in fact been the case. Several of my colleagues have picked
up the deferred/callback programming model with little difficulty.
Given that a similar system appears in MochiKit and web programmers
appear to be able to pick it up and run with it in a *JavaScript* VM,
perhaps we're underestimating people.
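For anyone unfamiliar with the model, here is a toy result-chaining object in
the Deferred style (hypothetical names, and omitting the errback half that
real Deferreds carry): callbacks are queued until a result arrives, then each
one's return value feeds the next.

```python
class Callback(object):
    # A toy Deferred-like object; real Deferreds also carry an
    # errback chain for error handling.
    def __init__(self):
        self.callbacks = []
        self.fired = False
        self.result = None

    def add_callback(self, fn):
        # Callbacks added after firing run immediately on the stored result.
        if self.fired:
            self.result = fn(self.result)
        else:
            self.callbacks.append(fn)
        return self

    def fire(self, value):
        # Deliver the result; each callback's return feeds the next.
        self.result = value
        self.fired = True
        for fn in self.callbacks:
            self.result = fn(self.result)

d = Callback()
d.add_callback(lambda v: v * 2).add_callback(lambda v: v + 1)
d.fire(20)                                # 20 -> 40 -> 41
late = d.add_callback(lambda v: v - 1)    # fires at once: 41 -> 40
```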
I will say three things:
1. The name "Deferred" is tremendously unfortunate. I cannot begin to
imagine why they weren't just called "Callback". The name seems to
confuse people into thinking it does something it does not.
2. The generator/uthread trick makes for tidier code because you can
keep state in local variables. Using callbacks requires you to pass a
state object around and then prefix everything with "state.varname" or
worse "state['varname']", and the (frankly annoying) extra typing
obviously leads to more scope for bugs as well as lower performance,
since local variable access is much faster.
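The contrast looks roughly like this (hypothetical names, and a synchronous
driver standing in for the reactor, for brevity):

```python
# Callback style: every step must thread an explicit state object through.
def step_one(state):
    state['total'] = 0
    return step_two(state, [1, 2, 3])

def step_two(state, values):
    state['total'] += sum(values)
    return state['total']

# Generator style: state lives in ordinary local variables across yields.
def steps():
    total = 0
    values = yield            # resumed with the data by whoever drives us
    total += sum(values)
    yield total

g = steps()
next(g)                       # advance to the first yield
result = g.send([1, 2, 3])    # the generator keeps "total" as a plain local
```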
Sadly, the "with" keyword just got used for something else (something
else the language already had, in fact. Oh well.)
3. I believe Deferreds as they currently exist are not very fast, and
that's on top of the high cost of Python function calls. Frequently we
are told most processes are I/O bound. That is very definitely not the
case in my setup - I am SNMP polling 1200 devices every 5 minutes,
sending an average of ~300-500 PDUs to each. With a bit of tuning, the
CPU spins at about 90% usermode and 10% io/system.