[Twisted-Python] Re: CPU intensive threads

David Bolen db3l at fitlinxx.com
Wed Jul 27 09:05:43 MDT 2005


Nathaniel Haggard <natester at gmail.com> writes:

> Is there a way to set the priority of the main part of twisted so that
> it can run CPU intensive threads and still service connections.

You could try OS-dependent methods for boosting the priority of the
main thread (or lowering that of the background threads), but if
you're truly CPU bound in pure Python code it probably won't help
much.  Even if the main thread gets preferential control, the main
Twisted loop calls out to I/O operations so often that it would give
control back almost immediately.

While Python does use native threads, its GIL (global interpreter
lock) means that if one thread is purely CPU bound in Python code,
generally the only chance other threads get to run is at a periodic
byte-code interval.  (You can find many more discussions about the
GIL and its implications in the comp.lang.python archives.)

As mentioned in another response, the simplest way to force context
switches in a tight CPU thread is to perform some operation that
releases the GIL (a simple one is something like time.sleep(0)).  If
your routine has a tight loop or some repetitive code path, putting
such a call in there might help quite a bit.
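
As a rough illustration of that idea, here is a minimal sketch of a
CPU-bound loop that offers up the GIL every so often.  The function
name sum_of_squares and the yield_every parameter are made up for the
example; how often to yield is something you would have to tune for
your own code.

```python
import time


def sum_of_squares(n, yield_every=1000):
    """CPU-bound loop that periodically gives other threads a chance.

    yield_every is a hypothetical tuning knob: a smaller value means
    more frequent GIL releases at the cost of more overhead.
    """
    total = 0
    for i in range(n):
        total += i * i
        if i % yield_every == 0:
            # time.sleep() releases the GIL while it runs, even with a
            # zero delay, so other threads can be scheduled here.
            time.sleep(0)
    return total
```

In a Twisted program a function like this would presumably be handed
to a worker thread (for instance via
twisted.internet.threads.deferToThread) rather than called from the
reactor thread.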

If not, you might try fiddling with sys.setcheckinterval(), which
sets the number of byte-codes executed before Python forces the
potential for a thread switch (explicitly releases/grabs the GIL).  It
was bumped higher in recent Python releases - I think it's 100 now - so
you could try dropping it down to 10 or something.  Doing so will
probably cause your process to burn slightly more CPU time overall,
but it should permit individual threads to remain more responsive in
the presence of CPU-bound threads.
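
A small sketch of that tuning, with one assumption worth flagging:
later interpreters (Python 3.2 onward, as far as I know) replaced the
byte-code count with a time-based sys.setswitchinterval(), so the
sketch prefers that call when it exists and only falls back to
sys.setcheckinterval() otherwise.  The helper name and the specific
values are just illustrative.

```python
import sys


def make_switching_more_frequent():
    """Ask the interpreter to offer thread switches more often.

    Returns a string naming which mechanism was used.
    """
    if hasattr(sys, "setswitchinterval"):
        # Assumption: time-based interval in seconds on newer
        # interpreters; 0.0005 is an arbitrary "smaller" value.
        sys.setswitchinterval(0.0005)
        return "switchinterval"
    # Older interpreters: byte-codes between forced switch checks.
    sys.setcheckinterval(10)
    return "checkinterval"
```

You would call this once at startup, before spinning up the worker
threads, and then measure whether responsiveness actually improves.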

If that still doesn't give you enough resilience, another option would
probably be to offload the processing to a separate process,
maintaining a simple link to the other process and communicating
requests over there for processing.  This supposes that transmitting
requests and results won't be a terribly high burden in time/space.
Since you're already using PB for other stuff, having an internal
(same host) PB server with appropriate processing objects would
probably work really well.  For your own interprocess communication
you could also use a simpler RPC protocol that just pickles the
information, to minimize the performance/transport impact.  Such a
setup would also give you (mostly for free) the flexibility to scale
the processing to multiple hosts should the need arise, as well as
being more friendly to SMP systems (which Python's GIL also interferes
with).
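
To make the "just pickle the information" variant concrete, here is a
minimal sketch using only the standard library, not PB.  It starts a
fresh child interpreter per request and exchanges one pickled dict in
each direction over the child's stdin/stdout.  The names WORKER_SOURCE
and compute_in_subprocess, and the {"n": ...} request format, are
invented for the example; a real setup would more likely keep a
long-lived worker process around.

```python
import pickle
import subprocess
import sys

# Source for the child process: read one pickled request from stdin,
# do the CPU-heavy work, write one pickled reply to stdout.
WORKER_SOURCE = r"""
import pickle, sys
request = pickle.load(sys.stdin.buffer)
result = sum(i * i for i in range(request["n"]))
pickle.dump({"result": result}, sys.stdout.buffer)
"""


def compute_in_subprocess(n):
    """Run the computation in a separate interpreter with its own GIL."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=pickle.dumps({"n": n}),  # pickled request on stdin
        stdout=subprocess.PIPE,        # pickled reply from stdout
        check=True,                    # raise if the child fails
    )
    return pickle.loads(proc.stdout)["result"]
```

Note that subprocess.run() blocks the caller until the child exits,
so inside a Twisted application you would want to drive the child
through the reactor instead (reactor.spawnProcess is the usual entry
point) so that connections keep being serviced while it works.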

If you can't afford the time to transfer requests and/or results to a
separate process over a normal channel, and are on a POSIX platform,
you might also investigate POSH (http://poshmodule.sourceforge.net).
It implements object sharing in shared memory between processes,
which would eliminate the transport overhead but still let you
separate the processing into distinct processes.  It's an early
development project that I've only experimented with, but if it fits
your needs it might do well.  (It comes with a simple
producer/consumer example that could probably be used as a starting
point.)

-- David




