[Twisted-Python] running 1,000,000 tasks, 40 at-a-time
exarkun at twistedmatrix.com
Wed Oct 26 08:24:12 MDT 2011
On 02:02 pm, jrennie at gmail.com wrote:
>The background:
>
>I've been using DeferredSemaphore and DeferredList to manage the
>running of tasks under a resource constraint (only so many tasks can
>run at the same time). This worked great until I tried to use it to
>manage millions of tasks. Simply setting them up to run
>(DeferredSemaphore.run() calls) took approximately 2 hours and used
>~5 GB of RAM. This was less efficient than I expected. Note that these
>numbers don't include time/memory for actually running the tasks, only
>time/memory to set up the running of the tasks. I've since written a
>custom task runner that uses comparatively little setup time/memory by
>adding a "manager" callback to each task which starts additional tasks
>as appropriate.
>
>My questions:
>
> - Is the behavior I'm seeing expected? i.e. are DS/DL only
> recommended for task management if the # of tasks is not too large?
> Is there a better way to use DS/DL that I might not be thinking of?
Yes, it's expected. Queueing up millions of tasks is a lot of work.
Setting up millions more callbacks to learn about completion is a lot
more work. I would not recommend DeferredSemaphore for anything beyond
"user scale", e.g. things that correspond to a single user action, like
clicking a button in a GUI.
> - Is there a Twisted pattern for managing tasks efficiently that I
> might be missing?
I think the generator/cooperator approach works pretty well, and has
constant (instead of linear) time completion notification and
distributes setup costs across the lifetime of the queue, probably
allowing for better resource utilization.
See http://as.ynchrono.us/2006/05/limiting-parallelism_22.html for a
simple write-up.
Jean-Paul