[Twisted-Python] Twisted's ability to handle many open connections.
Matt Goodall
matt at pollenation.net
Wed Apr 26 19:18:44 EDT 2006
Terry Jones wrote:
> I'm completely new to Twisted. As I understand it, a Twisted server runs as
> a single process - no forking, no threads.
Correct. Twisted is event driven. It doesn't need to fork or spawn
threads to handle a lot of network connections at once.
However, Twisted is perfectly capable of managing any processes and
threads an application *needs* to create. In fact, Twisted's core
includes APIs specifically created to handle background processes and
threads.
> To me that implies that select
> (or poll or pselect, etc) is being heavily used to feed data to Deferreds.
select/poll/whatever is at the core of Twisted's event loop, known as
the 'reactor', but Deferreds are completely unrelated to data delivery.
A Deferred is a callback mechanism, nothing more. A function typically
returns a Deferred to say, "hey, you asked me to do something and I will
give my response ... just not right now".
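To make the distinction concrete, here is a rough sketch of a select-style loop that delivers data to a handler with no Deferred involved. It uses Python's standard selectors module, not Twisted's reactor, and the names run_once and on_readable are made up for illustration.

```python
import selectors
import socket


def run_once(selector, timeout=1.0):
    """One pass of an event loop: wait for ready descriptors, call their handlers."""
    for key, _events in selector.select(timeout):
        handler = key.data  # the callable registered alongside this descriptor
        handler(key.fileobj)


received = []


def on_readable(sock):
    # Data delivery is a plain function call made by the loop.
    received.append(sock.recv(1024))


# A connected socket pair stands in for a real network connection.
left, right = socket.socketpair()
right.setblocking(False)

selector = selectors.DefaultSelector()
selector.register(right, selectors.EVENT_READ, on_readable)

left.sendall(b"hello")
run_once(selector)

selector.unregister(right)
selector.close()
left.close()
right.close()
```

A real reactor repeats that pass forever and also handles writes, timers and errors, but the shape is the same: the loop notices a readable descriptor and calls the code registered for it.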
>
> If that's the case, how does Twisted (portably) allow for a large number of
> file descriptors?
As you recognised, Twisted uses a select loop so I'm not quite sure what
your question means. However ...
Twisted chooses the best available reactor implementation for the
platform. "Best", here, probably meaning most available (default?)
rather than fastest.
I'm no expert here but on Linux the reactor really does use a select()
loop, on win32 it uses the win32 equivalent of a select loop
(WSAEventSelect and MsgWaitForMultipleObjects, or something), and I
believe it uses poll on the BSDs.
If the "best" reactor is not actually the best (if you see what I mean!)
then you can manually install a non-default reactor. There are plenty to
choose from.
For instance, on Linux if you want to handle a huge number of
simultaneous connections then installing a poll-based reactor is
probably a good idea. If your application uses GTK+ then you would
install a GTK+ reactor.
> I'm considering using Twisted for a project. Twisted would front requests
> from the web, but also from (e.g.) remote command line applications via an
> API.
This is what Twisted excels at - multi-protocol network applications.
I've done it many times now and it just works.
Twisted already includes support (of varying quality) for most common
protocols but it's very easy to add new protocols if necessary.
> These latter connections could potentially be numerous, and long
> lived. Should I be thinking about how Twisted deals with this, or does it
> already scale to allow more simultaneous connections / file descriptors
> than a naive server program would be able to maintain?
Twisted is event driven and most people believe that event driven scales
better, especially for network-related work. I therefore believe Twisted
will *help* you write code that will scale to a high number of
simultaneous connections. Twisted does not perform any magic though.
You will need to understand event driven programming to scale up,
otherwise you're likely to block the application from doing more than
one thing at a time (in the cooperatively multitasking sense) making
your application utterly unscalable.
Also, note that Twisted does not magic away other common problems
associated with scalability. The obvious one that springs to mind
(because it bit me recently :-/) is the maximum number of file
descriptors a non-root Linux process can open.
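A quick way to inspect that limit from Python is the standard resource module (Unix only). This is a small sketch; the helper names fd_limits and set_soft_limit are made up for illustration. A non-root process can move its soft limit up to the hard limit but cannot raise the hard limit itself.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def set_soft_limit(target):
    """Set the soft limit to target, capped at the hard limit; return the value used."""
    _soft, hard = fd_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


soft, hard = fd_limits()
print("soft limit:", soft, "hard limit:", hard)
```

Since every connection costs one file descriptor, the soft limit is effectively a ceiling on simultaneous connections until it is raised.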
>
> Given what looks like widespread use of Deferreds in Twisted, I suppose
> that there is not a 1-1 relationship between connections and file
> descriptors? If so this makes the issue more pressing.
There is absolutely a 1-1 relationship between connections and file
descriptors but Deferreds have nothing to do with it.
It's quite possible to write a Twisted application without ever using a
Deferred. If your application can respond immediately (well, very
quickly) to requests that arrive from the network then you don't need
Deferreds.
You need a Deferred when you cannot respond immediately. In that case
you would create a Deferred and store it somewhere, spawn something
*non-blocking* to handle the request and return the Deferred to the
caller. Then, when the non-blocking thing completes, you call back via
the Deferred you stored to let the caller know the result.
>
> Thanks for any help,
> Terry
Hope it did help!
Cheers, Matt
--
Matt Goodall, Pollenation Internet Ltd
w: http://www.pollenation.net
e: matt at pollenation.net
t: +44 (0)113 2252500
Any views expressed are my own and do not necessarily
reflect the views of my employer.