[Twisted-Python] Twisted's ability to handle many open connections.
glyph at divmod.com
Wed Apr 26 19:02:12 EDT 2006
On Wed, 26 Apr 2006 23:13:22 +0200, Terry Jones <tcj25 at cam.ac.uk> wrote:
>I'm completely new to Twisted. As I understand it, a Twisted server runs as
>a single process - no forking, no threads. To me that implies that select
>(or poll or pselect, etc) is being heavily used to feed data to Deferreds.
That's more or less accurate. It sounds like you understand this, but because I can never repeat it enough: Deferreds don't do anything magic or get involved with select() directly. Their callback() methods are simply called from other callbacks, which are ultimately invoked as a result of the return value of select().
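To make that concrete, here is a minimal sketch using only the standard library. ToyDeferred, run_once, and demo are hypothetical names invented for this illustration; ToyDeferred is a deliberately stripped-down stand-in and not Twisted's real Deferred (which, among other things, handles errors and can only be fired once). The point is only that the select() call lives in the loop, and the Deferred-like object is just a chain of callbacks that a readable-socket handler happens to fire.

```python
import select
import socket


class ToyDeferred:
    """Hypothetical stand-in for a Deferred: nothing but a list of callbacks."""

    def __init__(self):
        self.callbacks = []

    def addCallback(self, fn):
        self.callbacks.append(fn)
        return self

    def callback(self, result):
        # Pass each callback's return value on to the next callback.
        for fn in self.callbacks:
            result = fn(result)


def run_once(readers, timeout=1.0):
    """One pass of a toy event loop: call select(), then dispatch handlers."""
    ready, _, _ = select.select(list(readers), [], [], timeout)
    for sock in ready:
        readers[sock](sock)


def demo():
    a, b = socket.socketpair()
    received = []

    d = ToyDeferred()
    d.addCallback(lambda data: data.upper())
    d.addCallback(received.append)

    def on_readable(sock):
        # The Deferred never touches select(); this handler fires it.
        d.callback(sock.recv(1024))

    a.sendall(b"hello")
    run_once({b: on_readable})
    a.close()
    b.close()
    return received
```

Calling demo() sends five bytes down one end of a socket pair, lets the toy loop notice the other end is readable, and runs the callback chain on whatever was read.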
>If that's the case, how does Twisted (portably) allow for a large number of
>file descriptors?
Twisted doesn't do *anything* portably :).
On Windows, there is a reactor that supports a maximum of 63 concurrent events (assuming your process has no windows open). On UNIX the default reactor is still select() based and will definitely bog down if you have too many simultaneous sockets open. Sometimes it is handled quite badly!
However, the selection of a reactor is a configuration option, not an application-architecture issue (unless, of course, you are writing a GUI which has requirements for what event loop it will run on).
You can choose a reactor with the -r argument to twistd, trial, and other similar tools like axiomatic. You can choose different reactors for different platforms. Application code should be none the wiser.
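As an analogy for "configuration option, not architecture", here is a small sketch built on Python's standard selectors module rather than on Twisted's reactor API. The function names pick_selector and echo_once, and the configuration strings "default", "select", and "poll", are assumptions made up for this example. The application function is handed a selector and never learns which multiplexing call sits underneath it.

```python
import selectors
import socket


def pick_selector(name="default"):
    """Map a configuration string (hypothetical names) to a selector instance."""
    choices = {
        "default": selectors.DefaultSelector,
        "select": selectors.SelectSelector,
    }
    # PollSelector is only defined on platforms that provide poll().
    if hasattr(selectors, "PollSelector"):
        choices["poll"] = selectors.PollSelector
    return choices[name]()


def echo_once(selector):
    """Application code: works the same with whichever selector it is given."""
    a, b = socket.socketpair()
    selector.register(b, selectors.EVENT_READ)
    a.sendall(b"ping")
    data = None
    for key, _events in selector.select(timeout=1.0):
        data = key.fileobj.recv(1024)
    selector.unregister(b)
    selector.close()
    a.close()
    b.close()
    return data
```

Swapping pick_selector("select") for pick_selector("default") changes the underlying system call but not a line of echo_once, which is the same separation the -r option gives you.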
>I'm considering using Twisted for a project. Twisted would front requests
>from the web, but also from (e.g.) remote command line applications via an
>API. These latter connections could potentially be numerous, and long
>lived. Should I be thinking about how Twisted deals with this, or does it
>already scale to allow more simultaneous connections / file descriptors
>than a naive server program would be able to maintain?
There are some pretty darn naive programs out there. The default naive case of a server that I'm aware of accept()s one connection, reads (blocking) the entire socket, then writes a response. That is 1 simultaneous connection: so in that case, yes, Twisted can do better :). However, it doesn't use any crazy tricks, and is likely to scale far worse than something like ACE or Sandstorm, i.e. any research project designed specifically just to scale. Scale is only one of Twisted's many considerations, as it aims to be a completely generic networking application layer.
The most scalable reactor you are likely to find on UNIX is the "poll" reactor. There is a kqueue reactor, but it is by all accounts terrible (i.e., it performs worse than poll due to unfortunate design decisions in the Python/kqueue bindings) and needs to be rewritten. In every case I'm aware of, other scalability concerns (usually the speed of Python itself) were an issue long before the IO overhead of simultaneous connections broke poll(). If your application has such a case, though, implementing an epoll, aio, kqueue, libevent, or <insert your totally non-portable, OS-specific, crazily scalable IO multiplexing API here> reactor should not be an inordinate amount of work. Less work, at least, than building such a beast and then writing what amounts to 2/3 of the Twisted core around it anyway :).
>Given what looks like widespread use of Deferreds in Twisted, I suppose
>that there is not a 1-1 relationship between connections and file
>descriptors? If so this makes the issue more pressing.
Again, Deferreds do not create file descriptors or have anything to do with select(). They are simply a Python object abstraction for callbacks.
Depends what you mean by 'connections'. listenTCP, listenUNIX, etc. each create a listening socket. Twisted accept()s on that socket, and buildProtocol is invoked once for each connected socket that results. Each of those sockets is a file descriptor. If you keep disk files or pipes open outside of Twisted, those are also file descriptors, etc etc etc. You can have virtual "connections" which multiplex multiple Protocol objects over a single transport though, or otherwise have fewer file descriptors than "connections" in that sense.
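The descriptor accounting above can be checked with plain sockets, no Twisted involved. This is only a sketch: count_server_descriptors is a hypothetical helper, and it assumes the loopback interface is available. It opens one listening socket, accepts a few connections on it, and counts the distinct descriptors held on the server side.

```python
import socket


def count_server_descriptors(n_clients=3):
    """Return how many distinct descriptors the server side ends up holding."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(n_clients)

    clients, accepted = [], []
    for _ in range(n_clients):
        clients.append(socket.create_connection(listener.getsockname()))
        conn, _addr = listener.accept()  # each accept() yields a new descriptor
        accepted.append(conn)

    # One descriptor for the listener plus one per accepted connection.
    fds = {listener.fileno()} | {s.fileno() for s in accepted}

    for s in clients + accepted + [listener]:
        s.close()
    return len(fds)
```

With three clients the server side holds four descriptors: the listener and one per accepted socket. Multiplexing several logical "connections" over one of those sockets would raise the connection count without raising this number.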