[Twisted-Python] When would you considering split a server application to some physical instance with different logic function?

Mon Dec 1 16:22:13 MST 2008

On Monday 01 December 2008, Peter Cai wrote:

> As far as I know, some networking device manufacturers use this model to
> implement their routers or switches.
> but I have never heard any examples besides that.

The Postfix mail server uses an architecture of many small processes working 
together:
  http://www.postfix.org/OVERVIEW.html

One of the big motivations for this architecture is security: each process 
only needs to run with the minimum privileges it needs to do its work. Just 
because SMTP has to listen on port 25 does not mean the entire mail server 
has to run as root.
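
A related idea on a smaller scale is to bind the privileged port and then
give up root. The sketch below is only an illustration of that idea using
the Python standard library; it is not how Postfix is implemented, and the
function name and the uid/gid arguments are made up for the example. It
assumes a Unix-like system.

```python
import os
import socket


def listen_then_drop(port, uid, gid, host="127.0.0.1"):
    """Bind while privileged, then give up root before serving anything."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))      # ports below 1024 usually need root
    sock.listen(5)
    # Only drop when we actually are root and the target is an
    # unprivileged account; otherwise there is nothing to give up.
    if os.getuid() == 0 and uid != 0:
        os.setgroups([])         # drop supplementary groups first
        os.setgid(gid)           # group before user: setgid needs root
        os.setuid(uid)           # cannot be undone for this process
    return sock
```

The socket stays usable after the drop, so the code that talks to clients
never runs as root.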

> As the Twisted book says, most networking applications use one of these 3
> modes:
>
> 1. handle each connection in a separate operating system process, in
> which case the operating system will take care
> of letting other processes run while one is waiting;
>
> 2. handle each connection in a separate thread, in which case the
> threading framework takes care of letting other threads
> run while one is waiting; or
>
> 3. use non-blocking system calls to handle all connections in one thread.
> (Like Twisted or libevent or just select)

This describes how multiple connections can be handled (for example Apache 
has several different approaches for this), which is a different issue from 
spreading functionality over different processes (like Postfix does).
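
For reference, the third mode from the quoted list (one thread,
non-blocking calls) can be sketched with the standard library's selectors
module instead of Twisted. The EchoLoop class and its method names are
invented for this example, and the write path is simplified: real code
would buffer outgoing data instead of calling sendall() on a non-blocking
socket.

```python
import selectors
import socket


class EchoLoop:
    """One thread serving many connections via readiness notification."""

    def __init__(self, host="127.0.0.1", port=0):
        self.sel = selectors.DefaultSelector()
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.listener.bind((host, port))
        self.listener.listen()
        self.listener.setblocking(False)
        # The callback to run is stored as the selector key's data.
        self.sel.register(self.listener, selectors.EVENT_READ, self._accept)
        self.address = self.listener.getsockname()

    def _accept(self, sock):
        conn, _addr = sock.accept()
        conn.setblocking(False)
        self.sel.register(conn, selectors.EVENT_READ, self._echo)

    def _echo(self, conn):
        data = conn.recv(4096)
        if data:
            conn.sendall(data)   # simplification, see the note above
        else:
            # An empty read means the peer closed the connection.
            self.sel.unregister(conn)
            conn.close()

    def step(self, timeout=1.0):
        """Dispatch every socket that is ready now; return how many."""
        events = self.sel.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)
        return len(events)

    def close(self):
        for key in list(self.sel.get_map().values()):
            self.sel.unregister(key.fileobj)
            key.fileobj.close()
        self.sel.close()
```

Calling step() in a loop plays the role of the reactor: no connection gets
its own thread or process, and nothing blocks except the select call.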

Where do you expect the bottleneck to be in your application? Does it have 
to serve a large number of clients? Does it have to send or receive large 
amounts of data? Does it have to do heavy computations? Does it have to get 
data to or from external servers like a DB server or a web service?

> After considering for a while,  I thought there are some faults in the
> multiple parts model:
>
> 1. Writing code to handle messages is much more tedious than just doing
> function calls
> 2. It's not very easy to make a pipeline work well.

It depends a lot on what you are trying to build and where and how you make 
the splits between different parts of your application.

For example, a function call is simple when there is only one thread 
and the called operation will not block. If it does block, you need to add a 
callback mechanism like Twisted's Deferred. Or if there are multiple 
threads, you have to be very careful about which functions you are allowed 
to call while your thread is holding one or more locks. So a function call 
is not so simple anymore when it's part of a complex application...
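
To make the blocking case concrete, here is a rough standard-library
analogue of the callback style. It uses concurrent.futures, not Twisted's
Deferred, and slow_lookup is a hypothetical stand-in for a database or
network call. One difference worth noting: the callback here runs on the
worker thread, whereas Twisted normally fires Deferred callbacks in the
reactor thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_lookup(key):
    """Hypothetical blocking operation (think DB query or HTTP request)."""
    time.sleep(0.05)
    return key.upper()


def lookup_with_callback(executor, key, on_result):
    """Start the blocking call elsewhere and return immediately."""
    future = executor.submit(slow_lookup, key)
    # on_result is invoked once the worker has produced a value.
    future.add_done_callback(lambda f: on_result(f.result()))
    return future


def demo():
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        lookup_with_callback(pool, "abc", results.append)
        # The caller is free to do other work here.
    # Leaving the with-block waits for the worker to finish.
    return results
```

The plain call `slow_lookup("abc")` has turned into "start it, hand over a
function, carry on", which is the extra machinery the paragraph above is
talking about.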

There are libraries that make communication over a pipe more friendly, such 
as Perspective Broker or Foolscap. This does not mean you can replace any 
arbitrary function call by a remote method call, but it does mean you can 
skip writing (de)serialization code again and again.
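
As a rough picture of what such a library saves you from writing, here is
a tiny request/reply helper over a pipe. It is not Perspective Broker or
Foolscap: it uses multiprocessing.Pipe from the standard library, which
pickles objects for you, and the serve/Proxy names are invented for the
example. The server side runs in a thread only to keep the example
self-contained; pickle should only be used between parties that trust
each other.

```python
import threading
from multiprocessing import Pipe


def serve(conn, handlers):
    """Answer (method_name, args) requests until the peer closes the pipe."""
    while True:
        try:
            name, args = conn.recv()      # recv() unpickles the request
        except EOFError:
            break
        try:
            reply = ("ok", handlers[name](*args))
        except Exception as exc:          # report failures to the caller
            reply = ("error", repr(exc))
        conn.send(reply)                  # send() pickles the reply


class Proxy:
    """Client side: makes a remote call look a bit like a local one."""

    def __init__(self, conn):
        self._conn = conn

    def call(self, name, *args):
        self._conn.send((name, args))
        status, value = self._conn.recv()
        if status != "ok":
            raise RuntimeError(value)
        return value


def demo():
    client_end, server_end = Pipe()
    handlers = {"add": lambda a, b: a + b, "upper": str.upper}
    worker = threading.Thread(target=serve, args=(server_end, handlers),
                              daemon=True)
    worker.start()
    proxy = Proxy(client_end)
    results = [proxy.call("add", 2, 3), proxy.call("upper", "abc")]
    client_end.close()                    # server sees EOF and stops
    worker.join(timeout=5)
    return results
```

Note that only names listed in handlers can be called, which fits the point
above: you expose a deliberate interface, not every function in the
program.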

No matter which model you choose, dividing your application into 
communicating blocks is a good idea: it is absolutely necessary when using 
multiple processes, it avoids a lot of locking issues when using threads, and 
it makes your application easier to test in all three models.
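
A small sketch of what "communicating blocks" can look like, assuming two
made-up blocks that only talk through queues. The block names, the STOP
marker and the "name=value" input format are all invented for the example.

```python
import queue
import threading

STOP = object()   # sentinel telling a block that the input has ended


def parse_block(inbox, outbox):
    """Turn raw 'name=value' lines into (name, int) tuples."""
    while True:
        line = inbox.get()
        if line is STOP:
            outbox.put(STOP)
            return
        name, _, value = line.partition("=")
        outbox.put((name.strip(), int(value)))


def sum_block(inbox, outbox):
    """Add up values per name and emit the totals at end of input."""
    totals = {}
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(totals)
            return
        name, value = item
        totals[name] = totals.get(name, 0) + value


def run_pipeline(lines):
    raw, parsed, result = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=parse_block, args=(raw, parsed)),
        threading.Thread(target=sum_block, args=(parsed, result)),
    ]
    for t in threads:
        t.start()
    for line in lines:
        raw.put(line)
    raw.put(STOP)
    for t in threads:
        t.join()
    return result.get()
```

Because each block touches only its inbox and outbox, it can be tested on
its own by preloading a queue and calling the function directly, with no
threads or processes involved. The blocks share no state, so there is
nothing to lock.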

Bye,
		Maarten