[Twisted-web] [Athena] Is ReliableMessageDelivery really necessary?
glyph at divmod.com
Wed Jul 1 18:04:19 EDT 2009
On 10:15 am, spongelavapaul at googlemail.com wrote:
>I've hit a problem as my app has got bigger (about 30-40 widgets now,
>all chattering roughly once every 2 seconds) where the reliable
>message delivery mechanism is spiralling out of control. It seems that
>the constant back and forth means that large 'baskets' of messages are
>resent. The more this happens, the busier everything gets until the
>browser becomes unresponsive.
This is unfortunate, but I'm sure it's fixable. At least, partially.
Client-server communication, especially in JavaScript, isn't free.
>There's a fix for it: [Divmod-dev] athena duplicate messages issue but
>I'm slightly concerned about the potential for lost messages - and
>also confused about how this could happen. Given that HTTP is a
>reliable connection-oriented transport, where is the gap that messages
>can fall through?
HTTP is neither reliable nor connection-oriented :). TCP is reliable
and connection-oriented, but HTTP builds on top of it to produce
something which is neither.
"Reliable" in this case doesn't mean that the transport is perfect and
will deliver everything, but that if you send messages "1, 2, 3", you
will get messages "1, 2, 3" in that order or you will get nothing at
all. (Of course you may also get just "1", or "1, 2", but you will
never get "3, 1, 2".)
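That definition can be written as a predicate: a delivery is acceptable if what arrived is an in-order prefix of what was sent. A minimal Python sketch (the function name is made up for illustration):

```python
def is_reliable_delivery(sent, received):
    """True if received is an in-order prefix of sent: nothing reordered,
    duplicated or skipped, although the tail may be cut off."""
    return list(received) == list(sent[:len(received)])

print(is_reliable_delivery([1, 2, 3], [1, 2, 3]))  # True
print(is_reliable_delivery([1, 2, 3], [1, 2]))     # True
print(is_reliable_delivery([1, 2, 3], [3, 1, 2]))  # False
```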
Even if HTTP had a way to initiate the delivery of a message over a
channel that was already busy receiving the response to another message
(it doesn't) we'd have to contend with the browser APIs for issuing HTTP
requests, which leave out significant portions of the actual protocol.
For example, browser JavaScript may never have more than two requests
to the same host in flight at once, since the HTTP/1.1 specification
(RFC 2616) says a client should not maintain more than two connections
to any server.
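A toy model of why that limit hurts, in Python (the class and method names are hypothetical, not Nevow code). It assumes, as is typical for comet-style clients but not stated above, that one slot is held by a long-lived request waiting for messages from the server. That leaves a single slot for uploading, so a newly generated message must either wait or abort the upload in progress.

```python
import threading

class HostConnectionLimit:
    """Toy model of a browser allowing at most `limit` concurrent
    requests to one host (2, per the HTTP/1.1 recommendation)."""

    def __init__(self, limit=2):
        self._slots = threading.BoundedSemaphore(limit)

    def try_start_request(self):
        # Non-blocking: False means every slot is already busy.
        return self._slots.acquire(blocking=False)

    def finish_request(self):
        self._slots.release()

host = HostConnectionLimit()
assert host.try_start_request()      # long poll waiting for server messages
assert host.try_start_request()      # upload of a basket of client messages
assert not host.try_start_request()  # a third request has nowhere to go
host.finish_request()                # the upload completes...
assert host.try_start_request()      # ...and only then can the next one start
```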
So, what is happening here is that Nevow treats HTTP requests as
individual, unreliable messages, which may be eaten by beasts like
transparent proxies and browser runtime bugs, and builds on them a
protocol that presents your application with a stream of messages which
are always in order and never dropped. This is, as it happens,
*exactly* what Orbited does, and Nevow could potentially be implemented
on top of Orbited. However, Nevow's implementation has a bug: it
overzealously re-delivers messages when, frequently, re-delivery is not
required. This is rarely a problem beyond the noise it generates in your
log files, but, as you've noticed, it creates performance problems once
your message queue starts to back up.
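The general technique, turning individually unreliable requests into an ordered, exactly-once stream, is usually built from sequence numbers and acknowledgements. Here is a minimal Python sketch of that technique; the class and method names are hypothetical and this is not Nevow's actual implementation.

```python
class ReliableSender:
    """Numbers outgoing messages and keeps them until acknowledged."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}  # seq -> message

    def send(self, message):
        self.unacked[self.next_seq] = message
        self.next_seq += 1

    def basket(self):
        # Every attempt carries everything not yet acknowledged.
        return sorted(self.unacked.items())

    def ack(self, upto):
        # Cumulative ack: the receiver has everything with seq < upto.
        for seq in [s for s in self.unacked if s < upto]:
            del self.unacked[seq]


class ReliableReceiver:
    """Delivers messages in sequence order, exactly once."""

    def __init__(self):
        self.expected = 0
        self.pending = {}    # out-of-order messages held back
        self.delivered = []

    def receive(self, basket):
        for seq, message in basket:
            if seq >= self.expected:  # lower seqs are duplicates
                self.pending[seq] = message
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        return self.expected  # cumulative ack for the sender

sender, receiver = ReliableSender(), ReliableReceiver()
for m in ["a", "b", "c"]:
    sender.send(m)
lost = sender.basket()                          # eaten by a proxy
sender.ack(receiver.receive(sender.basket()))   # the retry gets through
receiver.receive(lost[:1])                      # late duplicate is ignored
print(receiver.delivered)                       # ['a', 'b', 'c']
print(sender.unacked)                           # {}
```

Retries always resend the whole unacknowledged basket; the receiver's sequence check is what makes those resends harmless, apart from what they cost.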
So, my suggestion would be to read through the relevant JavaScript code
for delivering "baskets" to the server, figure out exactly what is
happening, and write a patch to correct this behavior. If I
recall correctly, the problem is that the client will overzealously
interrupt its own connection to the server where it is sending a basket
of collected messages, in order to free up the HTTP connection to send a
*new* message which it has generated. It would be better if the client
allowed a brief (and actually "brief" probably needs to be pretty long,
in the wild) grace period for the HTTP request to be fully received and
responded to before piling on more work.
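To make the difference concrete, here is a deliberately simplified tick-based model in Python. The function and its parameters are hypothetical, not taken from Athena, and the upload time and message rate are made-up numbers chosen so that messages arrive faster than an upload completes. Every upload carries all unacknowledged messages, like a basket. With grace=0 the client aborts the in-flight upload whenever a new message appears; with a larger grace it lets the upload finish first.

```python
def simulate(message_times, upload_time, grace, end_time):
    """Return (message copies uploaded, messages acknowledged).

    message_times: ticks at which the client generates a new message.
    upload_time:   ticks an upload needs before its response arrives.
    grace:         ticks an in-flight upload is protected from being
                   aborted; 0 models "always interrupt".
    """
    arrivals = set(message_times)
    unacked = 0        # generated but not yet acknowledged
    in_flight = None   # (start tick, basket size) of the current upload
    copies_sent = acked = 0

    def start_upload(now):
        nonlocal in_flight, copies_sent
        in_flight = (now, unacked)  # resend the whole basket
        copies_sent += unacked

    for now in range(end_time):
        # The current upload completes and its basket is acknowledged.
        if in_flight is not None and now - in_flight[0] >= upload_time:
            unacked -= in_flight[1]
            acked += in_flight[1]
            in_flight = None
        if now in arrivals:
            unacked += 1
            if in_flight is None or now - in_flight[0] >= grace:
                start_upload(now)   # may abort the upload in progress
        elif in_flight is None and unacked:
            start_upload(now)       # send whatever waited out the grace period
    return copies_sent, acked

ticks = range(0, 20, 2)  # ten messages, one every 2 ticks
print(simulate(ticks, upload_time=3, grace=0, end_time=30))  # (55, 10)
print(simulate(ticks, upload_time=3, grace=3, end_time=30))  # (10, 10)
```

When the client always interrupts, no upload finishes while messages keep coming, so the basket grows by one each time and the total work is 1 + 2 + ... + 10 = 55 message copies. With the grace period each message is uploaded once in this model. Real networks add variance this ignores, which is why the grace period probably needs to be generous.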
Part of the problem here, of course, is that the crappy JavaScript
browser HTTP API won't let us tell how much of our request has been
uploaded or process the response as it arrives. So we have to guess
what a reasonable timeout would be, rather than have the algorithm
operate on actual data.
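One way to replace part of the guess with measured data is borrowed from TCP rather than from Nevow. The browser API does let us time a whole request from send to response, and that duration can feed a smoothed estimate the way TCP's retransmission timer does (RFC 6298). A Python sketch; the class name and defaults are assumptions:

```python
class RetransmitTimer:
    """Adaptive request timeout in the style of TCP's retransmission timer
    (RFC 6298), fed with durations of completed requests in seconds."""

    ALPHA = 1 / 8  # gain for the smoothed round-trip time
    BETA = 1 / 4   # gain for the round-trip variation
    K = 4          # how many variations of headroom to allow

    def __init__(self, initial=1.0, minimum=1.0):
        self.srtt = None    # smoothed round-trip time
        self.rttvar = None  # round-trip time variation
        self.timeout = initial
        self.minimum = minimum

    def observe(self, rtt):
        """Record one send-to-response time and return the new timeout.
        Following Karn's algorithm, only time requests that were never
        retried, so a late reply to an aborted attempt can't skew this."""
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            # RFC 6298 updates the variation first, using the old SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.timeout = max(self.minimum, self.srtt + self.K * self.rttvar)
        return self.timeout

timer = RetransmitTimer()
for seconds in [2.0, 2.0, 2.5]:
    timer.observe(seconds)
print(timer.timeout)  # 4.8125
```

This only measures whole round trips, not upload progress, so it narrows the guess rather than eliminating it.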
In other words, you're right: the messages are not actually disappearing
into a black hole :).
As far as what you should do: I think you should try to write a patch.
It's not trivial, but it's not rocket science either: it's just computer
science. Hopefully my description of the problem is accurate enough to
get you started; I'm sure that if you ask for help on this list or on
IRC as you're working on it, you will find no shortage of it. Lots of
people have mentioned this problem over the years, but nobody has (as
far as I can tell from searching right now) even thought to file the
bug as a ticket on divmod.org, let alone contribute a fix for it.
>I think I can cope with lost messages in most cases, so would it be
>useful to add a kind of 'sendRemote' that was like 'callRemote' but
>didn't care about a response? Or maybe this already exists and I've
>missed it?
Could you cope with these messages arriving arbitrarily out of order? I
am willing to bet not; it would just make your application extremely
difficult to test, and under heavier load it would start spewing
exceptions rather than making the browser unresponsive.
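A small illustration of that failure mode, assuming each message is a state update such as a widget reporting its current value (the functions are hypothetical). Applied in arrival order, a delayed message silently overwrites newer state. Guarding against it requires sequence numbers, which is the start of the ordering machinery a fire-and-forget call was meant to avoid, and the shortcut below only works for last-value-wins updates: messages carrying increments or commands cannot simply be discarded.

```python
def apply_naive(updates):
    """Apply updates in arrival order; whatever arrives last wins."""
    state = None
    for _seq, value in updates:
        state = value
    return state


def apply_versioned(updates):
    """Skip any update older than the newest one already applied."""
    state, newest = None, -1
    for seq, value in updates:
        if seq > newest:
            state, newest = value, seq
    return state

# A widget reports a counter three times; the second report is delayed.
arrived = [(0, "count=1"), (2, "count=3"), (1, "count=2")]
print(apply_naive(arrived))      # count=2  (stale)
print(apply_versioned(arrived))  # count=3
```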
>P.S. this app is likely to get more noisy - is it likely that I'll
>have to abandon Athena for Orbited or similar? I mean, are there
>architectural differences that will prevent Athena scaling?