[Twisted-Python] Compressing PB communication
glyph at divmod.com
Fri Jun 30 10:08:03 MDT 2006
On Fri, 30 Jun 2006 12:24:43 +0300, Tzahi Fadida <tzahi.ml at gmail.com> wrote:
>On Friday 30 June 2006 03:44, glyph at divmod.com wrote:
>> On Thu, 29 Jun 2006 22:01:22 +0300, Tzahi Fadida <tzahi.ml at gmail.com> wrote:
>> Not really. PB is optimized (heavily optimized) for lots of small
>> control-channel messages. Serializing 20MB of data with Jelly would
>> probably stop your process for a good 30 seconds, and due to the way that
>> the original Jelly was designed, it cannot be processed incrementally; you
>> will end up allocating something like 100MB of memory just to get the data
>> serialized and the packet sent, if you raise the limit.
>
>30 seconds of doing what? serializing?
Yeah. This is just an estimate, of course, but passing 20 megabytes of structured data through jelly is going to be really slow. It definitely should be faster, but nobody really has the inclination to optimize it; people with harder performance requirements tend to just use other protocols rather than improve PB, as Bruce mentioned with STOMP.
>Let me try something else, what if i want to replace jelly with pickle for
>a special Server and Client factory for sending messages from the Server to
>the client only. If the client sends a message to the server, they must
>be jellied. The idea is that you are consciously saying that the client
>completely trust the server.
>I think this is a good idea for some security model that also want performance
>and resource savings.
I don't think this is a particularly good idea, but then, it's an idea that has less and less to do with PB all the time. You're talking about implementing a new protocol that has about two dozen features that PB does not have: support for pluggable serialization mechanisms, message tagging, on-the-fly compression, chunked encoding of large messages. Some of these features would require that you change the way ordering guarantees work in PB and the way concurrency interacts with its API. Maybe you could use some small portion of PB to build this monster of a protocol, but when you're done, you would not be using PB.
I can't see why you need the protocol machinery to do all of this for you. If I were building this application I'd certainly just use streaming producers and send PB messages (over an existing, unmodified PB) that were of a reasonable size until all the data had been transferred.
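To make the chunking idea concrete, here is a minimal sketch in plain Python, not a definitive implementation. The names `iter_chunks`, `send_in_chunks` and `send_chunk`, and the 64 KiB chunk size, are assumptions made for illustration; `send_chunk` stands in for whatever actually delivers one piece, for example a wrapper around a `callRemote` call on a PB remote reference. In a real PB program each `callRemote` returns a Deferred, and you would wait for it to fire before sending the next piece rather than looping synchronously as this sketch does.

```python
def iter_chunks(data, chunk_size=64 * 1024):
    """Yield successive slices of data, each at most chunk_size bytes long."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def send_in_chunks(data, send_chunk, chunk_size=64 * 1024):
    """Hand data to send_chunk one piece at a time; return how many pieces were sent.

    send_chunk is a hypothetical callable standing in for the real transport.
    """
    count = 0
    for chunk in iter_chunks(data, chunk_size):
        send_chunk(chunk)
        count += 1
    return count


# Usage: collect the pieces in a list instead of sending them anywhere.
received = []
sent = send_in_chunks(b"x" * 200000, received.append)
```

The receiving side only ever has to hold one reasonably sized message at a time, which is the point of doing it this way.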
Also, designing a protocol where you "completely trust" the other end of the wire, even if it is the "server", is a bad idea. You should only trust the other end of the wire if every message that it sends is encrypted and signed with a verified certificate, and even that is a stretch. Using pickle at the protocol level means that during the verification process, you are vulnerable to attacks that use the pickled signature exchange to send you an exploit.
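As a small illustration of why unpickling data from the wire is dangerous, the sketch below uses only the standard library. The class name `Payload` is made up for the example, and the call it smuggles in is deliberately harmless; a hostile peer could name any importable callable instead.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle tells the loader what to call."""

    def __reduce__(self):
        # pickle records "call eval with this argument" instead of object state.
        # The expression here is harmless, but the sender chooses it freely.
        return (eval, ("6 * 7",))


wire_bytes = pickle.dumps(Payload())

# The receiver never gets a Payload instance back; it gets the result of
# running the callable that the sender selected.
result = pickle.loads(wire_bytes)
```

The call happens inside `pickle.loads` itself, before the receiving code has had any chance to inspect or verify what it was sent.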
>I don't understand, you are saying that twisted does not send portions, i.e.
>it is blocking? that doesn't sound right to me, even if i send 20mb of data
>it should be portioned and let other transfer also get the chance.
>I was under the impression that twisted is multiplexing even on the channel
>level. I can, of course, always open 2 channels but this is evil.
If you send 20MB of data in a single write() call, that means you have at least 20MB of data sitting in the outbound write buffer until it can all be written, not to mention that the jelly serialization process is going to copy all of your data at least twice.