[Twisted-Python] Sending unicode strings
Bob Ippolito
bob at redivi.com
Sat Apr 23 00:03:09 MDT 2005
On Apr 23, 2005, at 1:34 AM, Michal Chruszcz wrote:
> I was undoubtedly surprised when I found out that I cannot pass unicode
> strings to Twisted.
> In 1.3 twisted.internet.abstract looks like this:
> def write(self, data):
>     assert isinstance(data, str), "Data must be a string."
>     if not self.connected:
>         return
>
> and in 2.0:
> def write(self, data):
>     if isinstance(data, unicode): # no, really, I mean it
>         raise TypeError("Data must be not be unicode")
>
> Why do you mean it? Why can't I send unicode through Twisted? It's
> ridiculous that I have to convert UTF-8 strings to ISO on the client
> side and then once again from ISO to UTF-8 on the server side, so I
> suppose you've got a really good excuse.
You must ALWAYS encode or decode unicode at I/O boundaries, in any
programming language or framework. A unicode string has no encoding of
its own. You must choose one at the I/O boundary, but that choice is up
to you. I suggest you read up on the hows and whys of Unicode, because
apparently you missed something.
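As a minimal sketch of what encoding at the boundary looks like (the
helper names to_wire and from_wire are made up for illustration, and
UTF-8 is only one possible choice of encoding):

```python
def to_wire(text, encoding="utf-8"):
    """Encode text into bytes just before handing it to a transport."""
    return text.encode(encoding)


def from_wire(data, encoding="utf-8"):
    """Decode received bytes back into text right after reading them."""
    return data.decode(encoding)


# Hypothetical message containing one non-ASCII character (U+0142).
message = u"Micha\u0142"

payload = to_wire(message)      # bytes: this is what a transport's write() wants
roundtrip = from_wire(payload)  # text again, on the receiving side
```

Both ends only have to agree on the encoding name. Neither side ever
needs to convert between two byte encodings.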
Specifically, Twisted's transports are for writing BYTES (not text).
Unicode is strictly a bunch of characters that have no inherent byte
representation. The unicode type has nothing at all to do with UTF-8;
I'm not sure why you decided they were related. Technically the
unicode type is represented internally with either UCS-2 or UCS-4,
depending on how Python was configured at build time.
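To make the "no inherent byte representation" point concrete, here is a
small sketch that encodes the same one-character string three ways. The
choice of character and of ISO-8859-2 as the "ISO" encoding is an
assumption made for the example, not something taken from the original
question:

```python
# One character of text: LATIN SMALL LETTER L WITH STROKE (U+0142).
text = u"\u0142"

# The same character becomes different bytes under each encoding.
as_utf8 = text.encode("utf-8")        # two bytes
as_latin2 = text.encode("iso-8859-2")  # one byte
as_utf16 = text.encode("utf-16-le")    # two bytes, different from UTF-8

# Decoding with the matching encoding recovers the identical text.
recovered = [
    as_utf8.decode("utf-8"),
    as_latin2.decode("iso-8859-2"),
    as_utf16.decode("utf-16-le"),
]
```

None of these byte strings is "the" unicode string; each is just one
serialization of it.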
The same is true for file objects in Python, though writing unicode to
them will automatically coerce it using some default encoding
(sys.getdefaultencoding()), usually ASCII, which hurts more than it helps.
Twisted takes the high road and explicitly provides no automagic
conversion for unicode objects. If it did, your program would probably
crash at random places if users of your application typed in non-ASCII
characters, because you didn't think enough about unicode before
deciding to use it. Now you are required to have a modicum of
understanding about what you're doing when you use unicode, so it is
far less likely that you will write code that has such silly bugs.
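A rough sketch of the failure mode described above, assuming a
hypothetical automagic_write helper that silently coerces text with the
ASCII codec (this is an illustration, not anything Twisted provides):

```python
def automagic_write(text):
    """Hypothetical helper: coerce text to bytes using ASCII, implicitly."""
    return text.encode("ascii")


# Works during testing, because the test data happens to be plain ASCII.
ok = automagic_write(u"hello")

# Fails later, when a user types a non-ASCII character.
try:
    automagic_write(u"Micha\u0142")
    crashed = False
except UnicodeEncodeError:
    crashed = True
```

The error only shows up once real input arrives, which is exactly why an
explicit TypeError at write() time is the friendlier behavior.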
In more sane environments than Python, you will NOT have a single type
that can represent both data and text at the same time (Python's str is
evil). Additionally, it is often the case that text types in more sane
environments don't have a single internal representation (so you don't
have to pay the N-byte penalty, or encoding/decoding costs at I/O
boundaries for text you never really manipulate, etc.).
-bob