[Twisted-Python] Sending unicode strings

Bob Ippolito bob at redivi.com
Sat Apr 23 00:03:09 MDT 2005


On Apr 23, 2005, at 1:34 AM, Michal Chruszcz wrote:

> I was undoubtedly surprised when I found out that I cannot pass unicode
> strings to Twisted.
> In 1.3 twisted.internet.abstract looks like this:
>     def write(self, data):
>         assert isinstance(data, str), "Data must be a string."
>         if not self.connected:
>             return
>
> and in 2.0:
>     def write(self, data):
>         if isinstance(data, unicode): # no, really, I mean it
>             raise TypeError("Data must be not be unicode")
>
> Why do you mean it? Why can't I send unicode through Twisted? It's
> ridiculous that I have to convert UTF-8 strings to ISO on the client side
> and then once again from ISO to UTF-8 on the server side, so I suppose
> you've got a really good excuse.

You must ALWAYS encode or decode unicode at I/O boundaries in any 
programming language or framework.  A unicode string has no encoding.  
You must choose one at the I/O boundary, but that choice is up to you.  
I suggest you read up on the hows and whys of Unicode, because apparently 
you missed something.
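A minimal sketch of what "encode on the way out, decode on the way in" can look like, written in Python 3 rather than the Python 2 of this thread. `FakeTransport`, `send_text` and `TextReceiver` are hypothetical names, not Twisted APIs; `FakeTransport` only mimics the bytes-only rule of a transport's `write`. Both ends pick UTF-8, so no intermediate ISO-8859 step is needed. The receiver uses an incremental decoder from the standard `codecs` module because a network read can end in the middle of a multi-byte character.

```python
import codecs


class FakeTransport:
    """Hypothetical stand-in for a Twisted transport: it accepts only bytes."""

    def __init__(self):
        self.sent = []

    def write(self, data):
        if not isinstance(data, bytes):
            raise TypeError("Data must be bytes, not text")
        self.sent.append(data)


def send_text(transport, text, encoding="utf-8"):
    # Encode at the outgoing I/O boundary; the encoding is our choice.
    transport.write(text.encode(encoding))


class TextReceiver:
    """Hypothetical receiver that decodes incoming bytes at the boundary."""

    def __init__(self, encoding="utf-8"):
        # An incremental decoder buffers a multi-byte character that is
        # split across two reads instead of failing on the partial bytes.
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self.text = ""

    def data_received(self, data):
        self.text += self._decoder.decode(data)


transport = FakeTransport()
send_text(transport, "Zażółć gęślą jaźń")
wire = b"".join(transport.sent)

receiver = TextReceiver()
# Deliver the bytes in two chunks, splitting inside the two-byte "ż".
receiver.data_received(wire[:3])
receiver.data_received(wire[3:])
print(receiver.text)
```

Because the encoding is an explicit parameter on both sides, switching to another encoding is a one-line change and never a silent default.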

Specifically, Twisted's transports are for writing BYTES (not text).  
Unicode is strictly a sequence of characters that have no inherent byte 
representation.  The unicode type has nothing at all to do with UTF-8; 
I'm not sure why you decided they were related.  Technically, the 
unicode type is represented internally as either UCS-2 or UCS-4, 
depending on the options Python was built with.
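A small Python 3 illustration that a character has no single byte form: the same one-character string encodes to different bytes under different encodings, and decoding with an encoding other than the one used to encode produces garbage rather than an error. The three encodings shown are just examples.

```python
text = "\u00e9"  # "é": a single character, code point U+00E9

# The same character becomes different bytes depending on the chosen encoding.
encoded = {enc: text.encode(enc) for enc in ("utf-8", "utf-16-le", "latin-1")}
for enc, data in encoded.items():
    print(enc, data)
# utf-8 b'\xc3\xa9'
# utf-16-le b'\xe9\x00'
# latin-1 b'\xe9'

# Decoding with a different encoding than was used to encode yields garbage.
mojibake = encoded["utf-8"].decode("latin-1")
print(mojibake)  # Ã©
```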

The same is true for file objects in Python, although they will 
automatically coerce unicode to/from a default encoding 
(sys.getdefaultencoding()), usually ASCII, which hurts more than it helps.  
Twisted takes the high road and explicitly provides no automagic 
conversion for unicode objects.  If it did, your program would probably 
crash at random places as soon as users of your application typed in 
non-ASCII characters, because you hadn't thought enough about unicode 
before deciding to use it.  Now you are required to have a modicum of 
understanding of what you're doing when you use unicode, so you are far 
less likely to write code with such silly bugs.
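A Python 3 sketch of the two behaviours contrasted above. A binary file object (`io.BytesIO` here) refuses text outright, much as a Twisted transport does, while encoding with ASCII, roughly what Python 2's implicit coercion did with the usual default encoding, fails as soon as a non-ASCII character appears. Python 3 no longer performs that implicit coercion, so this only approximates the Python 2 behaviour.

```python
import io

buffer = io.BytesIO()  # a binary "file": like a transport, it takes only bytes

try:
    buffer.write("caf\u00e9")  # text, not bytes: rejected outright
except TypeError as exc:
    print("rejected:", exc)

try:
    "caf\u00e9".encode("ascii")  # what an implicit ASCII default would do
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)

# The fix is an explicit encoding choice at the boundary.
buffer.write("caf\u00e9".encode("utf-8"))
print(buffer.getvalue())  # b'caf\xc3\xa9'
```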

In saner environments than Python, you will NOT have a single type 
that can represent both data and text at the same time (Python's str is 
evil).  Additionally, text types in saner environments often don't have 
a single fixed internal representation (so you don't have to pay the 
N-byte penalty, or encoding/decoding costs at I/O boundaries for text 
you never really manipulate, etc.).
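For what it's worth, Python 3 later took the route described here: `bytes` holds data, `str` holds text, and mixing them raises `TypeError` instead of guessing an encoding. Since Python 3.3 (PEP 393), `str` also chooses a 1, 2 or 4 byte-per-character internal representation for each string rather than a fixed UCS-2 or UCS-4 build option. The variable contents below are arbitrary examples.

```python
data = b"GET / HTTP/1.0\r\n"  # bytes: raw data off the wire
text = "caf\u00e9"             # str: a sequence of code points

try:
    data + text  # no implicit conversion between the two types
except TypeError as exc:
    print("no implicit mixing:", exc)

# Crossing between the types is always an explicit encode or decode.
assert text.encode("utf-8").decode("utf-8") == text
```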

-bob