[Twisted-Python] Sending unicode strings

Mon Apr 25 10:58:17 MDT 2005

On Apr 25, 2005, at 12:01 PM, Ken Kinder wrote:

> Tommi Virtanen wrote:
>
>> Personally, I think ass-u-ming Unicode is encoded as UTF-8 would have
>> been sane, but I can understand that not everyone agrees; e.g. Java
>> wants UTF-16 if I remember correctly. And not serializing to UTF-8
>> by default catches errors that would otherwise cause mysterious things
>> to happen.
>>
> Most of the time, you should know the encoding. Instead of forcing the 
> protocol to do the work, why not just have a way of setting the 
> expected encoding for write() and similar methods? If the encoding is 
> not set (i.e., None), then raise the exception. Otherwise, use the 
> specified encoding. This would have the added readability advantage in 
> that unicode encoding -- uhh code -- wouldn't have to be sprinkled 
> throughout the protocol classes -- only in places where the encoding 
> is actually set -- in HTTP's headers for example.

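import codecs

class MyProtocol(....):
    def __init__(self, encoding='ascii'):
        self.encoding = encoding

    def connectionMade(self):
        # self.transport is not set until the connection is made,
        # so wrap it here rather than in __init__.
        self.textwriter = codecs.getwriter(self.encoding)(self.transport)

    def write_text(self, s):
        # Text: encoded with the chosen codec before it is sent.
        self.textwriter.write(s)

    def write(self, s):
        # Raw bytes: passed through untouched.
        self.transport.write(s)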

This way write_text() will verify that you are only sending strings
that are valid in the chosen encoding.  If you call write_text() with a
str, it will first be decoded using sys.getdefaultencoding() and then
encoded using the chosen encoding, so it really does guarantee that
everything sent with write_text() is valid (at this level).
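
To see the check in action, here is a minimal sketch. It is written for
Python 3, where str is text and bytes is raw data, so the implicit
decode step described above does not apply. FakeTransport is a
hypothetical stand-in for a real transport; it only records what gets
written.

```python
import codecs

class FakeTransport:
    """Hypothetical stand-in for a transport: records written bytes."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

transport = FakeTransport()
writer = codecs.getwriter('ascii')(transport)

# Text that fits the chosen encoding is encoded and passed on.
writer.write('hello')

# Text that does not fit is rejected before it reaches the transport.
try:
    writer.write('caf\u00e9')
    rejected = False
except UnicodeEncodeError:
    rejected = True

# The same text goes through once a suitable encoding is chosen.
utf8_transport = FakeTransport()
codecs.getwriter('utf-8')(utf8_transport).write('caf\u00e9')
```

The point is that the encoding error shows up at the write_text() call
site, not as garbage on the wire.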

You should really keep separate what you're doing with raw bytes 
(write) and what you're doing with text (write_text) as they are 
different beasts.

There is no need to sprinkle this everywhere, just make it a mix-in or 
whatever and use as appropriate.
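
A rough sketch of what such a mix-in might look like, in Python 3
syntax. TextWriterMixin, Greeter and FakeTransport are made-up names
for illustration, not part of Twisted, and the "raise if no encoding is
set" behaviour follows Ken's suggestion rather than anything Twisted
does today.

```python
import codecs

class TextWriterMixin:
    """Hypothetical mix-in: adds write_text() next to a raw write()."""
    encoding = None       # must be set before write_text() is used
    _textwriter = None    # created lazily, once a transport exists

    def write_text(self, s):
        if self.encoding is None:
            raise ValueError("no encoding set for write_text()")
        if self._textwriter is None:
            self._textwriter = codecs.getwriter(self.encoding)(self.transport)
        self._textwriter.write(s)

class FakeTransport:
    """Hypothetical stand-in for a transport: records written bytes."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

class Greeter(TextWriterMixin):
    """Illustrative protocol-like class; only the encoding is set here."""
    encoding = 'utf-8'

    def __init__(self, transport):
        self.transport = transport

    def write(self, s):
        # Raw bytes go straight to the transport.
        self.transport.write(s)

transport = FakeTransport()
proto = Greeter(transport)
proto.write_text('h\u00e9llo')   # text, encoded as UTF-8
proto.write(b'\x00\x01')         # raw bytes, untouched
```

The encoding is named in exactly one place per protocol class, and the
text and byte paths stay separate.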

-bob