[Twisted-Python] Sending unicode strings
Bob Ippolito
bob at redivi.com
Mon Apr 25 10:58:17 MDT 2005
On Apr 25, 2005, at 12:01 PM, Ken Kinder wrote:
> Tommi Virtanen wrote:
>
>> Personally, I think ass-u-ming Unicode is encoded as UTF-8 would have
>> been sane, but I can understand that not everyone agrees; e.g. Java
>> wants UTF-16 if I remember correctly. And not serializing to UTF-8
>> by default catches errors that would otherwise cause mysterious things
>> to happen.
>>
> Most of the time, you should know the encoding. Instead of forcing the
> protocol to do the work, why not just have a way of setting the
> expected encoding for write() and similar methods? If the encoding is
> not set (i.e., None), then raise the exception. Otherwise, use the
> specified encoding. This would have the added readability advantage in
> that unicode encoding -- uhh code -- wouldn't have to be sprinkled
> throughout the protocol classes -- only in places where the encoding
> is actually set -- in HTTP's headers for example.
    import codecs

    class MyProtocol(....):
        def __init__(self, encoding='ascii'):
            self.textwriter = codecs.getwriter(encoding)(self.transport)

        def write_text(self, s):
            self.textwriter.write(s)

        def write(self, s):
            self.transport.write(s)
This way write_text will verify that you are only sending strings that
are valid in the chosen encoding. If you call write_text() with a str
(a byte string), it will first be decoded using sys.getdefaultencoding()
and then encoded using the chosen encoding, so it really does guarantee
that everything sent with write_text is valid (at this level).
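To make the validation concrete, here is a minimal sketch of what the
codecs writer does on its own. It is written for Python 3, where str is
text and bytes is raw data, and it uses an in-memory io.BytesIO as a
stand-in for the transport; both of those are my assumptions for
illustration, not part of the protocol class above.

```python
import codecs
import io

# In-memory byte stream standing in for a transport (illustrative only).
raw = io.BytesIO()
writer = codecs.getwriter('ascii')(raw)

# Text that fits the chosen encoding is encoded and passed to the stream.
writer.write(u'hello')

# Text that does not fit is rejected before anything reaches the stream.
try:
    writer.write(u'caf\xe9')  # U+00E9 has no ASCII representation
    rejected = False
except UnicodeEncodeError:
    rejected = True

sent = raw.getvalue()  # only the valid text was written
```

The point is that the error surfaces at the call site of the text write,
rather than as garbage on the wire.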
You should really keep separate what you're doing with raw bytes
(write) and what you're doing with text (write_text) as they are
different beasts.
There is no need to sprinkle this everywhere, just make it a mix-in or
whatever and use as appropriate.
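A rough sketch of what such a mix-in might look like, again assuming
Python 3. TextWriterMixin, FakeTransport and Utf8Protocol are
hypothetical names invented for this example. The writer is created
lazily on the assumption that the transport is attached only once a
connection exists, so it may not be available in __init__.

```python
import codecs


class TextWriterMixin(object):
    """Hypothetical mix-in: adds write_text() to anything with a .transport."""

    encoding = 'ascii'
    _textwriter = None

    def write_text(self, s):
        # Build the codec writer on first use, assuming the transport
        # has been attached by the time text is written.
        if self._textwriter is None:
            self._textwriter = codecs.getwriter(self.encoding)(self.transport)
        self._textwriter.write(s)

    def write(self, s):
        # Raw bytes bypass the codec entirely.
        self.transport.write(s)


class FakeTransport(object):
    """Stand-in for a real transport; it just records what was written."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


class Utf8Protocol(TextWriterMixin):
    # The encoding is set in exactly one place.
    encoding = 'utf-8'

    def __init__(self, transport):
        self.transport = transport


proto = Utf8Protocol(FakeTransport())
proto.write_text(u'caf\xe9')   # text: encoded as UTF-8 on the way out
proto.write(b'\x00\x01')       # bytes: passed through untouched
```

Protocol classes that need text output pick up the mix-in and set
encoding once; everything else keeps calling write() with bytes.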
-bob