[Twisted-Python] Encoding patch for twisted.web.microdom.Document
Cory Dodt
corydodt at yahoo.com
Mon Aug 18 10:14:20 MDT 2003
Francisco Miguel Colaço wrote:
> I have patched that class, in order to make the xml opening as:
>
>
> <?xml version="1.0" encoding="utf-8" ?>
>
>
>
This patch sets the header without checking the actual character
encoding of the document, which basically just inverts the current
problem: today the content can be utf-8 while the header doesn't say
so; with the patch the header can say utf-8 while the content isn't.
Microdom does support utf-8 output, by checking whether the strings it
contains are UnicodeType. That check leaks all over the place, and you
would get into trouble with your patch if you passed in encoding='utf-8'
and then provided microdom with an 8-bit string.
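
To make the mismatch concrete, here is a minimal sketch in modern
Python, where bytes plays the role of the 8-bit string and str the role
of UnicodeType. It does not use microdom at all;
matches_declared_encoding is a hypothetical helper that only checks
whether a serialized document can be decoded with the encoding its
header claims.

```python
def matches_declared_encoding(document: bytes, declared: str) -> bool:
    """Return True if the bytes can be decoded with the declared encoding."""
    try:
        document.decode(declared)
    except UnicodeDecodeError:
        return False
    return True


header = b'<?xml version="1.0" encoding="utf-8" ?>'

# An 8-bit string that was encoded as Latin-1, not UTF-8.
latin1_body = "<name>Colaço</name>".encode("latin-1")
# The same text, properly encoded as UTF-8.
utf8_body = "<name>Colaço</name>".encode("utf-8")

# The header claims utf-8 in both cases; only the second document honours it.
header_lies = not matches_declared_encoding(header + latin1_body, "utf-8")
header_ok = matches_declared_encoding(header + utf8_body, "utf-8")
```

A successful decode does not prove the declaration is right (Latin-1,
for instance, will decode any byte sequence), so this only catches the
obvious failures.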
There are a few ways to make this less leaky:
- if encoding parameter is something unicodey, make sure self.data is
UnicodeType anywhere it is written to (i.e. anywhere it appears on the
left side of an assignment) by raising an exception if it's not
*and*
- make sure you can't change encoding to a non-unicode encoding or pass
a non-unicode encoding to any methods once the document has nodes
or
- internally convert everything to Unicode even if it is passed in as
8-bit string, and then use only the encoding parameter to determine how
writexml should work.
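
The "raise an exception" and "convert everything" options above can be
sketched roughly as follows, again in modern Python with str standing in
for UnicodeType and bytes for an 8-bit string. None of these names come
from microdom: StrictText, to_text and write_xml are hypothetical, and
the Latin-1 default for incoming 8-bit strings is purely an assumption.
The "no encoding change once the document has nodes" rule is not shown.

```python
class StrictText:
    """Hypothetical text node that refuses 8-bit strings on every write."""

    def __init__(self, data):
        self.data = data  # routed through the setter below

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        # First option: fail loudly wherever data is assigned.
        if not isinstance(value, str):
            raise TypeError("expected Unicode text, got %s" % type(value).__name__)
        self._data = value


def to_text(value, source_encoding="latin-1"):
    """Last option: normalize to Unicode on the way in.

    source_encoding is an assumption; the caller has to know how the
    8-bit string was actually encoded.
    """
    if isinstance(value, bytes):
        return value.decode(source_encoding)
    return value


def write_xml(text, encoding="utf-8"):
    """Serialize so that the header and the bytes come from the same parameter."""
    header = '<?xml version="1.0" encoding="%s" ?>' % encoding
    return (header + text).encode(encoding)


# An 8-bit Latin-1 string is converted once, then written out as UTF-8.
output = write_xml("<name>" + to_text(b"Cola\xe7o") + "</name>")
```

With the conversion done at the boundary, the encoding parameter is the
only thing that decides both what the header says and how the bytes are
produced, so the two cannot drift apart.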
As it happens, there's already a bug open on this issue. Please sign up
on roundup and continue discussion of this issue here:
http://twistedmatrix.com/users/roundup.twistd/twisted/issue149
Thanks! I have already pasted your original email and my reply there.
Cory