[Twisted-Python] raw xml to element, char encoding/decoding error : OT
Gabriel Rossetti
gabriel.rossetti at arimaz.com
Wed Feb 18 06:41:05 MST 2009
Gabriel Rossetti wrote:
> Hello,
>
> I wrote some code to transform a raw XML string into a domish.Element,
> and I keep on getting char encoding/decoding errors :
>
> from twisted.words.xish import domish
>
> class __RawXmlToElement(object):
>     def __call__(self, s):
>         self.result = None
>         def onStart(el):
>             self.result = el
>         def onEnd():
>             pass
>         def onElement(el):
>             self.result.addChild(el)
>         parser = domish.elementStream()
>         parser.DocumentStartEvent = onStart
>         parser.ElementEvent = onElement
>         parser.DocumentEndEvent = onEnd
>         # Wrap the raw XML in a dummy <s> element so it can be serialized
>         tmp = domish.Element(("", "s"))
>         tmp.addRawXml(s)
>         parser.parse(tmp.toXml())
>         return self.result.firstChildElement()
>
> rawXmlToElement = __RawXmlToElement()
>
>
> Here's a test raw XML string :
>
> >>> u"<t>reçu</t>"
> u'<t>re\xe7u</t>'
>
> >>> u"<t>reçu</t>".encode("utf-8")
> '<t>re\xc3\xa7u</t>'
>
> >>> "<t>reçu</t>"
> '<t>re\xc3\xa7u</t>'
>
>
> As you can see, my system encodes strings in UTF-8. I tried the
> following, but I keep getting errors:
>
> >>> rawXmlToElement("<t>reçu</t>")
> raw xml adder error : 'ascii' codec can't decode byte 0xc3 in
> position 5: ordinal not in range(128)
>
> >>> rawXmlToElement(u"<t>reçu</t>")
> parser error : 'ascii' codec can't encode character u'\xe7' in
> position 8: ordinal not in range(128)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "<stdin>", line 26, in __call__
> AttributeError: 'NoneType' object has no attribute 'firstChildElement'
>
> >>> rawXmlToElement(unicode("<t>reçu</t>", "utf-8"))
> parser error : 'ascii' codec can't encode character u'\xe7' in
> position 8: ordinal not in range(128)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "<stdin>", line 26, in __call__
> AttributeError: 'NoneType' object has no attribute 'firstChildElement'
>
>
> If I try it with ASCII-encodable characters, it works correctly:
>
> >>> rawXmlToElement("<t>toto</t>").toXml()
> u'<t>toto</t>'
>
> >>> rawXmlToElement(u"<t>toto</t>").toXml()
> u'<t>toto</t>'
>
> >>> rawXmlToElement(unicode("<t>toto</t>", "utf-8")).toXml()
> u'<t>toto</t>'
>
>
> Does anyone have any idea what I'm doing wrong here? Thank you!
>
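For reference, the two messages quoted above can be reproduced without
Twisted. The sketch below is only an illustration, not the code from the
attachment. It assumes that the errors come from conversions through the
'ascii' codec, and that tmp.toXml() serializes the wrapper as
<s>...</s>; the helper names are hypothetical. The explicit "ascii"
calls stand in for the implicit conversions Python 2 performs when byte
strings and unicode strings are mixed.

```python
# -*- coding: utf-8 -*-
# Sketch only: explicit "ascii" conversions stand in for the implicit
# ones suspected in the quoted code (an assumption, not a confirmed cause).

raw = u"<t>re\xe7u</t>"           # the unicode test string from the post
utf8_bytes = raw.encode("utf-8")  # the same text as UTF-8 encoded bytes


def ascii_decode_error(data):
    """Return the error raised when decoding bytes as ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


def ascii_encode_error(text):
    """Return the error raised when encoding text as ASCII, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


# Byte 0xc3 is the first byte of the UTF-8 encoding of u'\xe7'.
decode_exc = ascii_decode_error(utf8_bytes)

# Assumed serialized form of the temporary <s> wrapper element.
encode_exc = ascii_encode_error(u"<s>" + raw + u"</s>")

print("decode fails at position %d" % decode_exc.start)
print("encode fails at position %d" % encode_exc.start)
```

The positions match the quoted messages: 5 for the byte string, and 8
for the unicode string once the three characters of "<s>" are added in
front of it. That would be consistent with the encode error being raised
when the serialized wrapper is handed to the parser.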
I think this is a Python environment problem and not a Twisted problem.
If I run the attached example in Eclipse, it works; if I run it from a
terminal, it doesn't. This is now off topic, but if anyone has an idea,
I'd be grateful. I'm also going to post this on the Python mailing list.
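One way to compare the two environments is to print the encodings that
each interpreter reports, once from Eclipse and once from the terminal.
This is a minimal sketch: describe_encodings is a hypothetical helper
name, and whether Eclipse actually changes any of these values here is
an assumption to be checked, not something established.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings the running interpreter reports."""
    return {
        # Codec used for implicit str/unicode conversions in Python 2.
        "default": sys.getdefaultencoding(),
        # Encoding of standard output; may be None when output is piped.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding derived from the user's locale settings.
        "locale": locale.getpreferredencoding(False),
    }


info = describe_encodings()
for name, value in sorted(info.items()):
    print("%-10s %s" % (name, value))
```

If the "default" line differs between the two runs, that would explain
why the same script only fails in one of them.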
Thank you,
Gabriel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xml_parser_test.py
Type: text/x-python
Size: 812 bytes
Desc: not available
URL: </pipermail/twisted-python/attachments/20090218/b55f0a66/attachment-0002.py>