[Twisted-Python] XML parsing on twisted
exarkun at twistedmatrix.com
Fri Oct 2 08:40:32 MDT 2009
On 1 Oct, 05:53 pm, burslem2001 at yahoo.com wrote:
>Hello,
>
>Probably a pretty standard question. However what are recommended
>mechanics of parsing XML on twisted? I have a humongous string that
>needs to be parsed and pushed into a database in the right columns.

Depending on how big the strings are, you may just want to parse them in
the obvious way and then deal with the results. If the strings are
really epically big, then you have a few options.

You can handle them in another thread in the usual way.
twisted.internet.threads.deferToThread gives you easy access to a
threadpool which you can use for tasks like this.
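
As a rough sketch of the thread approach: the blocking parse goes into a
plain function, and deferToThread runs it in the reactor's threadpool. The
<record> layout and the names parse_rows and parse_in_thread are made up
for illustration; only deferToThread comes from Twisted.

```python
import xml.etree.ElementTree as ET


def parse_rows(xml_text):
    """Blocking parse: return one dict of attributes per <record> element.

    The <records>/<record> layout is a hypothetical stand-in for the real
    document structure.
    """
    root = ET.fromstring(xml_text)
    return [dict(record.attrib) for record in root.iter("record")]


def parse_in_thread(xml_text):
    """Run parse_rows in the reactor's threadpool and return a Deferred."""
    # Imported here so that parse_rows can also be used without Twisted.
    from twisted.internet.threads import deferToThread

    return deferToThread(parse_rows, xml_text)
```

The Deferred fires back in the reactor thread with the list of rows, so a
callback added to parse_in_thread(xml_text) is a reasonable place to start
the database inserts.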

You can hand them off to another process and deal with them there.
Twisted has child process control built in, via reactor.spawnProcess.
You may also find the Ampoule library (not part of Twisted) handy for
this.
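
A minimal sketch of the child process route, assuming the child is just
another Python interpreter that reads the XML on stdin and writes the rows
back as JSON on stdout. CHILD_SOURCE, XMLChild-style naming, parse_in_child
and the <record> layout are all hypothetical; spawnProcess and
ProcessProtocol are the Twisted pieces.

```python
import json
import sys

# Source of the child program (hypothetical): parse stdin, print JSON rows.
CHILD_SOURCE = r"""
import json, sys
import xml.etree.ElementTree as ET
root = ET.fromstring(sys.stdin.buffer.read())
rows = [dict(record.attrib) for record in root.iter("record")]
sys.stdout.write(json.dumps(rows))
"""


def parse_in_child(xml_bytes):
    """Parse xml_bytes in a child process; return a Deferred of the rows."""
    # Imported here so the module also loads where Twisted is not installed.
    from twisted.internet import defer, error, protocol, reactor

    class XMLChild(protocol.ProcessProtocol):
        def __init__(self):
            self.finished = defer.Deferred()
            self.output = []

        def connectionMade(self):
            # Send the whole document, then signal EOF to the child.
            self.transport.write(xml_bytes)
            self.transport.closeStdin()

        def outReceived(self, data):
            self.output.append(data)

        def processEnded(self, reason):
            # ProcessDone means a clean exit with status 0.
            if reason.check(error.ProcessDone):
                self.finished.callback(json.loads(b"".join(self.output)))
            else:
                self.finished.errback(reason)

    child = XMLChild()
    # env=None asks Twisted to pass the parent's environment to the child.
    reactor.spawnProcess(
        child, sys.executable, [sys.executable, "-c", CHILD_SOURCE], env=None
    )
    return child.finished
```

JSON on stdout is only one way to get the results back; anything the two
sides agree on would do.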

You can also do the XML parsing incrementally. The Python standard
library includes a SAX parser (xml.sax) which you might want to use for
this. I think the newer APIs (e.g. etree) also support some forms of
incremental parsing. This should let you spread out the task of handling
the XML over a longer period of time, thus avoiding blocking the reactor
thread for unreasonable amounts of time.
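
Here is one way the incremental version might look, using the feed()/close()
interface of the standard library SAX parser and a generator that yields
between chunks. RecordHandler, feed_in_chunks, parse_cooperatively, the
chunk size and the <record> layout are assumptions for the sake of the
example; twisted.internet.task.cooperate is one Twisted helper that can
drive such a generator a step at a time.

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Collect the attributes of every <record> element (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def startElement(self, name, attrs):
        if name == "record":
            self.rows.append(dict(attrs.items()))


def feed_in_chunks(xml_bytes, handler, chunk_size=64 * 1024):
    """Feed the document to the parser one chunk at a time.

    Yields after each chunk so that whatever drives the generator can let
    other work run in between.
    """
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for start in range(0, len(xml_bytes), chunk_size):
        parser.feed(xml_bytes[start:start + chunk_size])
        yield
    parser.close()


def parse_cooperatively(xml_bytes):
    """Drive feed_in_chunks from the reactor; return a Deferred of the rows."""
    # Imported here so the SAX part can be used without Twisted.
    from twisted.internet.task import cooperate

    handler = RecordHandler()
    task = cooperate(feed_in_chunks(xml_bytes, handler))
    return task.whenDone().addCallback(lambda _: handler.rows)
```

Smaller chunks mean shorter pauses for the reactor at the cost of more
scheduling overhead, so the chunk size is something to tune rather than a
recommendation.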

Jean-Paul