[Twisted-Python] Should I use asynchronous programming in my own modules?
Jürgen Strass
jrg718 at gmx.net
Thu Oct 18 06:41:38 MDT 2007
Hello,
I'm rather new to twisted and asynchronous programming in general.
Overall, I think I've understood the asynchronous programming model and
its implications quite well. Nevertheless, there are some remaining
questions.
To give an example, I'd like to develop my own simplified document
format in XML and a corresponding parser. The output of the parser (a
specialized document object model) will be traversed and translated into
HTML afterwards. This module could be useful outside any twisted
application, of course. Instead of generating HTML one could develop a
generator that produces LaTeX, for example. But it could also be used to
render HTML pages in a twisted web application. The question is this:
since parsing and generating large documents could block the reactor in
a twisted app, should I use any of twisted's asynchronous programming
features in this module (for better integration with twisted) or should
I rather develop it in a traditional way and run it in a thread?
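
To make the second option concrete, here is a minimal sketch of a module written in the traditional, blocking way and then run in a worker thread. The element names `doc`, `title` and `para` and the function `render_html` are made up for illustration; they are not part of any existing format. The standard library's `ThreadPoolExecutor` stands in for the thread here; in a twisted application, `twisted.internet.threads.deferToThread` would presumably take that role.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from html import escape


def render_html(xml_text):
    """Blocking and framework-free: parse the XML, return simple HTML.

    The <title>/<para> vocabulary is a hypothetical document format.
    """
    root = ET.fromstring(xml_text)
    parts = ["<h1>%s</h1>" % escape(root.findtext("title", default=""))]
    for para in root.iter("para"):
        parts.append("<p>%s</p>" % escape(para.text or ""))
    return "\n".join(parts)


if __name__ == "__main__":
    doc = "<doc><title>Hello</title><para>First</para><para>Second</para></doc>"
    # Run the blocking call in a worker thread so the caller stays free.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(render_html, doc)
        print(future.result())
```

The module itself knows nothing about threads or event loops; only the caller decides how to run it.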
The question came to mind because I read somewhere that long-running
operations in third-party modules should be called in a thread. This is
clear. I also read that if one has the opportunity to develop an
application from scratch, one should rather use twisted's
asynchronous programming features and divide long-running operations
into small chunks. In principle, this approach is clear to me, but does
it also apply to modules that are entirely independent of twisted's
networking code? And if so, is there any way to decouple them from the
twisted library for reuse in other applications?
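
One possible way to get such decoupling, sketched under assumptions: write the long operation as a plain Python generator that yields after each chunk. The generator imports nothing from twisted, so an ordinary program can simply run it to completion, while an event loop could advance it one step at a time (in twisted, `twisted.internet.task.coiterate` appears to accept an iterator like this). The names `translate_in_steps` and `run_to_completion` and the `<p>` wrapping are placeholders, not an existing API.

```python
def translate_in_steps(nodes, out, chunk_size=100):
    """Translate nodes into HTML fragments, pausing after each chunk.

    Results are appended to `out`. Each yield is a point where a caller
    may stop and let other work run before resuming.
    """
    for count, node in enumerate(nodes, 1):
        out.append("<p>%s</p>" % node)  # placeholder for the real translation
        if count % chunk_size == 0:
            yield count  # number of nodes processed so far


def run_to_completion(steps):
    """Driver for programs without an event loop: never pauses."""
    for _ in steps:
        pass


if __name__ == "__main__":
    fragments = []
    run_to_completion(translate_in_steps(["a", "b", "c"], fragments, chunk_size=2))
    print(fragments)
```

The same generator can be reused unchanged in both settings; only the driver differs.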
The last question is what criteria I could use to divide long-running
operations into chunks. In almost all books about asynchronous
programming I only read that if they're too big, they could block the
event loop. Of course, but how big is too big? And what's the measure
for it? Milliseconds, number of operations, number of code lines - or
what? Doesn't it depend entirely on the application at hand and how
reactive it has to be? Moreover, it depends on the hardware: a
Pentium II can process fewer chunks in the same amount of time than an
Athlon 64, for example. And couldn't chunks also be too small, spending
more time than necessary in putting them into the reactor's queue, then
maybe sorting them and then calling them? In case the overhead involved
in scheduling some chunk is bigger than the processing time of the chunk
itself, the chunks are too small, aren't they?
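
If the measure were elapsed time rather than operation counts, the chunk size would adapt to the hardware by itself: a fast machine simply gets more steps done per slice. The following is only a sketch of that idea, not something taken from twisted. The function `run_with_budget` and the 10 ms default are assumptions chosen for illustration; the right budget would still depend on how reactive the application has to be.

```python
import time


def run_with_budget(work_iter, budget_seconds=0.01, clock=time.monotonic):
    """Advance `work_iter` until the time budget is used up.

    Returns True if the budget ran out (there may be more work to do),
    and False if the iterator was exhausted within the budget.
    """
    deadline = clock() + budget_seconds
    for _ in work_iter:
        if clock() >= deadline:
            return True
    return False


if __name__ == "__main__":
    work = iter(range(1000000))  # each item stands for one small step
    slices = 1
    while run_with_budget(work):
        slices += 1  # an event loop would handle other events here
    print("finished in", slices, "slices")
```

Many small steps are grouped into one slice, so the scheduling overhead is paid once per slice rather than once per step, which addresses the worry about chunks being too small.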
Thanks in advance for any answers,
Jürgen