[Twisted-Python] suggestion for tracking down cause of 2 bugs sought: possibly threads / defer.DeferredSemaphore ?
Jonathan Vanasco
twisted-python at 2xlp.com
Wed Mar 21 09:32:53 MDT 2007
I've got 2 bugs in a twisted app that I'm trying to narrow in on.
Overview of the app:
A twisted service connects to a db and requests about 500 web pages
to download and analyze.
The service then uses a defer.DeferredSemaphore to handle
dispatching each request to a thread via deferToThread.
defer.DeferredSemaphore( tokens= MAX_DOWNLOADERS )
reactor.suggestThreadPoolSize( MAX_DOWNLOADERS + 1 )
Each thread runs a python script that is not written in Twisted and
has a lot of blocking.
And to pre-empt the comments that will inevitably pop up: it would
be great to convert all the code to Twisted, but 90% of the code and
its underlying modules are used by command-line maintenance routines
that were already in place. I just don't have the time or resources
to rewrite it all in Twisted. One day I'd love to have the funding to
hire someone to manage that, but I just don't have the time myself.
Overview of the bugs:
1)
The query for figuring out which pages to download can take quite
some time, so I run the getNextDownloads routine in a new thread,
so that the pool always remains full.
It works perfectly, except when the number of items queued up on the
defer.DeferredSemaphore reaches ~1000.
After ~1000, I seem to be running with only 1-2 semaphore slots in
use, not 15.
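One way to confirm the symptom above is to count how many blocking
jobs are really running at once. The following is a minimal sketch
using only the standard library; ConcurrencyGauge is a hypothetical
helper, not part of Twisted, and it assumes the blocking work is a
plain Python callable that gets handed to deferToThread.

```python
import functools
import threading


class ConcurrencyGauge(object):
    """Thread-safe counter of how many wrapped calls are running right now."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0      # calls currently inside a wrapped function
        self.peak = 0        # highest value `active` has reached so far
        self.completed = 0   # calls that have returned or raised

    def track(self, func):
        """Wrap `func` so that entering and leaving it updates the counters."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on both normal return and exception.
                with self._lock:
                    self.active -= 1
                    self.completed += 1
        return wrapper

    def snapshot(self):
        """Return (active, peak, completed) as one consistent reading."""
        with self._lock:
            return (self.active, self.peak, self.completed)
```

In the app, the wrapped callable would be the one passed to
deferToThread, and snapshot() could be logged every few seconds. If
"active" sits at 1-2 while the semaphore still has a long queue, the
stall is probably in dispatching; if "active" stays at the maximum
while "completed" stops moving, the threads themselves are likely
stuck.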
Question:
Can anyone suggest a way to test this to see where the error is?
Off the top of my head, the following seem plausible:
a- hitting 1000 just breaks something in
defer.DeferredSemaphore. The module is listed as unstable, so I'm
guessing this could be the cause.
b- the defer.DeferredSemaphore connection is just coincidental --
it's really an issue with threads locking up
c- something entirely different
Unfortunately, I'm at a loss trying to figure out what's going on.
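To test hypothesis (b), one option is to dump the stack of every live
thread once the slowdown has started and look at where the workers are
sitting. This is a rough sketch with the standard library only;
dump_thread_stacks is a hypothetical helper name, and
sys._current_frames() is a CPython-specific call (Python 2.5 and
later).

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a text report with the current stack of every live thread."""
    # Map thread ids to names so the report is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s)" % (ident, names.get(ident, "unknown")))
        lines.extend(s.rstrip("\n") for s in traceback.format_stack(frame))
        lines.append("")
    return "\n".join(lines)
```

If most pool threads show up blocked on the same socket read or lock,
that points at the blocking code rather than at the semaphore.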
2)
I'm experiencing an obscene amount of process growth. I'm fairly
certain that it's from constant errors in urllib2, which has been
nothing short of a nightmare, with countless known bugs and patches
(downloading takes place via Mechanize, which is built on top of
urllib2), and that makes me consider just wrapping wget or curl. But
this could very well be threading related.
Question:
Is there any way to profile memory usage by Twisted in a granular
fashion? I'd like to figure out where this memory usage is (it
grows steadily to ~1.7 GB after 8 hours).
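One coarse way to approach this is to compare counts of live Python
objects by type at two points in time. The sketch below uses only the
gc module; the function names are hypothetical. It assumes the growth
comes from Python objects tracked by the garbage collector. Memory
held by C extensions, or by untracked objects such as strings, will
not show up here.

```python
import gc
from collections import Counter


def count_objects_by_type():
    """Count objects currently tracked by the garbage collector, by type name."""
    gc.collect()  # drop collectable garbage first so it does not skew the counts
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth_since(before, limit=10):
    """Return [(type_name, increase)] for the types that grew most since `before`."""
    after = count_objects_by_type()
    deltas = [(name, after[name] - before.get(name, 0)) for name in after]
    grown = [(name, delta) for name, delta in deltas if delta > 0]
    grown.sort(key=lambda item: item[1], reverse=True)
    return grown[:limit]
```

Taking a baseline with count_objects_by_type() shortly after startup
and logging growth_since(baseline) every few minutes should show
whether one type keeps climbing as the process grows.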
Thanks,
// Jonathan Vanasco
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| SyndiClick.com
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| FindMeOn.com - The cure for Multiple Web Personality Disorder
| Web Identity Management and 3D Social Networking
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| RoadSound.com - Tools For Bands, Stuff For Fans
| Collaborative Online Management And Syndication Tools
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -