[Twisted-Python] suggestion for tracking down cause of 2 bugs sought: possibly threads / defer.DeferredSemaphore ?

Jonathan Vanasco twisted-python at 2xlp.com
Wed Mar 21 09:32:53 MDT 2007


I've got 2 bugs in a twisted app that I'm trying to narrow in on.

Overview of the app:
	A twisted service connects to a db and requests about 500 web pages  
to download and analyze.
	The service then uses a defer.DeferredSemaphore to dispatch each
request to a thread via deferToThread:
		defer.DeferredSemaphore( tokens= MAX_DOWNLOADERS )
		reactor.suggestThreadPoolSize( MAX_DOWNLOADERS + 1 )
	Each thread runs a Python script that is not written in Twisted and
makes a lot of blocking calls.

And to pre-empt the comments that will inevitably pop up: it would
be great to convert all the code to Twisted, but 90% of the code and
its underlying modules are used by command-line maintenance routines
that were already in place.  I just don't have the time or resources
to rewrite it all in Twisted.  One day I'd love to have the funding to
hire someone to manage them, but I just don't have that time myself.


Overview of the bugs:
	1)
		The query for figuring out which pages to download can take quite
some time, so I run the getNextDownloads routine in a new thread so
that the pool always remains full.
		It works perfectly, except when the number of items queued up on the
defer.DeferredSemaphore reaches ~1000.
		After ~1000, only 1-2 semaphore tokens seem to be in use at a time,
not 15.

		Question:
			Can anyone suggest a way to test this to see where the error is?
			Off the top of my head, the following seem plausible:
				a- hitting 1000 just breaks something in
defer.DeferredSemaphore.  The module is listed as unstable, so I'm
guessing this could be the cause.
				b- the defer.DeferredSemaphore connection is just coincidental;
it's really an issue with threads locking up
				c- something entirely different
			Unfortunately, I'm at a loss trying to figure out what's going on.
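
One way to tell hypotheses (a) and (b) apart would be to measure how many
worker calls are actually running at any moment, independently of the
semaphore.  The following is only a minimal sketch using the standard
library; ConcurrencyGauge and its methods are hypothetical names invented
for illustration and are not part of Twisted.

```python
import threading
import time


class ConcurrencyGauge(object):
    """Counts how many wrapped calls are running at the same moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0    # calls currently inside a wrapped function
        self.peak = 0      # highest value `active` has reached so far
        self._started = {}  # thread ident -> (label, start time)

    def wrap(self, func, label=None):
        """Return a version of func that updates the gauge around each call."""
        name = label or getattr(func, "__name__", repr(func))

        def wrapper(*args, **kwargs):
            ident = threading.current_thread().ident
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
                self._started[ident] = (name, time.time())
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether func returned or raised.
                with self._lock:
                    self.active -= 1
                    self._started.pop(ident, None)

        return wrapper

    def snapshot(self):
        """Return (active, peak, [(label, seconds_running), ...])."""
        now = time.time()
        with self._lock:
            running = [(n, now - t0) for n, t0 in self._started.values()]
            running.sort(key=lambda item: -item[1])  # longest-running first
            return self.active, self.peak, running
```

The idea would be to hand gauge.wrap(download_page) to deferToThread
instead of the bare function (download_page standing in for the blocking
routine) and log gauge.snapshot() every few seconds.  If the active count
sits at the pool size with the same entries running for minutes, threads
are probably stuck in blocking calls (b); if it drops to 1-2 while work is
still queued, the problem is more likely in how jobs are released to the
pool.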

	2)
		I'm experiencing an obscene amount of process growth.  I'm fairly
certain that it's from constant errors in urllib2, which has been
nothing short of a nightmare, with countless known bugs and patches
(downloading takes place via Mechanize, which wraps urllib2), and
which makes me consider just wrapping wget or curl.  But this could
very well be threading related.
		
		Question:
			Is there any way to profile memory usage by Twisted in a granular
fashion?  I'd like to figure out where this memory usage is (it
grows steadily to ~1.7 GB after 8 hours).
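
As a rough starting point for that kind of granularity, one option is a
periodic census of live objects by type using the standard gc module.
This is only a sketch: object_census and census_growth are hypothetical
helper names, and gc.get_objects() only sees gc-tracked objects
(containers and class instances), so memory held in plain strings or
inside C extensions would not show up here.

```python
import gc
from collections import Counter


def object_census():
    """Count live, gc-tracked objects by type name."""
    gc.collect()  # drop collectable garbage first so only survivors count
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def census_growth(before, after, top=10):
    """Return up to `top` (type name, increase) pairs, largest increase first."""
    delta = Counter(after)
    delta.subtract(before)
    grown = [(name, n) for name, n in delta.items() if n > 0]
    grown.sort(key=lambda item: -item[1])
    return grown[:top]
```

Taking a census at startup and again every hour or so, then printing
census_growth(first, latest), should show which object types keep
accumulating.  On Python 3.4 and later, the standard tracemalloc module
can additionally attribute allocations to source lines.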

Thanks,

// Jonathan Vanasco

