[Twisted-Python] help with ssl timeout and not reconnecting client factory

Thu Mar 17 07:28:06 MST 2005

On Thu, 17 Mar 2005 13:40:56 +0100, Andrea Arcangeli <andrea at cpushare.com> wrote:
> From a client I'm getting this error:
> 
> [snip -  traceback and log]
>         
> This is a reconnecting client factory; the Python version is 2.3.4 and the Twisted
> version is 1.3.0a. The socket should sit in an idle state. No communication over
> that socket will happen (the app is under development), but it should not time
> out unless the connection with the server ends and the keepalive events
> trigger a disconnect (I enabled keepalive at the TCP level).
> 
> Even if it times out, it must try to reconnect immediately, whereas it seems
> to hang after the "Stopping factory".
> 
> 
> Earlier, when I got the connection timed out event (for no apparent good reason),
> at least it immediately tried to reconnect:
> 
> [snip - traceback and log]
>         
> 2005/03/14 17:59 CET [cpushare_protocol,client] <twisted.internet.ssl.Connector instance at 0x2aaaac28d950> will retry in 2 seconds
> 2005/03/14 17:59 CET [cpushare_protocol,client] Stopping factory <cpushare.proto.cpushare_factory instance at 0x2aaaad414290>
> 2005/03/14 17:59 CET [-] Starting factory <cpushare.proto.cpushare_factory instance at 0x2aaaad414290>
> 
> 
> So my first priority is to understand why it stopped trying to reconnect (which
> is the major bug) and the second priority is to understand why it was timing
> out in the first place. (I can't exclude that there was a temporary network
> disruption that caused the keepalive to trigger the disconnect.)

  For some reason unfathomable to me, ReconnectingClientFactory _stops_ trying to reconnect if a UserError is the cause of the failed connection.  Further, for some reason, error.TimeoutError subclasses UserError, so a connection timeout is enough to stop the retries.  This has bitten at least one other project (buildbot).

> 
> Could this be a bug in 1.3.0a? I expect the client will be mostly run with
> 1.3.0a, only on the server side I use SVN + pending fixes.

  I'm inclined to say that it is indeed a bug.  I think ReconnectingClientFactory should always retry the connection, regardless of the exception with which the previous attempt fails.  If a program wants to allow a user to interrupt the retry logic, there is a "stopTrying" method.

> 
> This is the reconnecting code:
> 
> class cpushare_factory(ReconnectingClientFactory):
> 	maxDelay = 600 # limit the maximum delay to 10 min
> 
> 	protocol = cpushare_protocol
> 
> 	def buildProtocol(self, addr):
> 		self.resetDelay()
> 		protocol = self.protocol()
> 		assert not hasattr(protocol, 'factory')
> 		protocol.factory = self
> 		return protocol
> 
> 	def clientConnectionFailed(self, connector, reason):
> 		print 'Connection failed. Reason:', reason
> 		ReconnectingClientFactory.clientConnectionFailed(self, connector, reason)

  If you look at twisted/internet/protocol.py for the definition of ReconnectingClientFactory.clientConnectionFailed, it should be pretty obvious how you want to redefine clientConnectionFailed to avoid the behavior you're seeing.
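
  For illustration, here is a minimal sketch of one way to do that.  It assumes (from memory; check your copy of protocol.py) that the stock method only skips the retry when the failure is a UserError.  AlwaysRetryMixin is a made-up name, not part of Twisted; it mirrors the stock logic minus the UserError check, so a TimeoutError is retried like any other failure:

```python
class AlwaysRetryMixin(object):
    """Hypothetical mixin (not part of Twisted): retry after *any* failed
    connection attempt, including error.TimeoutError.

    List it before ReconnectingClientFactory in the bases so that this
    clientConnectionFailed is the one that gets found.
    """

    def clientConnectionFailed(self, connector, reason):
        # continueTrying is cleared by stopTrying(), so an explicit
        # request to stop is still honoured.
        if self.continueTrying:
            # Remember the connector, as the stock method is assumed to do,
            # then schedule the next attempt regardless of the failure type.
            self.connector = connector
            self.retry(connector)
```

  In your code that would be "class cpushare_factory(AlwaysRetryMixin, ReconnectingClientFactory)".  Note that your current override calls ReconnectingClientFactory.clientConnectionFailed explicitly, which would bypass the mixin; either drop that override or have it call AlwaysRetryMixin.clientConnectionFailed instead.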

> 
> 	def connectionMade(self):
> 		self.transport.setTcpKeepAlive(1)
> 
> Is the above correct? It works fine when the connection failed reason is
> "ConnectionRefusedError" instead of TimeoutError.
> 
> 
> What else should I do to prevent this error from leaving the factory stopped?
> 
> 2005/03/17 06:31 CET [-] Connection failed. Reason: [Failure instance: Traceback: twisted.internet.error.TimeoutError, User timeout caused connection failure.
> 
> Where does the "twisted.internet.error.TimeoutError" come from?
> 

  It's generated internally by Twisted when the allotted connection time has elapsed without a connection being created.

  Most likely it _is_ network-related problems that caused the connection to fail, but Twisted is certainly responsible for the decision to cease further reconnection attempts.

  Jp