[Twisted-Python] strange server crash
    Alec Matusis 
    matusis at yahoo.com
       
    Sat Mar 21 21:02:03 MDT 2009
    
    
  
I have been operating a busy twisted TCP server for quite some time (1
year+).
Yesterday, it strangely crashed under peak load for that day (it has been
under more load in the past).  
 
# grep segfault /var/log/messages
Mar 20 19:12:15 serv2 kernel: [17687209.144548] twistd[10701]: segfault at 8
rip 41eccf rsp 7fff33675270 error 6
 
I cannot understand the reason of the crash, except that its some very rare
race condition.
While the code is quite cumbersome, here are some general details:
 
8 min before the crash, the server started giving repeated error on clients
disconnect, that I have never seen before in the last year.
 
It first produced this twice:
 
2009/03/20 19:04 -0700 [ClientProtocol,318930,70.132.227.150] Unhandled
Error
        Traceback (most recent call last):
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/application/a
pp.py", line 113, in runReactorWithLogging
            reactor.run()
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 220, in run
            self.mainLoop()
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 231, in mainLoop
            self.doIteration(t)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol
lreactor.py", line 181, in doPoll
            log.callWithLogger(selectable, _drdw, selectable, fd, event)
        --- <exception caught here> ---
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py
", line 48, in callWithLogger
            return callWithContext({"system": lp}, func, *args, **kw)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py
", line 33, in callWithContext
            return context.call({ILogContext: newCtx}, func, *args, **kw)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex
t.py", line 59, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args,
**kw)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex
t.py", line 37, in callWithContext
            return func(*args,**kw)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol
lreactor.py", line 212, in _doReadOrWrite
            self._disconnectSelectable(selectable, why, inRead)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 252, in _disconnectSelectable
            selectable.readConnectionLost(f)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp.
py", line 405, in readConnectionLost
            self.connectionLost(reason)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp.
py", line 416, in connectionLost
            protocol.connectionLost(reason)
          File "/nail/live/im/myserv/protocols.py", line 170, in
connectionLost
            self.session.logout()
          File "/nail/live/im/myserv/service.py", line 1199, in logout
            if self.nick in session.sellers:
        exceptions.AttributeError: Deferred instance has no attribute
'sellers'
 
and then a different error in the same line with the same call stack, that
was repeated about 1000 times for different logging out clients:
 
2009/03/20 19:04 -0700 [ClientProtocol,326158,99.245.246.187] Unhandled
Error
        Traceback (most recent call last):
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/application/a
pp.py", line 113, in runReactorWithLogging
            reactor.run()
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 220, in run
            self.mainLoop()
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 231, in mainLoop
            self.doIteration(t)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol
lreactor.py", line 181, in doPoll
            log.callWithLogger(selectable, _drdw, selectable, fd, event)
        --- <exception caught here> ---
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py
", line 48, in callWithLogger
            return callWithContext({"system": lp}, func, *args, **kw)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py
", line 33, in callWithContext
            return context.call({ILogContext: newCtx}, func, *args, **kw)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex
t.py", line 59, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args,
**kw)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex
t.py", line 37, in callWithContext
            return func(*args,**kw)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol
lreactor.py", line 212, in _doReadOrWrite
            self._disconnectSelectable(selectable, why, inRead)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 252, in _disconnectSelectable
            selectable.readConnectionLost(f)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp.
py", line 405, in readConnectionLost
            self.connectionLost(reason)
          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp.
py", line 416, in connectionLost
            protocol.connectionLost(reason)
          File "/nail/live/im/myserv/protocols.py", line 170, in
connectionLost
            self.session.logout()
          File "/nail/live/im/myserv/service.py", line 1199, in logout
            if self.nick in session.sellers:
        exceptions.AttributeError: WLResultSet instance has no attribute
'sellers'
 
This last error repeated about 1000 times for different clients, after which
the segfault occurred.
The problem is, that session is an instance of a Session class that is
defined in the code, it is NEITHER a Deferred NOR an instance  of
WLResultSet class- these are completely different classes, and theres no
possible way in the code for variable session to become an instance of
Deferred.
The code is approximately like this:
 
def logout(self):
self.on_logout(self.do_logout) 
for session in self.service.client_sessions: ß self.service.client_sessions
is a dictionary of custom Session class instances
if self.nick in session.sellers:  ß error here, it claims that session is a
Deferred or sometimes an instance of a completely unrelated class
                                                #Some statements 
 
The function self.on_logout sometimes (rarely) dispatches a deferred (it has
maybeDeferred in it), such that its argument self.on_logout is called when
that deferred returns (otherwise its called synchronously).
 
My question is, is it possible to muck up deferred in such a way, that after
dispatching a deferred chain inside  self.on_logout(self.do_logout) , the
remaining statements would be executed in some  unexpected local scope, such
that I would see these weird errors where my for loop iterator would
suddenly go over Deferred instances, instead of the dictionary of Session
instances that I define in the code ?? 
 
Also, why did these errors lead to segfault of the whole twisted process
after 8 minutes?
 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: </pipermail/twisted-python/attachments/20090321/825facf2/attachment.html>
    
    
More information about the Twisted-Python
mailing list