[Twisted-Python] intermittent problem: not accepting new connections

Jean-Paul Calderone exarkun at divmod.com
Wed Sep 10 16:44:15 EDT 2008


On Wed, 10 Sep 2008 13:32:06 -0700, Alec Matusis <matusis at yahoo.com> wrote:
>I have had a twisted epoll server that was heavily used, such that it
>saturated CPU (100% shown by "top", about 5000 connections, intense message
>relaying).
>I am using twisted 2.5.0 that I patched for the epoll bug.
>It was run on python 2.4.4, a 2.6.11 kernel, and a single core xeon 3.0 GHz
>CPU. This server has been on for many months, and it has been rock-stable.
>
>A couple of days ago I migrated that server to a newer machine: same patched
>twisted 2.5.0, same python 2.4.4, newer 2.6.24 kernel and a quad core xeon
>L5420 CPU.
>CPU usage dropped from 100% to 30%, as expected, with the same rate of
>client connections.
>
>However the server now has the following intermittent problem: about twice a
>day, it stops accepting new connections for a short period of 5-10 minutes.
>
>telnet times out, I get this:
>root@serv2:/proc/net/netfilter# telnet localhost 5229
>
>Trying 127.0.0.1...
>
>Existing connections are not cut; the server receives/delivers messages
>to/from them just fine.
>These short periods of not accepting connections do not correlate with
>increased CPU load or with the overall number of connections to the server.
>
>I have had a problem with the same symptoms before, when a server process
>ran out of its quota of file descriptors.
>However, there were clear messages in the twisted log at that time, and
>upping the ulimits solved the problem.
>This time, there are no errors in ANY logs (twisted log, /var/log/messages,
>etc.)

Do non-error log messages continue to appear in the Twisted log?  I.e., is
it clear that the logging system is still working, or could it have failed
in some way, obscuring any exception reports?

>
>I am out of ideas on what this could be, because my setup is exactly the
>same as I have been using for the last year, except for a faster CPU and a
>newer kernel.
>
>I suspect that there are some new uncaught accept() exceptions in
>internet/tcp.py in the part where it's looking for EMFILE, ENOBUFS, ENFILE,
>ENOMEM, ECONNABORTED errors.
>

Any new unhandled errno values should definitely result in an exception
being logged (notice that the `raise' which follows the checks for
various errno values is inside a try/except which logs any exception).
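
To illustrate the shape of that logic, here is a minimal sketch.  It is
not the actual code from internet/tcp.py; the names do_read, accept and
EXPECTED_ERRNOS are made up for the example, and it uses the standard
library logging module rather than Twisted's log.  Expected errno values
end the accept loop quietly, and anything else is re-raised into an outer
try/except that logs it.

```python
import errno
import logging

log = logging.getLogger("accept_sketch")

# The errno values named in the quoted message (assumed list, for illustration).
EXPECTED_ERRNOS = (errno.EMFILE, errno.ENOBUFS, errno.ENFILE,
                   errno.ENOMEM, errno.ECONNABORTED)


def do_read(accept, max_accepts=10):
    """Hypothetical accept loop.

    `accept` is any callable that returns a connection object or raises
    OSError, standing in for a listening socket's accept().
    """
    accepted = []
    try:
        for _ in range(max_accepts):
            try:
                conn = accept()
            except OSError as e:
                if e.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
                    # Nothing more to accept right now.
                    break
                if e.errno in EXPECTED_ERRNOS:
                    # Known resource problem: report it and stop for now.
                    log.warning("could not accept new connection (%s)",
                                errno.errorcode[e.errno])
                    break
                # Any other errno falls through to the outer handler.
                raise
            accepted.append(conn)
    except Exception:
        # The outer handler logs whatever the inner checks did not expect.
        log.exception("unhandled error while accepting")
    return accepted
```

With a structure like this, an errno that is not in the expected list
still produces a logged traceback, so complete silence in the log points
away from an unhandled accept() error.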

Jean-Paul