[Twisted-Python] epoll reactor problems

Alec Matusis matusis at matusis.com
Wed Apr 11 03:03:06 MDT 2007


> We would probably need more information. What's your version of python?

We are using Python 2.4.1

> Can you
> provide a reproducible example?

It's hard to provide a reproducible example: we observe this problem only on
the live servers. I do not know how to simplify the code (it has 40000+
lines) so that the problem still occurs, since every attempt would have to
be tried on real users.

> Did you try to do a strace on your
> running
> server to see what's going on?

I did run strace; it made the server unresponsive, so it had to be
restarted. Here is the output from the problematic server at 99% CPU:
alecm at web10 ~> strace -p 5315
Process 5315 attached - interrupt to quit
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=1023, u64=12304606485815493631}},
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|
EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
u64=22827751178240}}, {0, {u32=0, u64=0}},
{EPOLLOUT|EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
u64=18097643565645823}}}, 1434, 30) = 5
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=423, u64=12304606485815493031}},
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|
EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
u64=22827751178240}}, {0, {u32=0, u64=0}},
{EPOLLOUT|EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
u64=18097643565645823}}}, 1434, 29) = 5
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=1023, u64=12304606485815493631}},
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|
EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
u64=22827751178240}}, {0, {u32=0, u64=0}},
{EPOLLOUT|EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
u64=18097643565645823}}}, 1434, 28) = 5
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLIN, {u32=201, u64=12304606485815492809}}, {0, {u32=32,
u64=206158430240}}, {EPOLLWRNORM|EPOLLONESHOT|EPOLLET|0x3fffa820,
{u32=32767, u64=23749657318424575}}, {0, {u32=5315, u64=5315}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0, u64=0}},
{0, {u32=4294945068, u64=140737488333100}}}, 1434, 27) = 6
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
recvfrom(201, "getmore:20\r\n\0", 65536, 0, NULL, NULL) = 13
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0xb9d030, FUTEX_WAKE, 1)          = 0
futex(0x8a3350, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=1023, u64=12304606485815493631}},
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|
EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
u64=22827751178240}}, {0, {u32=0, u64=0}},
{EPOLLOUT|EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
u64=18097643565645823}}}, 1434, 26) = 5
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=423, u64=12304606485815493031}},
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|
EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
u64=22827751178240}}, {0, {u32=0, u64=0}},
{EPOLLOUT|EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
u64=18097643565645823}}}, 1434, 25) = 5
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=1023, u64=12304606485815493631}},
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|
EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
u64=22827751178240}}, {0, {u32=0, u64=0}},
{EPOLLOUT|EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
u64=18097643565645823}}}, 1434, 24) = 5
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=423, u64=12304606485815493031}},
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|
EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
u64=22827751178240}}, {0, {u32=0, u64=0}},
{EPOLLOUT|EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
u64=18097643565645823}}}, 1434, 23) = 5
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=1023, u64=12304606485815493631}},
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|
EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
u64=22827751178240}}, {0, {u32=0, u64=0}},
{EPOLLOUT|EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
u64=18097643565645823}}}, 1434, 22) = 5
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=423, u64=12304606485815493031}},
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|
EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
u64=22827751178240}}, {0, {u32=0, u64=0}},
{EPOLLOUT|EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
u64=18097643565645823}}}, 1434, 22) = 5
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=1023, u64=12304606485815493631}},
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|
EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
u64=22827751178240}}, {0, {u32=0, u64=0}},
{EPOLLOUT|EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
u64=18097643565645823}}}, 1434, 21) = 5
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=423, u64=12304606485815493031}},
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|
EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
{EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
u64=22827751178240}}, {0, {u32=0, u64=0}},
{EPOLLOUT|EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
u64=18097643565645823}}}, 1434, 20) = 5
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
futex(0x7ce7f0, FUTEX_WAKE, 1)          = 0
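
To compare the two traces more systematically, something like the following
could tally the syscalls in a saved trace and pull out the maxevents, timeout
and return value of each epoll_wait call. This is only a sketch: it assumes
the strace output was saved to a text file and pasted in as a string, and
summarize_strace is a hypothetical helper name, not part of strace or
Twisted.

```python
import re
from collections import Counter

# Matches the start of a syscall line such as "futex(0x7ce7f0, ...".
# Wrapped continuation lines ("{EPOLLIN|...", "u64=...") do not match.
SYSCALL_RE = re.compile(r"^(\w+)\(", re.MULTILINE)

# Matches a whole epoll_wait call, even when it was wrapped across several
# lines; captures maxevents, timeout (milliseconds) and the return value.
EPOLL_WAIT_RE = re.compile(
    r"epoll_wait\(\d+,\s*.*?,\s*(\d+),\s*(\d+)\)\s*=\s*(-?\d+)",
    re.DOTALL,
)


def summarize_strace(text):
    """Return (syscall counts, list of (maxevents, timeout_ms, result))."""
    counts = Counter(SYSCALL_RE.findall(text))
    waits = [
        (int(maxevents), int(timeout), int(result))
        for maxevents, timeout, result in EPOLL_WAIT_RE.findall(text)
    ]
    return counts, waits
```

In the trace above, the epoll_wait calls return 5 or 6 events while the
timeout argument counts down from 30 to 20 ms, and only one of them is
followed by a recvfrom. That would be consistent with the same events being
reported repeatedly without being consumed, although the trace alone does
not prove it.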

Interestingly, running strace on the other type of server, which runs at
only 9% CPU, does not crash it. Here is that strace for comparison:

alecm at web10 ~> strace -p 4131
Process 4131 attached - interrupt to quit
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLIN, {u32=1740, u64=12304606485815494348}}}, 1728, 26) =
1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
recvfrom(1740, "\r\n", 65536, 0, NULL, NULL) = 2
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_ctl(4, EPOLL_CTL_MOD, 1740, {EPOLLIN|EPOLLOUT, {u32=1740,
u64=12304606485815494348}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLOUT, {u32=1740, u64=12304606485815494348}}}, 1728, 1) =
1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
sendto(1740, "\r\n\0", 3, 0, NULL, 0)   = 3
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_ctl(4, EPOLL_CTL_MOD, 1740, {EPOLLIN, {u32=1740,
u64=12304606485815494348}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {}, 1728, 0)              = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {}, 1728, 0)              = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {}, 1728, 0)              = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLIN, {u32=1734, u64=12304606485815494342}}}, 1728, 92) =
1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
recvfrom(1734, "\r\n", 65536, 0, NULL, NULL) = 2
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_ctl(4, EPOLL_CTL_MOD, 1734, {EPOLLIN|EPOLLOUT, {u32=1734,
u64=12304606485815494342}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLOUT, {u32=1734, u64=12304606485815494342}}}, 1728, 69)
= 1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
sendto(1734, "\r\n\0", 3, 0, NULL, 0)   = 3
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_ctl(4, EPOLL_CTL_MOD, 1734, {EPOLLIN, {u32=1734,
u64=12304606485815494342}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLIN, {u32=1680, u64=12304606485815494288}}}, 1728, 68) =
1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
recvfrom(1680, "\r\n", 65536, 0, NULL, NULL) = 2
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_ctl(4, EPOLL_CTL_MOD, 1680, {EPOLLIN|EPOLLOUT, {u32=1680,
u64=12304606485815494288}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLOUT, {u32=1680, u64=12304606485815494288}}}, 1728, 30)
= 1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
sendto(1680, "\r\n\0", 3, 0, NULL, 0)   = 3
epoll_ctl(4, EPOLL_CTL_MOD, 1680, {EPOLLIN, {u32=1680,
u64=12304606485815494288}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLIN, {u32=1748, u64=12304606485815494356}}}, 1728, 29) =
1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
recvfrom(1748, "\r\n", 65536, 0, NULL, NULL) = 2
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_ctl(4, EPOLL_CTL_MOD, 1748, {EPOLLIN|EPOLLOUT, {u32=1748,
u64=12304606485815494356}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLOUT, {u32=1748, u64=12304606485815494356}}}, 1728, 25)
= 1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
sendto(1748, "\r\n\0", 3, 0, NULL, 0)   = 3
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_ctl(4, EPOLL_CTL_MOD, 1748, {EPOLLIN, {u32=1748,
u64=12304606485815494356}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLIN, {u32=1573, u64=12304606485815494181}}}, 1728, 24) =
1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
recvfrom(1573, "\r\n", 65536, 0, NULL, NULL) = 2
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_ctl(4, EPOLL_CTL_MOD, 1573, {EPOLLIN|EPOLLOUT, {u32=1573,
u64=12304606485815494181}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLOUT, {u32=1573, u64=12304606485815494181}}}, 1728, 8) =
1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
sendto(1573, "\r\n\0", 3, 0, NULL, 0)   = 3
epoll_ctl(4, EPOLL_CTL_MOD, 1573, {EPOLLIN, {u32=1573,
u64=12304606485815494181}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLIN, {u32=1681, u64=12304606485815494289}}}, 1728, 7) =
1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
recvfrom(1681, "\r\n", 65536, 0, NULL, NULL) = 2
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_ctl(4, EPOLL_CTL_MOD, 1681, {EPOLLIN|EPOLLOUT, {u32=1681,
u64=12304606485815494289}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {{EPOLLOUT, {u32=1681, u64=12304606485815494289}}}, 1728, 2) =
1
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
sendto(1681, "\r\n\0", 3, 0, NULL, 0)   = 3
epoll_ctl(4, EPOLL_CTL_MOD, 1681, {EPOLLIN, {u32=1681,
u64=12304606485815494289}}) = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
epoll_wait(4, {}, 1728, 1)              = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
futex(0x859a10, FUTEX_WAKE, 1)          = 0
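
The healthy trace above repeats one regular cycle: epoll_wait reports
EPOLLIN, recvfrom reads "\r\n", epoll_ctl(EPOLL_CTL_MOD) adds EPOLLOUT,
epoll_wait reports EPOLLOUT, sendto writes "\r\n\0", and a second epoll_ctl
drops EPOLLOUT again. A minimal sketch of that cycle follows. It assumes
Linux and the standard library's select.epoll (which is not available in
Python 2.4.1), it is not Twisted's epoll reactor code, and
one_keepalive_cycle is a hypothetical name.

```python
import select
import socket


def one_keepalive_cycle():
    """Replay one EPOLLIN -> EPOLLOUT -> EPOLLIN cycle on a local socket pair."""
    server, client = socket.socketpair()
    server.setblocking(False)
    ep = select.epoll()
    try:
        fd = server.fileno()
        # Idle state: only interested in incoming data.
        ep.register(fd, select.EPOLLIN)
        client.sendall(b"\r\n")

        read_events = ep.poll(1.0)  # select.epoll takes the timeout in seconds
        data = server.recv(65536)

        # A reply is pending, so also ask to be told when the socket is writable.
        ep.modify(fd, select.EPOLLIN | select.EPOLLOUT)
        write_events = ep.poll(1.0)
        server.send(data + b"\0")

        # Nothing left to send: go back to read interest only.
        ep.modify(fd, select.EPOLLIN)
        reply = client.recv(65536)
        return fd, read_events, write_events, reply
    finally:
        ep.close()
        server.close()
        client.close()
```

Each reported event here is followed by a read or a write that consumes it,
which is what the healthy trace shows and what is mostly absent from the
trace of the server at 99% CPU.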

> As your load was already 99 before using epoll, can it be an
> application
> problem?

Our load with poll reaches 99% only during the daytime. We did all this at
night, when the load is only 30-40% with poll. Indeed, after we reverted
to poll, the load went back to 35%, and it will remain there until the
morning.
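
Switching between the two reactors for a test like this can be made a
one-line configuration change. The sketch below is one possible way to do it,
not how our servers are actually set up: the APP_REACTOR environment variable
and both function names are hypothetical, and it is written for a current
Python rather than 2.4.1. The twisted.internet.pollreactor and
twisted.internet.epollreactor modules and their install() functions are the
real Twisted ones.

```python
import importlib
import os

# Twisted modules that each provide an install() function.
REACTOR_MODULES = {
    "poll": "twisted.internet.pollreactor",
    "epoll": "twisted.internet.epollreactor",
}


def reactor_module_name(environ=None, default="poll"):
    """Pick the reactor module from the hypothetical APP_REACTOR variable."""
    environ = os.environ if environ is None else environ
    choice = environ.get("APP_REACTOR", default).strip().lower()
    if choice not in REACTOR_MODULES:
        raise ValueError("unknown reactor %r" % choice)
    return REACTOR_MODULES[choice]


def install_reactor(environ=None):
    """Install the chosen reactor.

    This has to run before anything imports twisted.internet.reactor,
    because that import installs the default reactor.
    """
    module = importlib.import_module(reactor_module_name(environ))
    module.install()
    return module.__name__
```

With this in place, a server started with APP_REACTOR=epoll would use the
epoll reactor, and restarting it without the variable would fall back to
poll.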

It looked to me like a runaway process: as soon as the load went over ~25%
on the problematic server, it kept rising until it reached 99.9% in less
than 1 min.


> -----Original Message-----
> From: twisted-python-bounces at twistedmatrix.com
> [mailto:twisted-python-bounces at twistedmatrix.com] On Behalf Of Thomas Hervé
> Sent: Wednesday, April 11, 2007 1:25 AM
> To: twisted-python at twistedmatrix.com
> Subject: Re: [Twisted-Python] epoll reactor problems
> 
> Quoting Alec Matusis <matusis at matusis.com>:
> 
> > We just switched 2 types of production servers to epoll reactor
> > (Twisted 2.5) from poll reactor (Twisted 2.2).
> >
> > The CPU% utilization of the first type of server that does not do much
> > except occasionally pushing messages to about 5000 clients dropped from
> > about 40% to 8%, which is very good.
>
> That's great.
>
> > The second type of server is more complicated. The CPU utilization of that
> > server (as measured by top) went down from 40% to about 15% after switching
> > to epoll.
> >
> > Here is the problem: after about 10min of running that server with CPU%
> > staying at about 15%, the CPU suddenly jumps to 99.9% for that process and
> > just stays there. We reproduced this several times. The server remains
> > responsive, even when top shows 99.9% CPU. (Which is very different from
> > 99.9% CPU from real load when we used poll; we are intimately familiar with
> > the performance in that regime unfortunately.)
> >
> > The kernel is 2.6.11.4-21.12-smp
> >
> > Can anybody help with this 99.9% CPU epoll problem?
> 
> We would probably need more information. What's your version of python?
> Can you provide a reproducible example? Did you try to do a strace on your
> running server to see what's going on?
>
> As your load was already 99 before using epoll, can it be an application
> problem?
> 
> --
> Thomas




