[Twisted-Python] pushing out same message on 100k TCPs
Phil Mayers
p.mayers at imperial.ac.uk
Sat Feb 11 04:07:11 MST 2012
On 02/10/2012 08:20 PM, Tobias Oberstein wrote:
>
>> store the socket buffer as a (fairly complex) linked list of reference-counted
>> blocks, and use scatter-gather IO to the network card.
>
> Doesn't a (modern) kernel do something like that for virtual memory pages ie.?
Possibly. My knowledge of kernel memory management is a lot patchier
than my knowledge of network stacks.
One option you could investigate, that I was going to suggest in my
original reply but didn't have the time to write up, is the sendfile()
API. If you write your message to a temporary file, you could call
sendfile() on all 100k connections using the same file descriptor. So,
something like:
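    import os

    # PATH, message and biglist are assumed to be defined elsewhere.
    fd = os.open(PATH, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    os.write(fd, message)
    os.unlink(PATH)  # the data stays alive until the descriptor is closed
    for connection in biglist:
        # connection.sendfile() is a hypothetical wrapper around sendfile(2)
        connection.sendfile(fd, offset=0, count=len(message))
    os.close(fd)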
Now, as I understand it, sendfile() will perform zero-copy IO; since the
contents of the file will undoubtedly be in the page cache, it should in
theory DMA the data straight from the (single copy of the) data in RAM
to the NIC buffers.
It should also handle refcounting for you - you unlink the filename
after obtaining a descriptor, and close() the FD once you've called
sendfile, and the kernel *should* in theory free the inode and page
containing file data once all TCP ACKs have been received.
You'll still have to make 100k syscalls, and you may find the kernel
chooses to copy the data anyway.
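For what it's worth, here is a minimal sketch of the same idea using
os.sendfile(), which newer Pythons (3.3+) expose on Linux. It assumes
plain blocking sockets, and the name broadcast_with_sendfile is made up
for illustration; it is not a Twisted API. Whether the kernel really
avoids the copy is not something this code can show.

```python
import os
import socket
import tempfile


def broadcast_with_sendfile(message, sockets):
    """Send the same bytes to every socket from one shared file descriptor."""
    fd, path = tempfile.mkstemp()
    # Unlink straight away: the data lives until the descriptor is closed.
    os.unlink(path)
    try:
        os.write(fd, message)
        for sock in sockets:
            offset = 0
            while offset < len(message):
                # An explicit offset leaves the file position untouched,
                # so every connection reads the file from the start.
                sent = os.sendfile(sock.fileno(), fd, offset,
                                   len(message) - offset)
                if sent == 0:
                    break  # unexpected end of file
                offset += sent
    finally:
        os.close(fd)
```

This is still one syscall (at least) per connection, as noted above.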
However - AFAIK Twisted does not support sendfile(), and it can be
tricky to make it work with non-blocking IO.
:o(
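To give a flavour of the non-blocking bookkeeping, here is a rough
sketch, again with os.sendfile() (Python 3.3+, Linux). The function name
send_nonblocking is hypothetical, and select() is used only to keep the
example short; it will not cope with 100k descriptors, so a real server
would use epoll or a reactor. The point is that each connection needs
its own offset and must be retried when the send buffer fills.

```python
import os
import select
import socket
import tempfile


def send_nonblocking(fd, size, sockets):
    """Drive sendfile() over non-blocking sockets until each got `size` bytes."""
    offsets = {sock: 0 for sock in sockets}  # bytes queued so far, per socket
    pending = set(sockets)
    while pending:
        # Wait until at least one socket can accept more data.
        _, writable, _ = select.select([], list(pending), [])
        for sock in writable:
            try:
                sent = os.sendfile(sock.fileno(), fd, offsets[sock],
                                   size - offsets[sock])
            except BlockingIOError:
                continue  # buffer filled up again; retry on the next wakeup
            offsets[sock] += sent
            if sent == 0 or offsets[sock] >= size:
                pending.discard(sock)
    return offsets
```

The file descriptor can only be closed once every connection has
reached `size`, which is the part that is awkward to fit into an
existing event loop.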
You may also want to look at the splice(), vmsplice() and tee() syscalls
added to recent Linux kernels. tee() in particular can copy data from
pipe to pipe without consuming it, so it can be repeated multiple times. It
may be possible to assemble something that will do this task efficiently
from those building blocks, but the APIs aren't available in Twisted.
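As a small illustration of the "without consuming" property, the sketch
below calls tee() through ctypes, since Python's os module has no
wrapper for it. It is Linux-only, the helper names tee and fan_out are
made up, and the os.read() at the end merely stands in for whatever
would really drain each pipe (presumably a splice() to a socket). The
message must fit in the pipe buffer.

```python
import ctypes
import os

# Linux-only: look up tee(2) in the C library already loaded in-process.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.tee.argtypes = [ctypes.c_int, ctypes.c_int,
                      ctypes.c_size_t, ctypes.c_uint]
_libc.tee.restype = ctypes.c_ssize_t


def tee(fd_in, fd_out, length):
    """Duplicate up to `length` bytes from one pipe to another."""
    n = _libc.tee(fd_in, fd_out, length, 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return n


def fan_out(message, copies):
    """tee() one source pipe into `copies` pipes and read each one back."""
    src_r, src_w = os.pipe()
    os.write(src_w, message)  # assumes the message fits in the pipe buffer
    results = []
    for _ in range(copies):
        r, w = os.pipe()
        n = tee(src_r, w, len(message))  # the source pipe keeps its data
        results.append(os.read(r, n))
        os.close(r)
        os.close(w)
    os.close(src_r)
    os.close(src_w)
    return results
```

Each tee() call leaves the data in the source pipe, which is why the
loop can repeat it for every copy.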
>> and not useful.
>
> When using VM pages (_if_ that would be possible) and thus no data duplication,
> then why not useful?
Sorry, I should have been more precise - it's probably not often useful.
There are not very many applications where sending the same TCP stream
to that many clients at the same time is helpful - realtime video/audio
over TCP springs to mind, and typically those need to adapt to slow
clients by dropping them to a lower rate, i.e. not the same stream any more.
As Glyph has mentioned, encryption is also a factor in today's internet:
with TLS, each connection's ciphertext is different, so there is no
single buffer to share.
I'm kind of curious about what your application is!