[Twisted-web] Performance of twisted web with Quixote [was
Performance of twisted web with HTTP/1.1 vs. HTTP/1.0]
Jason E. Sibre
jsibre at sibre.org
Wed Apr 14 21:37:19 MDT 2004
Hi folks,
I previously wrote to this list about a performance problem I was having
with Twisted, Quixote, and (I thought) HTTP/1.1, which I erroneously thought
was a problem in Twisted's ability to deal with HTTP/1.1...
I've since spent lots of time digging. First I figured out that the problem
wasn't really in Twisted, and that it didn't really have anything to do with
HTTP/1.1 either, though persistent connections did contribute (more
accurately, the lack of persistent connections would mask the problem). Then
I eventually figured out what the problem REALLY was.
It was an odd little thing that had to do with Linux, Windows, network
stacks, slow ACKs, and sending more packets than were needed. Well, I don't
want to go into much more detail, because your time is valuable.
First, for those that haven't heard of it, Quixote is a Python-based web
publishing framework that doesn't include a web server. Instead, it can be
published through a number of mechanisms: CGI, FastCGI, SCGI, or
mod_python, plus it has interfaces for Twisted and Medusa. I think I may be
missing one, but I'm not sure. Its home page is at
http://www.mems-exchange.org/software/quixote/
We (the quixote-users folks) seem to have a lack of expertise in Twisted :)
The interface between twisted and quixote: A twisted request object is used
to create a quixote request object, quixote is called to publish the
request, and then the output of quixote is wrapped into a producer which
twisted then finishes handling. Actually, that's how it has been for quite
some time, except for the producer bit. My modifications revolved around
creating a producer class that (I think/hope) works well in the Twisted
framework, and lets twisted publish the output when it's ready (i.e., in its
event loop). Formerly, quixote's output was just pushed out through the
twisted request object's write() method, which could cause REALLY bad
performance: the bug I was chasing. In many cases it did just fine, however.
This was also just a generally bad idea because, for instance, publishing a
large file could consume large amounts of RAM until it was done being pushed
over the wire.
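To make the producer idea concrete, here is a minimal sketch of the pull
pattern described above. It does not import Twisted: FakeRequest, ChunkProducer,
and drive() are hypothetical stand-ins written for illustration (and in modern
Python rather than the Python 2 of the attached module). Only the method names
registerProducer, unregisterProducer, write, finish, resumeProducing, and
stopProducing mirror the ones the interface module uses.

```python
class FakeRequest:
    """Hypothetical stand-in for a request object; not Twisted's real class."""

    def __init__(self):
        self.written = []      # every chunk handed to write(), in order
        self.producer = None
        self.finished = False

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.written.append(data)

    def finish(self):
        self.finished = True


class ChunkProducer:
    """Hands the body over one chunk per resumeProducing() call."""

    def __init__(self, body, request, chunk_size=4):
        self.body = body
        self.request = request
        self.chunk_size = chunk_size
        request.registerProducer(self, False)

    def resumeProducing(self):
        size = self.chunk_size
        chunk, self.body = self.body[:size], self.body[size:]
        if chunk:
            self.request.write(chunk)
        if not self.body:
            # Nothing left to send: detach and close out the response.
            self.request.unregisterProducer()
            self.request.finish()

    def stopProducing(self):
        self.body = b""


def drive(request):
    """Plays the event loop's part: ask for more data until the producer is done."""
    while request.producer is not None:
        request.producer.resumeProducing()


req = FakeRequest()
ChunkProducer(b"hello world", req, chunk_size=4)
drive(req)
print(req.written)
```

The point of the design is that the consumer decides when the next chunk is
produced, so only one chunk at a time has to be handed over, instead of the
whole response being pushed through write() in one go.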
It's also worth mentioning that a quixote Stream object (noticeable in the
source) is a producer, but it uses the iterator protocol instead of .more()
or resumeProducing().
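As a rough illustration of bridging the iterator protocol to fixed-size
writes, here is a sketch under some assumptions: refill() and iter_output()
are hypothetical helper names, the stream is any iterable of byte strings, and
`limit` plays the role of the output buffer size. It is written in modern
Python and is not the module's actual code.

```python
def refill(leftover, chunks, limit):
    """Pull from the iterator `chunks` until at least `limit` bytes are buffered.

    Returns (buffered_bytes, exhausted).
    """
    parts = [leftover]
    size = len(leftover)
    exhausted = False
    while size < limit:
        try:
            piece = next(chunks)
        except StopIteration:
            exhausted = True
            break
        parts.append(piece)
        size += len(piece)
    return b"".join(parts), exhausted


def iter_output(stream, limit):
    """Yield writes of at most `limit` bytes, one per 'resumeProducing' step."""
    chunks = iter(stream)
    data, exhausted = b"", False
    while True:
        if not exhausted:
            data, exhausted = refill(data, chunks, limit)
        out, data = data[:limit], data[limit:]
        if out:
            yield out
        # Done only when the buffer is empty AND the iterator has run dry.
        if not data and exhausted:
            return


print(list(iter_output([b"abc", b"defg", b"h"], 4)))
```

The end test checks both conditions on purpose: the buffer can be empty right
after a write while the iterator still has data, which happens whenever the
buffered bytes add up to exactly `limit`.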
I'm hoping that someone can take a look at the finished product (just the
interface module) and say something like, "you're nuts! you're doing this
all wrong!", or "yeah, this looks like the right general idea, except maybe
this bit here...".
Also, if anyone can share a brief one-liner or two about whether or not I
should leave in the hooks for pb and threadable, I'd appreciate it (quixote
is almost always run single threaded... Maybe just always...). I also
changed the demo/test code at the bottom of the module from using the
Application object to using the reactor. I'd appreciate any feedback on
that and the SSL code (it's also new...) as well.
If anyone should want to actually run this, it'll work with Quixote-1.0b1,
and the previous 'stable' (I say that because it was the latest version for
several months...) version 0.7a3. I wrote the interface against twisted
1.2.0, but I think it'll work with older versions. I just don't know how
old. Oh, and if you wanna drop it in a quixote install, it lives as
quixote.server.twisted_http
Thanks in advance for any help,
Jason Sibre
-------------- next part --------------
#!/usr/bin/env python
"""
twist -- Demo of an HTTP server built on top of Twisted Python.
"""
__revision__ = "$Id: medusa_http.py 21221 2003-03-20 16:02:41Z akuchlin $"
# based on qserv, created 2002/03/19, AMK
# last mod 2003.03.24, Graham Fawcett
# tested on Win32 / Twisted 0.18.0 / Quixote 0.6b5
#
# version 0.2 -- 2003.03.24 11:07 PM
# adds missing support for session management, and for
# standard Quixote response headers (expires, date)
#
# modified 2004/04/10 jsibre
# better support for Streams
# wraps output (whether Stream or not) into twisted type producer.
# modified to use reactor instead of Application (Application
# has been deprecated)
import urllib
from twisted.protocols import http
from twisted.web import server
from quixote.http_response import Stream
# Imports for the TWProducer object
from twisted.spread import pb
from twisted.python import threadable
from twisted.internet import abstract
class QuixoteTWRequest(server.Request):

    def process(self):
        self.publisher = self.channel.factory.publisher
        environ = self.create_environment()
        ## this seek is important, it doesn't work without it
        ## (It doesn't matter for GETs, but POSTs will not
        ## work properly without it.)
        self.content.seek(0,0)
        qxrequest = self.publisher.create_request(self.content, environ)
        self.quixote_publish(qxrequest, environ)
        resp = qxrequest.response
        self.setResponseCode(resp.status_code)
        for hdr, value in resp.generate_headers():
            self.setHeader(hdr, value)
        if resp.body is not None:
            TWProducer(resp.body, self)
        else:
            self.finish()

    def quixote_publish(self, qxrequest, env):
        """
        Warning, this sidesteps the Publisher.publish method,
        Hope you didn't override it...
        """
        pub = self.publisher
        output = pub.process_request(qxrequest, env)
        # don't write out the output, just set the response body
        # the calling method will do the rest.
        if output:
            qxrequest.response.set_body(output)
        pub._clear_request()

    def create_environment(self):
        """
        Borrowed heavily from twisted.web.twcgi
        """
        # Twisted doesn't decode the path for us,
        # so let's do it here. This is also
        # what medusa_http.py does, right or wrong.
        if '%' in self.path:
            self.path = urllib.unquote(self.path)
        serverName = self.getRequestHostname().split(':')[0]
        env = {"SERVER_SOFTWARE": server.version,
               "SERVER_NAME": serverName,
               "GATEWAY_INTERFACE": "CGI/1.1",
               "SERVER_PROTOCOL": self.clientproto,
               "SERVER_PORT": str(self.getHost()[2]),
               "REQUEST_METHOD": self.method,
               "SCRIPT_NAME": '',
               "SCRIPT_FILENAME": '',
               "REQUEST_URI": self.uri,
               "HTTPS": (self.isSecure() and 'on') or 'off',
               }
        client = self.getClient()
        if client is not None:
            env['REMOTE_HOST'] = client
        ip = self.getClientIP()
        if ip is not None:
            env['REMOTE_ADDR'] = ip
        xx, xx, remote_port = self.transport.getPeer()
        env['REMOTE_PORT'] = remote_port
        env["PATH_INFO"] = self.path
        qindex = self.uri.find('?')
        if qindex != -1:
            env['QUERY_STRING'] = self.uri[qindex+1:]
        else:
            env['QUERY_STRING'] = ''
        # Propagate HTTP headers
        for title, header in self.getAllHeaders().items():
            envname = title.replace('-', '_').upper()
            if title not in ('content-type', 'content-length'):
                envname = "HTTP_" + envname
            env[envname] = header
        return env

class TWProducer(pb.Viewable):
    """
    A class to represent the transfer of data over the network.

    JES Note: This has more stuff in it than is minimally necessary.
    However, since I'm no twisted guru, I built this by modifying
    twisted.web.static.FileTransfer. FileTransfer has stuff in it
    that I don't really understand, but know that I probably don't
    need. I'm leaving it in under the theory that if anyone ever
    needs that stuff (e.g. because they're running with multiple
    threads) it'll be MUCH easier for them if I had just left it in
    than if they have to figure out what needs to be in there.
    Furthermore, I notice no performance penalty for leaving it in.
    """
    request = None

    def __init__(self, data, request):
        self.request = request
        self.data = ""
        self.size = 0
        self.stream = None
        self.streamIter = None
        self.outputBufferSize = abstract.FileDescriptor.bufferSize
        if isinstance(data, Stream): # data could be a Stream
            self.stream = data
            self.streamIter = iter(data)
            self.size = data.length
        elif data: # data could be a string
            self.data = data
            self.size = len(data)
        else: # data could be None
            # We'll just leave self.data as ""
            pass
        request.registerProducer(self, 0)

    def resumeProducing(self):
        """
        This is twisted's version of a producer's '.more()', or
        an iterator's '.next()'. That is, this function is
        responsible for returning some content.
        """
        if not self.request:
            return
        if self.stream:
            # If we were provided a Stream, let's grab some data
            # and push it into our data buffer
            buffer = [self.data]
            bytesInBuffer = len(buffer[-1])
            while bytesInBuffer < self.outputBufferSize:
                try:
                    buffer.append(self.streamIter.next())
                    bytesInBuffer += len(buffer[-1])
                except StopIteration:
                    # We've exhausted the Stream, time to clean up.
                    self.stream = None
                    self.streamIter = None
                    break
            self.data = "".join(buffer)
        if self.data:
            chunkSize = min(self.outputBufferSize, len(self.data))
            data, self.data = self.data[:chunkSize], self.data[chunkSize:]
        else:
            data = ""
        if data:
            self.request.write(data)
        # Only finish once the buffer is empty AND the Stream (if any) is
        # exhausted; otherwise a Stream whose chunks add up to exactly
        # outputBufferSize would be cut off after the first write.
        if not self.data and self.stream is None:
            self.request.unregisterProducer()
            self.request.finish()
            self.request = None

    def pauseProducing(self):
        pass

    def stopProducing(self):
        self.data = ""
        self.request = None
        self.stream = None
        self.streamIter = None

    # Remotely relay producer interface.
    def view_resumeProducing(self, issuer):
        self.resumeProducing()

    def view_pauseProducing(self, issuer):
        self.pauseProducing()

    def view_stopProducing(self, issuer):
        self.stopProducing()

    synchronized = ['resumeProducing', 'stopProducing']

threadable.synchronize(TWProducer)

class QuixoteFactory (http.HTTPFactory):

    def __init__(self, publisher):
        self.publisher = publisher
        http.HTTPFactory.__init__(self, None)

    def buildProtocol (self, addr):
        p = http.HTTPFactory.buildProtocol(self, addr)
        p.requestFactory = QuixoteTWRequest
        return p

def run ():
    from twisted.internet import reactor
    from quixote import enable_ptl
    from quixote.publish import Publisher
    enable_ptl()
    import quixote.demo
    # Port this server will listen on
    http_port = 8080
    namespace = quixote.demo
    # If you want SSL, make sure you have OpenSSL,
    # uncomment the following, and uncomment the
    # listenSSL() call below.
    ##from OpenSSL import SSL
    ##class ServerContextFactory:
    ##    def getContext(self):
    ##        ctx = SSL.Context(SSL.SSLv23_METHOD)
    ##        ctx.use_certificate_file('/path/to/pem/encoded/ssl_cert_file')
    ##        ctx.use_privatekey_file('/path/to/pem/encoded/ssl_key_file')
    ##        return ctx
    publisher = Publisher(namespace)
    ##publisher.setup_logs()
    qf = QuixoteFactory(publisher)
    reactor.listenTCP(http_port, qf)
    ##reactor.listenSSL(http_port, qf, ServerContextFactory())
    reactor.run()

if __name__ == '__main__':
    run()