Overview
This document describes how to use the HTTP client included in Twisted Web. After reading it, you should be able to make HTTP and HTTPS requests using Twisted Web. You will be able to specify the request method, headers, and body and you will be able to retrieve the response code, headers, and body.
Prerequisites
This document assumes that you are familiar with Deferreds and Failures, and producers and consumers. It also assumes you are familiar with the basic concepts of HTTP, such as requests and responses, methods, headers, and message bodies. The HTTPS section of this document also assumes you are somewhat familiar with SSL and have read about using SSL in Twisted.
The Agent
Issuing Requests
The twisted.web.client.Agent class is the entry point into the client API. Requests are issued using the request method, which takes as parameters a request method, a request URI, the request headers, and an object which can produce the request body (if there is to be one). The agent is responsible for connection setup. Because of this, it requires a reactor as an argument to its initializer. An example of creating an agent and issuing a request using it might look like this:
    from twisted.internet import reactor
    from twisted.web.client import Agent
    from twisted.web.http_headers import Headers

    agent = Agent(reactor)

    # Issue a GET request with one custom header and no body.
    # On Python 3 the method and URI must be bytes.
    d = agent.request(
        b'GET',
        b'http://example.com/',
        Headers({b'User-Agent': [b'Twisted Web Client Example']}),
        None)

    def cbResponse(ignored):
        print('Response received')
    d.addCallback(cbResponse)

    # Stop the reactor whether the request succeeded or failed.
    def cbShutdown(ignored):
        reactor.stop()
    d.addBoth(cbShutdown)

    reactor.run()
As may be obvious, this issues a new GET request for / to the web server on example.com. Agent is responsible for resolving the hostname into an IP address and connecting to it on port 80 (for HTTP URIs), port 443 (for HTTPS URIs), or on the port number specified in the URI itself. It is also responsible for cleaning up the connection afterwards. This code sends a request which includes one custom header, User-Agent. The last argument passed to Agent.request is None, though, so the request has no body.
Sending a request which does include a body requires passing an object providing twisted.web.iweb.IBodyProducer to Agent.request. This interface extends the more general IPushProducer by adding a new length attribute and adding several constraints to the way the producer and consumer interact.
- The length attribute must be a non-negative integer or the constant twisted.web.iweb.UNKNOWN_LENGTH. If the length is known, it will be used to specify the value for the Content-Length header in the request. If the length is unknown, the attribute should be set to UNKNOWN_LENGTH. Since more servers support Content-Length, if a length can be provided it should be.
- An additional method is required on IBodyProducer implementations: startProducing. This method is used to associate a consumer with the producer. It should return a Deferred which fires when all data has been produced.
- IBodyProducer implementations should never call the consumer's unregisterProducer method. Instead, when it has produced all of the data it is going to produce, it should only fire the Deferred returned by startProducing.
For additional details about the requirements of IBodyProducer implementations, see the API documentation.
Here's a simple IBodyProducer implementation which writes an in-memory string to the consumer:
    from zope.interface import implementer

    from twisted.internet.defer import succeed
    from twisted.web.iweb import IBodyProducer

    @implementer(IBodyProducer)
    class StringProducer(object):
        def __init__(self, body):
            # body is the complete request body, held in memory as bytes.
            self.body = body
            self.length = len(body)

        def startProducing(self, consumer):
            # Write everything at once and report completion immediately.
            consumer.write(self.body)
            return succeed(None)

        def pauseProducing(self):
            # Nothing to pause: all data is written in startProducing.
            pass

        def stopProducing(self):
            pass
This producer can be used to issue a request with a body:
    from twisted.internet import reactor
    from twisted.web.client import Agent
    from twisted.web.http_headers import Headers

    # StringProducer is the class shown above, assumed to be saved as
    # stringprod.py.
    from stringprod import StringProducer

    agent = Agent(reactor)
    body = StringProducer(b"hello, world")
    d = agent.request(
        b'GET',
        b'http://example.com/',
        Headers({b'User-Agent': [b'Twisted Web Client Example'],
                 b'Content-Type': [b'text/x-greeting']}),
        body)

    def cbResponse(ignored):
        print('Response received')
    d.addCallback(cbResponse)

    def cbShutdown(ignored):
        reactor.stop()
    d.addBoth(cbShutdown)

    reactor.run()
Receiving Responses
So far, the examples have demonstrated how to issue a request. However, they have ignored the response, except for showing that Agent.request returns a Deferred which fires when the response has been received. Next we'll cover what that response is and how to interpret it.
Agent.request, as with most Deferred-returning APIs, can return a Deferred which fires with a Failure. If the request fails somehow, this will be reflected with a failure. This may be due to a problem looking up the host IP address, or it may be because the HTTP server is not accepting connections, or it may be because of a problem parsing the response, or any other problem which arises which prevents the response from being received. It does not include responses with an error status: a 404 or 500 response is still delivered as a successful result.
If the request succeeds, though, the Deferred will fire with a Response. This happens as soon as all the response headers have been received. It happens before any of the response body, if there is one, is processed. The Response object has several attributes giving the response information: its code, version, phrase, and headers, as well as the length of the body to expect. The Response object also has a method which makes the response body available: deliverBody. Using the attributes of the response object and this method, here's an example which displays part of the response to a request:
    from __future__ import print_function

    from pprint import pformat

    from twisted.internet import reactor
    from twisted.internet.defer import Deferred
    from twisted.internet.protocol import Protocol
    from twisted.web.client import Agent
    from twisted.web.http_headers import Headers

    class BeginningPrinter(Protocol):
        def __init__(self, finished):
            self.finished = finished
            # Display at most the first 10 KiB of the body.
            self.remaining = 1024 * 10

        def dataReceived(self, data):
            if self.remaining:
                display = data[:self.remaining]
                print('Some data received:')
                print(display)
                self.remaining -= len(display)

        def connectionLost(self, reason):
            print('Finished receiving body:', reason.getErrorMessage())
            self.finished.callback(None)

    agent = Agent(reactor)
    d = agent.request(
        b'GET',
        b'http://example.com/',
        Headers({b'User-Agent': [b'Twisted Web Client Example']}),
        None)

    def cbRequest(response):
        print('Response version:', response.version)
        print('Response code:', response.code)
        print('Response phrase:', response.phrase)
        print('Response headers:')
        print(pformat(list(response.headers.getAllRawHeaders())))
        # Fires once the body has been delivered to the protocol.
        finished = Deferred()
        response.deliverBody(BeginningPrinter(finished))
        return finished
    d.addCallback(cbRequest)

    def cbShutdown(ignored):
        reactor.stop()
    d.addBoth(cbShutdown)

    reactor.run()
The BeginningPrinter protocol in this example is passed to Response.deliverBody and the response body is then delivered to its dataReceived method as it arrives. When the body has been completely delivered, the protocol's connectionLost method is called. It is important to inspect the Failure passed to connectionLost. If the response body has been completely received, the failure will wrap a twisted.web.client.ResponseDone exception. This indicates that it is known that all data has been received. It is also possible for the failure to wrap a twisted.web.http.PotentialDataLoss exception: this indicates that the server framed the response such that there is no way to know when the entire response body has been received. Only HTTP/1.0 servers should behave this way. Finally, it is possible for the exception to be of another type, indicating guaranteed data loss for some reason (a lost connection, a memory error, etc.).
Just as protocols associated with a TCP connection are given a transport, so will be a protocol passed to deliverBody. Since it makes no sense to write more data to the connection at this stage of the request, though, the transport only provides IPushProducer. This allows the protocol to control the flow of the response data: a call to the transport's pauseProducing method will pause delivery; a later call to resumeProducing will resume it. If it is decided that the rest of the response body is not desired, stopProducing can be used to stop delivery permanently; after this, the protocol's connectionLost method will be called.
An important thing to keep in mind is that the body will only be read from the connection after Response.deliverBody is called. This also means that the connection will remain open until this is done (and the body read). So, in general, any response with a body must have that body read using deliverBody. If the application is not interested in the body, it should issue a HEAD request or use a protocol which immediately calls stopProducing on its transport.
HTTP over SSL
Everything you've read so far applies whether the scheme of the request URI is HTTP or HTTPS. However, to control the SSL negotiation performed when an HTTPS URI is requested, there's one extra object to pay attention to: the SSL context factory.
Agent's constructor takes an optional second argument, a context factory. This is an object like the context factory described in Using SSL in Twisted but has one small difference. The getContext method of this factory accepts the address from the URL being requested. This allows it to return a context object which verifies that the server's certificate matches the URL being requested.
Here's an example which shows how to use Agent to request an HTTPS URL with no certificate verification.
    from twisted.python.log import err
    from twisted.web.client import Agent
    from twisted.internet import reactor
    from twisted.internet.ssl import ClientContextFactory

    class WebClientContextFactory(ClientContextFactory):
        def getContext(self, hostname, port):
            # The address is ignored, so no certificate verification is done.
            return ClientContextFactory.getContext(self)

    def display(response):
        print("Received response")
        print(response)

    def main():
        contextFactory = WebClientContextFactory()
        agent = Agent(reactor, contextFactory)
        d = agent.request(b"GET", b"https://example.com/")
        # Print the response on success, or log the failure.
        d.addCallbacks(display, err)
        d.addCallback(lambda ignored: reactor.stop())
        reactor.run()

    if __name__ == "__main__":
        main()
The important point to notice here is that getContext now accepts two arguments, a hostname and a port number. These two arguments, a str and an int, give the address to which a connection is being established to request an HTTPS URL. Because an agent might make multiple requests over a single connection, getContext may not be called once for each request. A second or later request for a URL with the same hostname as a previous request may re-use an existing connection, and therefore will re-use the previously returned context object.
To configure SSL options or enable certificate verification or hostname checking, provide a context factory which creates suitably configured context objects.
Conclusion
You should now understand the basics of the Twisted Web HTTP client. In particular, you should understand:
- How to issue requests with arbitrary methods, headers, and bodies.
- How to access the response version, code, phrase, headers, and body.
- How to control the streaming of the response body.