Using the Twisted Web Client

  1. Overview
  2. The Agent
  3. Conclusion

Overview

This document describes how to use the HTTP client included in Twisted Web. After reading it, you should be able to make HTTP and HTTPS requests using Twisted Web. You will be able to specify the request method, headers, and body, and you will be able to retrieve the response code, headers, and body.

Prerequisites

This document assumes that you are familiar with Deferreds and Failures, and producers and consumers. It also assumes you are familiar with the basic concepts of HTTP, such as requests and responses, methods, headers, and message bodies. The HTTPS section of this document also assumes you are somewhat familiar with SSL and have read about using SSL in Twisted.

The Agent

Issuing Requests

The twisted.web.client.Agent class is the entry point into the client API. Requests are issued using the request method, which takes as parameters a request method, a request URI, the request headers, and an object which can produce the request body (if there is to be one). The agent is responsible for connection setup. Because of this, it requires a reactor as an argument to its initializer. An example of creating an agent and issuing a request using it might look like this:

```python
from twisted.internet import reactor
from twisted.web.client import Agent
from twisted.web.http_headers import Headers

agent = Agent(reactor)

# Method, URI and header names/values are byte strings.
d = agent.request(
    b'GET',
    b'http://example.com/',
    Headers({b'User-Agent': [b'Twisted Web Client Example']}),
    None)

def cbResponse(ignored):
    print('Response received')
d.addCallback(cbResponse)

def cbShutdown(ignored):
    reactor.stop()
d.addBoth(cbShutdown)

reactor.run()
```
Issue a request with an Agent - listings/client/request.py

As may be obvious, this issues a new GET request for / to the web server on example.com. Agent is responsible for resolving the hostname into an IP address and connecting to it on port 80 (for HTTP URIs), port 443 (for HTTPS URIs), or on the port number specified in the URI itself. It is also responsible for cleaning up the connection afterwards. This code sends a request which includes one custom header, User-Agent. The last argument passed to Agent.request is None, though, so the request has no body.

Sending a request which does include a body requires passing an object providing twisted.web.iweb.IBodyProducer to Agent.request. This interface extends the more general IPushProducer by adding a new length attribute and adding several constraints to the way the producer and consumer interact.

For additional details about the requirements of IBodyProducer implementations, see the API documentation.

Here's a simple IBodyProducer implementation which writes an in-memory string to the consumer:

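```python
from zope.interface import implementer

from twisted.internet.defer import succeed
from twisted.web.iweb import IBodyProducer


@implementer(IBodyProducer)
class StringProducer(object):
    """Write an in-memory byte string as a request body."""

    def __init__(self, body):
        self.body = body
        # A known length lets the client send a Content-Length header.
        self.length = len(body)

    def startProducing(self, consumer):
        # The whole body is already in memory: write it at once and
        # signal completion with an already-fired Deferred.
        consumer.write(self.body)
        return succeed(None)

    def pauseProducing(self):
        pass

    def resumeProducing(self):
        pass

    def stopProducing(self):
        pass
```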
A string-based body producer. - listings/client/stringprod.py

This producer can be used to issue a request with a body:

```python
from twisted.internet import reactor
from twisted.web.client import Agent
from twisted.web.http_headers import Headers

from stringprod import StringProducer

agent = Agent(reactor)
body = StringProducer(b"hello, world")
# POST is used because GET requests are not expected to carry a body.
d = agent.request(
    b'POST',
    b'http://example.com/',
    Headers({b'User-Agent': [b'Twisted Web Client Example'],
             b'Content-Type': [b'text/x-greeting']}),
    body)

def cbResponse(ignored):
    print('Response received')
d.addCallback(cbResponse)

def cbShutdown(ignored):
    reactor.stop()
d.addBoth(cbShutdown)

reactor.run()
```
Issue a request with a body. - listings/client/sendbody.py

Receiving Responses

So far, the examples have demonstrated how to issue a request. However, they have ignored the response, except for showing that Agent.request returns a Deferred which fires once the response has been received. Next we'll cover what that response is and how to interpret it.

Agent.request, like most Deferred-returning APIs, may return a Deferred which fires with a Failure. This happens if the request fails somehow: the host's IP address could not be looked up, the HTTP server is not accepting connections, the response could not be parsed, or any other problem arose which prevented the response from being received. A response with an error status code, such as 404 or 500, is not treated as a failure.

If the request succeeds, though, the Deferred will fire with a Response. This happens as soon as all the response headers have been received. It happens before any of the response body, if there is one, is processed. The Response object has several attributes giving the response information: its code, version, phrase, and headers, as well as the length of the body to expect. The Response object also has a method which makes the response body available: deliverBody. Using the attributes of the response object and this method, here's an example which displays part of the response to a request:

```python
from pprint import pformat

from twisted.internet import reactor
from twisted.internet.defer import Deferred
from twisted.internet.protocol import Protocol
from twisted.web.client import Agent
from twisted.web.http_headers import Headers


class BeginningPrinter(Protocol):
    def __init__(self, finished):
        self.finished = finished
        self.remaining = 1024 * 10  # display at most the first 10 KiB

    def dataReceived(self, data):
        if self.remaining:
            display = data[:self.remaining]
            print('Some data received:')
            print(display)
            self.remaining -= len(display)

    def connectionLost(self, reason):
        print('Finished receiving body:', reason.getErrorMessage())
        self.finished.callback(None)


agent = Agent(reactor)
d = agent.request(
    b'GET',
    b'http://example.com/',
    Headers({b'User-Agent': [b'Twisted Web Client Example']}),
    None)

def cbRequest(response):
    print('Response version:', response.version)
    print('Response code:', response.code)
    print('Response phrase:', response.phrase)
    print('Response headers:')
    print(pformat(list(response.headers.getAllRawHeaders())))
    finished = Deferred()
    response.deliverBody(BeginningPrinter(finished))
    return finished
d.addCallback(cbRequest)

def cbShutdown(ignored):
    reactor.stop()
d.addBoth(cbShutdown)

reactor.run()
```
Inspect the response. - listings/client/response.py

The BeginningPrinter protocol in this example is passed to Response.deliverBody, and the response body is then delivered to its dataReceived method as it arrives. When the body has been completely delivered, the protocol's connectionLost method is called. It is important to inspect the Failure passed to connectionLost. If the response body has been completely received, the failure wraps a twisted.web.client.ResponseDone exception, which indicates that all of the data is known to have arrived. The failure may instead wrap a twisted.web.http.PotentialDataLoss exception: this indicates that the server framed the response such that there is no way to know when the entire response body has been received. Only HTTP/1.0 servers should behave this way. Finally, the exception may be of some other type, which indicates that data was definitely lost (because of a lost connection, a memory error, and so on).

Just as protocols associated with a TCP connection are given a transport, so is a protocol passed to deliverBody. Since it makes no sense to write more data to the connection at this stage of the request, though, this transport only provides IPushProducer. This lets the protocol control the flow of the response data: a call to the transport's pauseProducing method pauses delivery, and a later call to resumeProducing resumes it. If the rest of the response body is not wanted, stopProducing stops delivery permanently; after this, the protocol's connectionLost method is called.

An important thing to keep in mind is that the body will only be read from the connection after Response.deliverBody is called. This also means that the connection will remain open until this is done (and the body read). So, in general, any response with a body must have that body read using deliverBody. If the application is not interested in the body, it should issue a HEAD request or use a protocol which immediately calls stopProducing on its transport.

HTTP over SSL

Everything you've read so far applies whether the scheme of the request URI is HTTP or HTTPS. However, to control the SSL negotiation performed when an HTTPS URI is requested, there's one extra object to pay attention to: the SSL context factory.

Agent's constructor takes an optional second argument, a context factory. This is an object like the context factory described in Using SSL in Twisted, with one small difference: the getContext method of this factory accepts the address from the URL being requested. This allows it to return a context object which verifies that the server's certificate matches the URL being requested.

Here's an example which shows how to use Agent to request an HTTPS URL with no certificate verification.

```python
from twisted.internet import reactor
from twisted.internet.ssl import ClientContextFactory
from twisted.python.log import err
from twisted.web.client import Agent


class WebClientContextFactory(ClientContextFactory):
    def getContext(self, hostname, port):
        # Ignore the address and return a context which performs no
        # certificate verification at all.
        return ClientContextFactory.getContext(self)


def display(response):
    print("Received response")
    print(response)


def main():
    contextFactory = WebClientContextFactory()
    agent = Agent(reactor, contextFactory)
    d = agent.request(b"GET", b"https://example.com/")
    d.addCallbacks(display, err)
    d.addCallback(lambda ignored: reactor.stop())
    reactor.run()


if __name__ == "__main__":
    main()
```

The important point to notice here is that getContext now accepts two arguments, a hostname and a port number. These two arguments, a str and an int, give the address to which a connection is being established to request an HTTPS URL. Because an agent might make multiple requests over a single connection, getContext is not necessarily called once per request: a later request for a URL with the same hostname as a previous request may re-use an existing connection, and with it the previously returned context object.

To configure SSL options or enable certificate verification or hostname checking, provide a context factory which creates suitably configured context objects.

Conclusion

You should now understand the basics of the Twisted Web HTTP client. In particular, you should understand:

  • How to issue requests with arbitrary methods, headers, and bodies.
  • How to access the response version, code, phrase, headers, and body.
  • How to control the streaming of the response body.

Version: 10.2.0