[Twisted-Python] Twisted HTTP client supporting failover for multiple A records?
Luke Marsden
luke-lists at hybrid-logic.co.uk
Thu Jul 15 06:33:41 MDT 2010
On Thu, 2010-07-15 at 08:06 -0400, Itamar Turner-Trauring wrote:
> On Thu, 2010-07-15 at 10:46 +0100, Reza Lotun wrote:
>
> > As for connecting to hosts that resolve to multiple A records - I
> > presume as a means of load balancing via DNS round robin
We're actually using it to provide redundancy in this instance. In our
application any request for any site can be made to any (live) server,
so having dead servers in the pool of A records doesn't matter so long
as real web browsers fail over to some other A record within a second,
which they do! See http://crypto.stanford.edu/dns/dns-rebinding.pdf
The problem is that my test application uses client.getPage which,
because it uses the reactor's standard DNS lookup mechanism, picks just
one A record and sticks to it. So, it reports connection errors (some
fraction of the time, as A records are randomised) even when the user of
a "real" web browser would not experience them. These errors go away
when the dead server(s) drop out of the DNS pool and the reactor's
lookups stop returning the dead IP, but this takes some time.
> Gar. I should read better. Twisted uses a threadpool of gethostbyname by
> default, but you can plug in your own resolver (e.g. you can use
> twisted.names):
>
> http://twistedmatrix.com/documents/10.1.0/api/twisted.internet.interfaces.IReactorPluggableResolver.html#installResolver
>
> The question is whether the client code re-resolves on each re-connect,
> and whether the current lookup interface is sufficient for this use
> case.
>
> Alas, I'm pretty sure the answer is no.
>
> You could however always just do the DNS lookup yourself, passing
> resulting correct IP to connectTCP, just make sure you don't block (e.g.
> by using deferToThread to call gethostbyname_ex).
Thanks Itamar, this is massively useful. I'll try subclassing
twisted.web.client.Agent to do its own DNS lookups with twisted.names so
as to be aware of the full list of A records returned. It would then
attempt all the IP addresses in turn until it finds one which works,
giving up only if all the IPs yield connection errors. This should
mirror the behaviour of the majority of web browsers "in the wild".
Would you be interested in having this code contributed back to Twisted
if I can get it working? It might be a useful addition to the Agent.
--
Best Regards,
Luke Marsden
Hybrid Logic Ltd.
Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting based on FreeBSD and ZFS
Mobile: +447791750420