[Twisted-Python] getPage using ssl proxy

Thu Jul 30 13:15:31 MDT 2009

Hello,

I am writing some scraper scripts and need to pass them through an
intercepting proxy. getPage does not support a proxy argument and this code
I found on internet won't work with SSL proxy (stalls indefinitely):

def getPage(url, contextFactory=None, *args, **kwargs):
    scheme, host, port, path = _parse(url)
    factory = HTTPClientFactory(url, *args, **kwargs)
    if 0: # use a proxy
        host, port = 'localhost', 8080
        factory.path = url
    if scheme == 'https':
        from twisted.internet import ssl
        if contextFactory is None:
            contextFactory = ssl.ClientContextFactory()
        reactor.connectSSL(host, port, factory, contextFactory)
    else:
        reactor.connectTCP(host, port, factory)
    return factory.deferred

Plain http proxying works. My guess is that there is an issue with
self-signed or otherwise invalid certificate the http proxy supplies. Any
clues?

--
Konrads Smelkovs
Applied IT sorcery.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: </pipermail/twisted-python/attachments/20090730/16e6f04a/attachment.html>