[Twisted-Python] Waiting time for tests running on Travis CI and Buildbot

Glyph Lefkowitz glyph at twistedmatrix.com
Sun Aug 14 17:10:13 MDT 2016


> On Aug 14, 2016, at 3:38 AM, Adi Roiban <adi at roiban.ro> wrote:
> 
> Hi,
> 
> We now have 5 concurrent jobs on Travis-CI for the whole Twisted organization.
> 
> If we want to reduce the waste of running push tests for a PR we
> should check that the other repos from the Twisted organization are
> doing the same.
> 
> We now have in twisted/twisted 9 jobs per build ... and for each
> push to a PR ... we run the tests for the push and for the PR merge ...
> so that's 18 jobs per commit.
> 
> twisted/mantissa has 7 jobs per build, twisted/epsilon 3 jobs per
> build, twisted/nevow 14 jobs, twisted/axiom 6 jobs, twisted/txmongo 16
> jobs
> 
> .... so we are a bit over the limit of 5 jobs

Well, we're not "over the limit".  It's just 5 concurrent.  Most of the projects that I work on have more than 5 entries in their build matrix.

> I have asked Travis-CI how we can improve the waiting time for
> twisted/twisted jobs and for $6000 per year they can give us 15
> concurrent jobs for the Twisted organization.
> 
> This will not give us access to a faster waiting line for the OSX jobs.
> 
> Also, I don't think that we can have twisted/twisted take priority
> inside the organization.
> 
> If you think that we can raise $6000 per year for sponsoring our
> Travis-CI and that is worth increasing the queue size I can follow up
> with Travis-CI.

I think that this is definitely worth doing.

> I have also asked Circle CI for a free ride on their OSX builders, but
> it was put on hold as Glyph told me that Circle CI is slower than
> Travis.
> 
> I have never used Circle CI. If you have a good experience with OSX on
> Circle CI I can continue the phone interview with Circle CI so that we
> get the free access and see how it goes.

The reason I'm opposed to Circle is simply that their idiom for creating a build matrix is less parallelism-friendly than Travis.  Travis is also more popular, so more contributors will be interested in participating.

> There are multiple ways in which we can improve the time a test takes
> to run on Travis-CI, but it will never be faster than buildbot with a
> slave which is always active, ready to start a job in 1 second, and
> which already has 99% of the virtualenv dependencies installed.

There's a lot that we can do to make Travis almost that fast, with pre-built Docker images and cached dependencies.  We haven't done much in the way of aggressive optimization yet.  As recently discussed, we're still doing twice as many builds as we need, just because we've misconfigured branch / push builds :).
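For example (a sketch only, not our actual configuration): restricting push builds to trunk gets rid of the duplicate push-plus-PR builds, and Travis's built-in pip cache keeps dependency installation cheap between runs. Something along these lines in `.travis.yml`:

```yaml
# Hypothetical .travis.yml fragment -- not Twisted's real config.
language: python

# Only build pushes to trunk.  Pull requests are still built via the
# pull_request event, so pushes to a PR branch no longer trigger a
# second, redundant build.
branches:
  only:
    - trunk

# Cache pip's download/wheel cache so installs are mostly no-ops.
cache: pip
```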

> AFAIK the main concern with buildbot is that the slaves are always
> running, so a malicious person could create a PR with some malware and
> then all our slaves will execute that malware.

Not only that, but the security between the buildmaster and the builders themselves is weak.  Now that we have the buildmaster on a dedicated machine, this is less of a concern, but it still has access to a few secrets (an SSL private key, GitHub OAuth tokens) which we would rather not leak if we can avoid it.

> One way to mitigate this is to use latent buildslaves and stop and
> reset a slave after each build, but this will also slow the build and
> lose the virtualenv ... which for a Docker-based slave should not be a
> problem ... but if we want Windows latent slaves it might increase the
> build time.

It seems like fully latent slaves would be slower than Travis by a lot, since Travis is effectively doing the same thing, but they have a massive economy of scale with pre-warmed pre-booted VMs that they can keep in a gigantic pool and share between many different projects.
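For reference, a latent worker is only a few lines of buildbot master configuration; the real cost is the per-build VM boot that Travis amortizes across its pool. A hypothetical EC2 example, with the API as in buildbot nine and all identifiers made up:

```python
# Hypothetical master.cfg fragment: an EC2 latent worker that boots on
# demand and is shut down after each build.  All names, the AMI, and the
# credentials below are placeholders, not a real configuration.
from buildbot.plugins import worker

c['workers'] = [
    worker.EC2LatentWorker(
        'ec2-trusty-py27',           # worker name (hypothetical)
        'worker-password',           # shared secret with the master
        'm3.medium',                 # instance type
        ami='ami-00000000',          # pre-baked image with deps installed
        identifier='AWS_ACCESS_KEY',
        secret_identifier='AWS_SECRET_KEY',
        build_wait_timeout=0,        # tear the instance down after each build
    ),
]
```

With `build_wait_timeout=0` every build pays the full instance boot, which is exactly the slowdown described above.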

> What do you say if we protect our buildslaves with a firewall which
> only allows outgoing connections to buildmaster and github ... and
> have the slaves running only on RAX + Azure to simplify the firewall
> configuration?
> 
> Will a malicious person still be interested in exploiting the slaves?
> 
> I would be happy to help with buildbot configuration as I think that
> for TDD, buildbot_try with slaves which are always connected and
> virtualenv already created is the only acceptable CI system.


Personally, I just want to stop dealing with so much administrative overhead.  I am willing to wait for slightly longer build times in order to do that.  Using Travis for everything means we don't need to worry about these issues, or have these discussions; we can just focus on developing Twisted, and have sponsors throw money at the problem.  There's also the issue that deploying new things to the buildmaster will forever remain a separate permission level, whereas proposing changes to the Travis configuration just means sending a PR.

There are things we could do to reduce both overhead and the risk impact further though.  For example, we could deploy buildbot as a Docker container instead of as a VM, making it much faster to blow away and rebuild if we have a security problem, and limiting its reach even more.
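A sketch of what that might look like (hypothetical paths; the point is that the secrets stay outside the image, so rebuilding after an incident leaks nothing that was baked in):

```dockerfile
# Hypothetical Dockerfile for the buildmaster -- not a real deployment.
FROM python:2.7-slim
RUN pip install buildbot
# Master configuration only; the SSL private key and GitHub tokens are
# NOT copied in -- they would be bind-mounted read-only at run time.
COPY master/ /var/lib/buildbot/master/
CMD ["buildbot", "start", "--nodaemon", "/var/lib/buildbot/master"]
```

Recovering from a compromise is then a `docker build` plus `docker run`, rather than reprovisioning a whole VM.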

On the plus side, it would be nice to be dogfooding Twisted as part of our CI system, and buildbot uses it heavily.  So while I think overall the tradeoffs are in favor of Travis, I wouldn't say that I'm 100% in favor.  And _most_ of the things we used to need a buildbot config change for now are just tox environment changes; if we can move to 99% of the job configuration being in tox.ini as opposed to the buildmaster, that would eliminate another objection.
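Concretely, "job configuration in tox.ini" means a builder (or a Travis job) is reduced to naming an environment, and adding a new test variant becomes a PR against tox.ini rather than a buildmaster change. A hypothetical environment, with the name and deps made up:

```ini
# Hypothetical tox.ini fragment -- the environment name is invented.
[testenv:py27-nomodules]
# A builder or Travis job only needs to run: tox -e py27-nomodules
deps =
    zope.interface
commands =
    trial {posargs:twisted}
```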

I'd also be interested in hearing from as many contributors as possible about this though.  The point of all of this is to make contributing easier and more enjoyable, so the majority opinion is kind of important here :).

-glyph
