Over the last few years at SpiderOak, we have steadily worked to improve our release infrastructure. That work has covered everything from our repository workflows to the way we deploy our services, artifacts, and deliverables, both to the public and internally (to our QA team and the staging environments). This post describes how our build, release, and continuous integration infrastructure has changed over those years, and where it stands today.
Our build infrastructure used to be powered by Buildbot, an open-source Python-based CI/testing framework with more than 15 years of history. We owe years of builds to it. Back when SpiderOak ONE was our only product, a single Build Master and a few Build Slaves were enough to handle our builds automatically. However, as SpiderOak grew, so did our requirements for building and automatic testing: first came Semaphor, then Share. Our repositories started multiplying, and things became much harder to handle.
When we first created Semaphor, more than 20 repositories were part of the build process. When new features were being developed across several repositories, automatically testing the right combination of branches proved to be impossible. In addition, we now had many different artifacts related to the Semaphor backend that had to be built (the Flow server, many service bots, binaries used specifically for testing), so our build code became bigger and more complex. Since Buildbot config files (which are essentially Python) cover everything from the checkout of each repository to the HTTP server used to display the results, the system became hard to maintain. Finally, the web UI was outdated: at the time, Buildbot's responsive UI was only available as a beta with many bugs.
A wild CI system appears
We decided it was time to migrate to a different CI system that would help us tackle the issues we were having with Buildbot. We considered many excellent options, among them Jenkins, Travis, CircleCI, TeamCity, and CruiseControl. In the end, we went with Bamboo because of its transparent integration with the rest of our Atlassian stack (Jira, Bitbucket, and Confluence) and because it makes working with multiple repositories easy.
Throughout 2018 we implemented many features and experimented with different ways of pushing the limits of our Continuous Integration setup. What each project needs from CI varies a lot. Some projects simply require scripted autotests to pass before merging. More complex ones build and tag multiple Docker images, orchestrate containers for testing purposes, interact with servers and databases, and even deploy deliverables directly to our website, our Linux repositories, and the App Store and Google Play.
Migrating the basic build systems of our products was a reasonably easy task, and after a month we were already building all of our current products automatically in Bamboo. Afterwards, we started adding new features to our infrastructure.
Every project at SpiderOak has its own autotesting policies, but in general every Pull Request created in a repository triggers a full suite of automatic tests and builds that must pass before the PR can be merged. Part of the benefit of using so much of the Atlassian stack is that our Jira tickets have the corresponding development branches attached to them, along with the results of the Bamboo tests and builds. This lets us quickly see the results of the work done by devs, gives our QA team quick access to the builds they need to (manually) test a feature, and in general consolidates our planning, development, and testing cycles. Some projects even run quick automatic tests and linters for every single commit pushed to the repository, whether or not it is part of a Pull Request.
Docker image manipulation
The Ops team at SpiderOak has been running its own effort to migrate our services to Kubernetes, and as a result our teams started to deliver their work as Docker images so we could deploy these services more efficiently. Our CI system now gives us a centralized place to manipulate these Docker images: when a CI run is triggered (usually on a Pull Request), the target Docker images are tagged with the git commit SHAs and pushed to our repositories. In addition, whenever we decide a Docker image is ready to deploy, we have manually-triggered build steps that tag the Docker image with a Production tag; the Ops team just needs to restart the Kubernetes pod that runs the Production image to have our new services deployed! We do this at many levels: some services are deployed first to a Staging environment, then tested by our QA team, and then promoted to Production, and all of this is controlled by manually-triggered actions in our Bamboo UI.
We even do some Bamboo inception: we build the Docker image that runs our Bamboo server in our current Bamboo instance, promote it to Production when we want to upgrade the server version, and then simply restart the Kubernetes pod to see the new version running!
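The tag-then-promote flow described in this section can be sketched roughly as follows. This is only an illustration, not our actual Bamboo configuration: the registry address, the image name, the "production" tag string, and all function names are assumptions made up for the example. The only real interfaces it relies on are the standard `docker pull`, `docker tag`, and `docker push` commands. By default the script prints the commands instead of running them.

```python
import subprocess
from typing import List

# Hypothetical registry and tag names, used for illustration only.
REGISTRY = "registry.example.com/spideroak"
PRODUCTION_TAG = "production"


def image_ref(name: str, tag: str) -> str:
    """Return a full image reference of the form registry/name:tag."""
    return f"{REGISTRY}/{name}:{tag}"


def ci_push_commands(name: str, commit_sha: str) -> List[List[str]]:
    """Commands for a CI run: tag the locally built image with the commit SHA, then push it."""
    ref = image_ref(name, commit_sha)
    return [
        ["docker", "tag", name, ref],
        ["docker", "push", ref],
    ]


def promote_commands(name: str, commit_sha: str) -> List[List[str]]:
    """Commands for the manually-triggered step: re-tag a tested image as Production."""
    source = image_ref(name, commit_sha)
    target = image_ref(name, PRODUCTION_TAG)
    return [
        ["docker", "pull", source],
        ["docker", "tag", source, target],
        ["docker", "push", target],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "flow-server" and the short SHA are placeholder values.
    run_all(ci_push_commands("flow-server", "3f2a1bc"))
    run_all(promote_commands("flow-server", "3f2a1bc"))
```

Because promotion only moves a tag onto an image that was already built and tested, the image that reaches Production is the same one QA saw; nothing is rebuilt in between.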
Build registration/Continuous Delivery
Our public installer downloads are managed by a Django Release Widget, a tool that allows us to select a build and push it to the download sections of our website automatically. CI helps us by registering each build that has passed all automatic tests in this Release Widget's database, recording the product, the installer version, and its SHASUM. Whenever we release a new version of our products, we just need to select the installers in the Release Widget and push them publicly. Major product releases are always done like this. Furthermore, there is Continuous Delivery in place in the pipeline: when builds are created from a stable set of branches in the involved repositories, they are also automatically pushed to our Alpha download channels, so the QA team always has a "latest" build to test, with all available features baked in.
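To make the registration step more concrete, here is a minimal sketch of the decision logic. It is not the Release Widget's real code or schema: the `BuildRecord` fields, the set of "stable" branch names, the "alpha" channel label, and the use of SHA-256 as the checksum are all assumptions chosen for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical names of branches considered stable enough for Alpha delivery.
STABLE_BRANCHES = {"master", "develop"}


@dataclass(frozen=True)
class BuildRecord:
    """Hypothetical shape of one registered build."""
    product: str
    version: str
    sha256: str
    channels: Tuple[str, ...]


def register_build(
    product: str,
    version: str,
    installer: bytes,
    branches: Dict[str, str],
    tests_passed: bool,
) -> Optional[BuildRecord]:
    """Return a record for a test-passing build, or None if the tests failed.

    `branches` maps each involved repository to the branch it was built from.
    The build is assigned to the Alpha channel only when every repository
    was built from a stable branch.
    """
    if not tests_passed:
        return None
    checksum = hashlib.sha256(installer).hexdigest()
    all_stable = all(branch in STABLE_BRANCHES for branch in branches.values())
    channels = ("alpha",) if all_stable else ()
    return BuildRecord(product, version, checksum, channels)


if __name__ == "__main__":
    # Placeholder installer bytes and repository names.
    record = register_build(
        "Semaphor", "1.2.3", b"installer-bytes",
        {"client": "master", "flow": "develop"}, tests_passed=True,
    )
    print(record)
```

A build from a feature branch would still be registered (and so remain selectable for manual testing), but it would not be assigned to the Alpha channel.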
Linux Repository handling
Pushing builds to our Linux repositories is also handled by manually-triggered actions in our CI pipeline. When installers pass all automatic tests, a manually-triggered step becomes available that allows us to update the installer version and builds in our Linux repo. This way, neither the installers released via the web nor those released through our repositories require Ops intervention to be pushed publicly.
Some of our products have mobile versions of the clients available. Mobile distribution can't be handled directly by us, because builds need to pass through the Apple and Google Play stores and be evaluated, so we can't simply push the apps to a download channel. However, thanks to the Fastlane Tools (https://fastlane.tools/), we do have manually-triggered steps available for our test-passing apps that push new builds to those stores automatically, and our QA team tests the new builds from the beta stores once they are available.
The current CI pipeline has been a huge improvement for us, but of course there's a lot of room for more: we are in the process of migrating our build infrastructure itself to have it managed completely by Kubernetes, by dockerizing our build tools and having a cleaner setup of the build hosts. We will also be working on expanding our autotesting code coverage, and further automating deployments and releases!