Kubernetes on Bare Metal: A Journey from Kuber-naught to Kuber-niceties

Since SpiderOak was founded way back in 2007, our datacenter needs have grown significantly. From day one, we have hosted all of our own infrastructure. Growing from a single pizza box server to hundreds of machines with thousands of hard drives has presented us with lots of challenges, but has enabled us to stay competitive with the economies of long-term data archival.

One particular challenge we have had to solve (more than once) is server provisioning. We've run the gamut from hand-cultivated pet servers, to homegrown deployment scripts, to deployment automation systems like Ansible. Once our Ansible playbooks hit maturity, managing all of our machines became much easier, but the approach still had some pretty glaring shortcomings. I will expand on these in a later post, but the focus of this article is our foray into the world of Kubernetes. Suffice it to say for now that we wanted to manage our infrastructure declaratively, and have strong automated tooling to drive the state of the datacenter toward our description of it. For this purpose, Kubernetes seemed like a natural fit.

Doing Things the Hard Way

After deciding to migrate our infrastructure to Kubernetes, we set out to find the best way to deploy Kubernetes itself within our datacenter. At the time, not much literature existed on running Kubernetes on bare metal, as the community was far more focused on cloud providers like AWS and GCE. The existing deployment strategies, such as kops and kubeadm, either didn't support deploying on bare metal, or didn't result in a production-ready cluster out of the box. In an effort to educate ourselves on what actually goes into a Kubernetes cluster, we turned to Kelsey Hightower's excellent tutorial, Kubernetes the Hard Way.

From there, we learned about the control plane: etcd, apiserver, kubelet, etc. We also learned about container runtimes, networking plugins, just how many TLS certificates we needed to generate (and eventually renew), multi-master/high-availability, and the benefits and drawbacks of a self-hosted control plane. We started forming ideas about the "right way" to deploy a cluster, and the apparent lack of deployment tools that captured our vision resulted in a new (ill-advised) Ansible role.

After deploying a few test clusters in our lab using our Ansible playbook, it became clear that we only had an 80% solution (if that). We could either continue down this road and, after significant development effort, eventually arrive at a complete solution, or we could go back to the community and see what other serious production users were doing.

Throw Money at the Problem

The goal was to run production services on this thing, and as a privacy-focused company, we put the security of our infrastructure high on our list of priorities. Deployment strategies targeted at "the enterprise" seemed like the only way forward. At the top of that list is CoreOS's Tectonic. Tectonic wowed us with easy-to-follow documentation for deploying on bare metal, and out-of-the-box bells and whistles like single sign-on (which we easily integrated with our internal LDAP), automatic OS updates, and a dashboard and monitoring solution. We actually deployed Tectonic and used it for production services for a while, and it worked well.

After thinking more about the bigger picture, which ultimately involves migrating not only many different services, but a handful of different classes of service with very different physical resource requirements, the niceties provided by Tectonic started to seem less like a one-size-fits-all solution, and more like a potential future headache.

Minimalism

We were very pleased to stumble across Typhoon at this point, which takes some of our favorite ideas from Tectonic, and provides a minimalistic, yet still production-ready, Kubernetes distribution that tracks closely with upstream releases. Like Tectonic, Typhoon uses Terraform and Matchbox for PXE-booting machines into working clusters. It will deploy clusters of any size, with multi-master, self-hosted (mostly) control planes, and TLS out of the box. These are pretty much exactly the features we considered to be the minimum necessary for our purposes, so Typhoon ended up being a great fit for us. The bells and whistles that Tectonic provides, like dashboards, SSO, and monitoring, are easily added onto a Typhoon cluster. Having solved those problems the Kubernetes Way, we can now pretty much deploy them to any cluster we spin up, with a single command.
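As a rough sketch of how such a cluster is declared, Typhoon drives everything from a Terraform module; the module source path, variable names, hostnames, and MAC addresses below are illustrative only and vary between Typhoon releases:

```hcl
# Illustrative Terraform configuration for a Typhoon bare-metal cluster.
# All names, addresses, and MACs here are hypothetical; consult the
# Typhoon docs for the exact variables your version expects.
module "bare-metal-cluster" {
  source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes"

  cluster_name           = "lab1"                          # hypothetical
  matchbox_http_endpoint = "http://matchbox.example.com:8080"
  k8s_domain_name        = "lab1.example.com"
  ssh_authorized_key     = "ssh-rsa AAAA..."

  # Machines are identified for PXE provisioning by name, MAC, and domain.
  controller_names   = ["node1"]
  controller_macs    = ["52:54:00:a1:9c:ae"]
  controller_domains = ["node1.example.com"]
  worker_names       = ["node2", "node3"]
  worker_macs        = ["52:54:00:b2:2f:86", "52:54:00:c3:61:77"]
  worker_domains     = ["node2.example.com", "node3.example.com"]
}
```

With a description like this checked into version control, `terraform apply` (plus a PXE boot of the machines) is the "single command" that converges bare metal into a working cluster.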

One of the philosophies espoused by Typhoon is to run many Kubernetes clusters. Upgrading is equivalent to spinning up a new cluster at the new version, migrating your workload over, and shutting down the old cluster. Canary and blue/green deployments can be done at the cluster level. Deploying clusters regularly means your team is always comfortable with the process, and can quickly recover when disaster inevitably strikes. Moreover, in our heterogeneous environment, running multiple clusters is a boon, allowing us to organize our infrastructure in a way that best fits our workloads.

Bare Metal's a Bear

Just getting Kubernetes running, and being able to spin up clusters easily and automatically, covers a big portion of the problem we set out to solve, but by itself this is still not 100% of what we need. Because bare metal isn't the primary target for most of Kubernetes' development, there are still some rough edges to be smoothed out. We don't get some of the things that come out of the box on AWS and GCE, such as LoadBalancer services and persistent storage.

Once a service is running in Kubernetes, there's still the matter of getting network traffic to it. Cloud providers typically have their own way of doing load balancing, and integration for these is provided in Kubernetes by way of LoadBalancer type services. You (optionally) specify an IP address, and the load balancer forwards traffic for that address to your service running in your cluster. The standard solution on bare metal, however, is to use NodePort type services, which open a port on every node's IP address; you then set up your own load balancer to forward traffic to that port across all nodes. To get the convenience of LoadBalancer services on bare metal, however, all you need is MetalLB and a router that speaks BGP.
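For illustration, a LoadBalancer service is just a few lines of YAML; the service name, labels, ports, and IP address below are hypothetical:

```yaml
# A Service of type LoadBalancer. On a cloud provider this provisions the
# provider's load balancer; on bare metal it sits "pending" unless a
# controller such as MetalLB is running to assign the address.
apiVersion: v1
kind: Service
metadata:
  name: example-web              # hypothetical service name
spec:
  type: LoadBalancer
  loadBalancerIP: 10.0.50.10     # optional; omit to let the controller pick one
  selector:
    app: example-web             # matches the pods backing the service
  ports:
    - port: 80                   # port exposed on the load-balanced IP
      targetPort: 8080           # port the pods actually listen on
```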

MetalLB is a "controller" (in Kubernetes-speak) that runs within your cluster, watching for new LoadBalancer services. It then uses BGP to synchronize those services with your router, so that traffic destined to a given IP address is forwarded to the appropriate nodes in your cluster. Once this setup is working, deploying a new service on a new IP address is a breeze, and works like magic.
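To make this concrete, here is a sketch of a MetalLB BGP configuration in the ConfigMap format used by early MetalLB releases (newer versions configure this through custom resources instead); the peer address, ASNs, and address pool are hypothetical and must match your router's BGP setup:

```yaml
# Hypothetical MetalLB configuration: peer with one BGP router and
# announce service IPs drawn from a single address pool.
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 10.0.0.1    # your BGP-speaking router
      peer-asn: 64512
      my-asn: 64513
    address-pools:
    - name: default
      protocol: bgp
      addresses:
      - 10.0.50.0/24            # IPs MetalLB may assign to LoadBalancer services
```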

Another thing cloud providers typically give you that's not so easy to get on bare metal is persistent storage. Kubernetes has all kinds of storage provider plugins, but the options for consuming disks installed in your cluster nodes are limited. One promising solution, which we have dabbled with, is Rook. Rook is an "operator" (a piece of software that drives the state of your system toward the description you specify using the Kubernetes API) for managing Ceph on top of a Kubernetes cluster. With Ceph, pretty much all of your potential storage needs can be met, with object store, block device, and filesystem solutions. Persistent storage is a difficult problem to solve in any deployment strategy, however, and the youth of the Rook project has shown itself in our tests. For now, we're using off-cluster NFS servers for most of our filesystem needs, and we're keeping our PostgreSQL databases off-cluster as well.
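Whichever backend ultimately provides the disks, workloads consume storage the same way: through a PersistentVolumeClaim. The claim name and storage class below are hypothetical; the class would be supplied by whatever provisioner (Rook/Ceph, NFS, a cloud plugin) backs the cluster:

```yaml
# A PersistentVolumeClaim: the workload asks for storage by size, access
# mode, and class, and the matching provisioner satisfies the claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data            # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce             # mounted read-write by a single node
  storageClassName: rook-block  # hypothetical; defined by your storage operator
  resources:
    requests:
      storage: 10Gi
```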

The Future is Now

At this point, we have still only migrated a small portion of our workloads to Kubernetes, but more services are making the jump every week. We've been running some production services in Kubernetes for a few months now, and some of the benefits have already shown themselves very clearly. The Kubernetes scheduler and the Container Linux Update Operator (CLUO) keep our nodes up-to-date, and rebooting for kernel upgrades is no longer a noteworthy event. The ease of provisioning new clusters with Typhoon has allowed us to experiment, upgrade, and fail over to new versions of Kubernetes with ease. Replacing our Ansible playbooks with Docker images and Kubernetes manifests has simplified a lot of our release processes. As we continue to gain experience with containers, Kubernetes, and deployment automation, I'm confident that SpiderOak will continue to provide secure, private storage and collaboration solutions at ever greater velocity.
