We're often asked why we built Pantheon to run on containers, not virtual machines. I recently covered this topic in an interview with CenturyLink, where Lucas Carlson and I delved into why container-based infrastructure is more scalable. Here are a few questions and highlights from the video:
Why did you build Pantheon with Linux containers? What was it about containers that drew you away from virtual machines?
This goes right to the primary drivers we had for designing the product. We wanted a consistent experience, and that's hard to deliver with virtual machines, because production may be a fleet of VMs while a developer instance may be a single VM or a set of shared VMs. Then you're going from local database access to network database access, and from local file system access to something like GlusterFS. These introduce major consistency issues.
Even the tools used for local development might be different from production. Locally, you may not have access to memcached or Solr. The way to overcome these limitations and provide a consistent experience at a reasonable price is to use containers.
You can’t deploy a fleet of VMs for every developer in every environment along the deployment pipeline at a reasonable cost. There is huge overhead associated with replicating database servers and other infrastructure.
With containers we can still spread the application over the network, but each piece takes only a small slice of each system. The application can live in one small container, the database server in another, Solr in another, and the overall footprint stays small. In addition, containers can be started and stopped on demand when they are actually accessed, so memory and CPU are consumed only while they are in use.
This allows the dev and test environments to be representative of the later stages of deployment in an economical way. The architecture also pays off in production, where the on-demand nature allows for scaling without the costs of over-provisioning. Since the overhead cost in resources is small, the Pantheon grid system can absorb blips in customer usage with little effect on the overall edge traffic it processes.
Why don’t you use Docker?
For one, our container approach pre-dates Docker by a number of years. We’ve been using containers since 2011 on Pantheon’s platform.
The Linux kernel doesn’t actually have the concept of containers. There is a set of APIs that, used in unison, gives you something that looks like a container. It’s not like Zones on Solaris or Jails on BSD, where there is a monolithic thing that you configure that creates that isolation.
On Linux there are mandatory access controls, there are capability filters, there are cgroups, there are namespaces, and probably one or two other things beyond standard users and groups. If you configure enough of them to isolate a process, you get something that looks like a container. It’s called a container on Linux because, for all practical purposes, it is the same as containers on other platforms that have the concept at the kernel level.
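You can see several of these primitives directly on any modern Linux machine; as a rough illustration (not specific to Pantheon's setup), each process's namespaces and cgroup membership are exposed under /proc:

```shell
# Each entry under /proc/self/ns is a separate isolation axis
# (pid, net, mnt, uts, ipc, user, ...) that a container runtime
# configures to build up the illusion of a "container".
ls /proc/self/ns

# The cgroups this process belongs to, which bound its CPU and memory:
cat /proc/self/cgroup
```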
What Docker does is configure all these resources for you. We use systemd to do the same thing. Docker also provides a packaging and deployment model and infrastructure for publishing containers so other people can install them. Unfortunately, some of the design goals for Docker have not aligned with some of our needs. We already have technology for configuring what is in the container; we use Chef to set up the context of the container itself rather than the base system. Also, our density needs are considerably more advanced than what Docker provides right now.
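As a rough sketch of the idea (illustrative directive values, not Pantheon's actual configuration), a modern systemd service unit can configure the same kernel primitives a container runtime does:

```ini
# Hypothetical site-a.service — systemd assembling "container-like"
# isolation from the kernel primitives described above.
[Service]
ExecStart=/usr/bin/php-fpm
# cgroup resource limits
MemoryMax=512M
CPUQuota=25%
# namespace and filesystem isolation
PrivateTmp=yes
ProtectSystem=strict
RootDirectory=/var/containers/site-a
# capability filtering
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
```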
Containers are activated on use through a mechanism called “socket activation”. systemd sits on the base system and opens a socket (the “listener”), putting an “epoll listener” on it that allows it to identify that a connection is coming in without actually processing it. Knowing that a request is coming in, systemd can activate a service or container on the system.
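In systemd terms, this is a paired socket and service unit; a minimal sketch (unit names and paths are illustrative, not Pantheon's) might look like:

```ini
# site-a.socket — systemd holds this listener on the base system,
# even while the backing service is not running.
[Socket]
ListenStream=127.0.0.1:9000

# site-a.service — started by systemd only when a connection
# actually arrives on the socket above.
[Service]
ExecStart=/usr/bin/php-fpm --fpm-config /etc/containers/site-a/fpm.conf
```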
This also allows containers to be hibernated: processes that have not served a request in a while are identified and “reaped”, so they consume no resources. When a new request comes in, it takes a few seconds to restart the container. We deploy thousands of containers to any particular box, but only about 5-10% are running at any particular point.
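The activate-on-demand and reap-on-idle cycle can be sketched in a few lines of Python (a toy model, not Pantheon's code): an epoll watch on a listening socket reports a pending connection without accepting it, the "service" is marked running, and after an idle timeout it is reaped again.

```python
import select
import socket
import time

IDLE_TIMEOUT = 0.2        # illustrative; real systems would use much longer

# The supervisor (systemd, in Pantheon's case) owns the listening socket.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # ephemeral port for the demo
listener.listen(16)

poller = select.epoll()
poller.register(listener.fileno(), select.EPOLLIN)

running = False
last_activity = time.monotonic()

# Simulate one incoming request: the connection sits in the accept
# queue, and epoll reports the listener readable without accept()ing.
client = socket.create_connection(listener.getsockname())
if poller.poll(timeout=1):
    conn, _ = listener.accept()   # "start" the service for this request
    running = True
    last_activity = time.monotonic()
    conn.close()
client.close()

# No further requests arrive; once the idle timeout passes, the
# supervisor reaps the service so it consumes no memory or CPU.
time.sleep(IDLE_TIMEOUT + 0.1)
if running and time.monotonic() - last_activity > IDLE_TIMEOUT:
    running = False               # reaped / hibernated

print("running after idle period:", running)
listener.close()
```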
Is any of this open source?
Everything we do for socket activation is entirely open source and built into the systemd project. This is shipping extremely broadly now. Back when we started on systemd it was only on Fedora; now it's on SuSE, it'll be in the next Ubuntu release, and it's in the latest Red Hat Enterprise Linux 7. So access to this architecture is now available to everyone.
To learn more, read the rest of the transcript from the video, check out the original post on CenturyLink Labs and read CEO Zack Rosen's post, "Why We Built Pantheon With Containers Instead of Virtual Machines".