The most distinctive technology in the Pantheon Platform is the Content Base: the data utilities that the 12 Factor App manifesto calls Backing Services. No website can live without them.
Pantheon’s Content Base has three primary components:
The “Valhalla” file-system, which handles binary file storage and provides a high-performance, highly available, disaster-proof network file system for all our sites.
Our database grid, which uses the same basic containerization tools as our app matrix to effectively provide a horizontally scalable SQL resource to our customers.
The orchestration tools which manage the connectivity between these required services and the application matrix where the software runs.
We also use it to manage the other important “stateful” data services necessary to run a website, like version control or a search index, and to provide Redis for key/value caching (and queues), as well as managing external services like New Relic.
Relatively speaking, our container-based Runtime Matrix is fairly well understood. We’ve been talking about it for quite a while, and the rest of the internet is trending in a similar direction. What we’re doing with the Content Base is just as vital, and is in many ways a more significant technical achievement.
Stateful = Hard
While it’s hardly child’s play to build a stateless application grid that can run tens of thousands of concurrent customer applications, the “stateless” part of that task makes it a relatively straightforward proposition. You don’t have any fundamentally unsolved problems in your way. All that changes when you start thinking about the Content part of a CMS or CMF.
We realized this when we were first architecting the Pantheon platform. Early technical advisors tried to warn us off, saying these were challenges which “led to madness” or simply “couldn’t work” — specifically providing network-attached file storage, or running a multitenant database infrastructure.
But without a database and a place to store files, how is your CMS supposed to work? You can’t just wave your hands and say the cloud will save you: unless you write an application from the ground up to use something like S3 directly — accepting some severe limitations with latency and consistency in the process — it’s not a solution. Drupal and other common Open Source website applications are built with the LAMP assumptions in mind: that they can connect to a SQL database and write to a local filesystem to store binary content.
So into the storm we went.
We didn’t want to write our own file system. We really didn’t. But we had no other choice. Every website environment instance on Pantheon is a first-class distributed system, ready to scale at a moment’s notice, and that means every single one of them requires network-attached storage.
There are good reasons why we couldn’t use open-source file systems like GlusterFS. The OpenStack project has similar concerns:
No copy on write: cloning between volumes requires literally copying all the files around. That would make workflow operations for sites with lots of files a hellish proposition.
Lack of granular permissions: there’s no way to securely share a volume between different users.
Performance: there are still real performance bottlenecks and gotchas with GlusterFS, which isn’t optimized for the content use-case.
For Pantheon to fulfill its promise, we need to provide a consistent platform to all customers, from the free developer site up to the biggest enterprise customers. We weren’t willing to compromise on our vision of smooth scaling, and keeping it with GlusterFS would have meant a dedicated cluster for each site and environment (necessitated by its security model), which would have been totally unsustainable.
We also rejected a “mix and match” implementation. It would totally compromise the value of smooth scaling to use local file storage for all but the largest sites, especially if that meant a change in file system technology among dev, test, and live. We want developers to be able to rely on Pantheon 100%, so providing a robust, reliable and most of all consistent platform is a must-have.
Finally, with a multi-environment workflow, you really need copy-on-write. If you’ve got 10GB of files, and you want to spin up a new Multidev instance on Pantheon, the filesystem takes less than a minute to mount. If you actually had to copy all those files from one GlusterFS cluster to another? Forget about it.
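To see why copy-on-write makes cloning cheap, here is a minimal sketch in Python. It is not how Valhalla is implemented; the BlobStore and Volume classes are hypothetical names invented for illustration. The idea is that a volume is only a manifest pointing at shared, content-addressed blobs, so cloning a volume copies the manifest and none of the file bytes.

```python
import hashlib


class BlobStore:
    """Hypothetical shared store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]


class Volume:
    """Hypothetical volume: just a manifest of path -> blob digest."""

    def __init__(self, store, manifest=None):
        self.store = store
        self.manifest = dict(manifest or {})

    def write(self, path: str, data: bytes) -> None:
        # A write stores a new blob and repoints this volume's manifest only.
        self.manifest[path] = self.store.put(data)

    def read(self, path: str) -> bytes:
        return self.store.get(self.manifest[path])

    def clone(self) -> "Volume":
        # Copy the manifest only; no file bytes are duplicated.
        return Volume(self.store, self.manifest)


store = BlobStore()
live = Volume(store)
live.write("logo.png", b"original image bytes")
live.write("report.pdf", b"quarterly report")

# Cloning is cheap no matter how large the files are.
multidev = live.clone()
multidev.write("logo.png", b"redesigned image bytes")
```

After the write in the clone, the live volume still reads its original logo, and the unchanged report is stored once and shared by both volumes. The cost of a clone depends on the number of files in the manifest, not on the total size of their contents.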
Our answer is the fusedav client — an ultra-modern filesystem client highly optimized for the content/media use case which can actually outperform a local filesystem for some operations — and our super-distributed “Valhalla” file server, which leverages the power of Cassandra and Amazon S3 to provide redundant and efficient storage of binary content with sub-second change propagation and synchronization.
It’s like Dropbox for your website, only about five orders of magnitude faster. Nice. Solving this problem has allowed us to scale the Runtime Matrix as fast as we can add customers. While the back-end does take some care and attention now, the architecture gives us the potential to offer effectively unlimited file storage resources.
If you look at the universe of platform providers, most can provide stateless scalable application workers, but many require you to buy “a server” for your database needs. That’s something we were unwilling to accept with Pantheon. What good is the cloud if you only get one foot out of the world of servers?
Our approach to database containerization is similar to our Runtime Matrix (using the same core kernel facilities). DB capacity is fluid, and generally scales along with an app’s use of Runtime resources. We run MariaDB, and rely on our deep knowledge of the CMS use-case to tune, optimize, and manage. The key resource to protect is block I/O, and we’re constantly finding ways to squeeze a few more IOPS out of our platform.
This is only possible because we’ve turned all the common DBA tasks into software routines. We’ve automated user workflows as well as back-end operations, and have a host of self-directed agents tending to the thousands of database containers to ensure they are happy and working correctly.
For instance, our database grid manages replication topology automatically, including detecting issues and self-healing breakdowns in propagation. That means we don’t need to have people worry about managing the master/slave pairs, and we can easily and quickly add horizontal read capacity to any of our sites.
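As a rough illustration of what one pass of such an agent might look like, here is a small Python sketch. Everything in it is an assumption made for the example: the ReplicaStatus fields, the plan_repairs function, the 30-second lag threshold, and the action names are hypothetical and do not describe Pantheon’s actual tooling.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    name: str
    replicating: bool    # is replication currently running on this replica?
    lag_seconds: float   # how far the replica is behind its primary


def plan_repairs(replicas, max_lag=30.0):
    """Decide, for one monitoring pass, which replicas may serve reads
    and which need attention. Returns (healthy_names, actions)."""
    healthy, actions = [], []
    for r in replicas:
        if not r.replicating:
            # Replication has stopped: schedule a rebuild from the primary.
            actions.append(("rebuild", r.name))
        elif r.lag_seconds > max_lag:
            # Too far behind: stop sending reads until it catches up.
            actions.append(("remove_from_read_pool", r.name))
        else:
            healthy.append(r.name)
    return healthy, actions


statuses = [
    ReplicaStatus("replica-a", True, 0.4),
    ReplicaStatus("replica-b", True, 95.0),
    ReplicaStatus("replica-c", False, 0.0),
]
healthy, actions = plan_repairs(statuses)
```

Here only replica-a stays in the read pool, replica-b is pulled until it catches up, and replica-c is queued for a rebuild. Keeping the decision logic a pure function of observed status makes it easy to run the same check across thousands of containers.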
Similarly, we can spin down unused DB containers to make the most efficient use of resources, e.g. in a dev environment that’s gone idle overnight. Containers restart fast enough that we can bring them back in real time the next morning when a developer starts work.
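The idle spin-down idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Pantheon’s implementation: DbContainer, reap_idle, and the one-hour idle threshold are hypothetical, and time is passed in as plain seconds to keep the example deterministic.

```python
class DbContainer:
    """Hypothetical stand-in for a database container."""

    def __init__(self, name, now):
        self.name = name
        self.running = True
        self.last_used = now

    def connect(self, now):
        # Wake on demand: a stopped container is restarted by the first
        # connection attempt, then serves the request as usual.
        if not self.running:
            self.running = True
        self.last_used = now
        return f"connected to {self.name}"


def reap_idle(containers, now, idle_after=3600):
    """Stop every running container that has been idle for idle_after
    seconds or more. Returns the names of the containers stopped."""
    stopped = []
    for c in containers:
        if c.running and now - c.last_used >= idle_after:
            c.running = False
            stopped.append(c.name)
    return stopped


dev = DbContainer("dev-db", now=0)
live = DbContainer("live-db", now=0)

live.connect(now=7000)                       # live keeps getting traffic
stopped = reap_idle([dev, live], now=7200)   # dev has been idle for two hours
message = dev.connect(now=30000)             # next morning: wakes on demand
```

In this run only dev-db is stopped, and it comes back as soon as something connects to it. The approach only pays off when restart time is short compared with how long a developer is willing to wait for a first connection.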
Additional Content Base Services
In solving these two hard technical problems, we also laid down patterns which we’ve been able to leverage to provide a number of additional services. We use the same model for providing Redis and Solr Search services, as well as maintaining the connection between a site and its git repository.
The content-base pattern also helps with integrating external services. We use it to spin up and track our monitoring tools, and it’s what we’ve used to provide one-click New Relic integration. That capability is important for the years to come.
As the use-cases for websites continue to evolve, we don’t expect the core content requirements to go away, but we do expect new technologies and techniques to emerge. Having the flexibility to coherently integrate more and more backing services (whether those are provided directly by Pantheon, or by a trusted third party) will help our platform and our customers continue to flourish and grow in the years to come.
Ironically, the Content Base started as a hard requirement, but has become one of our key ways of “future proofing.” It’s a case where we made a big bet on our ability to execute on some extremely difficult technical problems, and won. That investment paid off, and the dividends continue rolling in.