WordPress Multisite - Option 1 - Single WordPress Installation
Option 1 - Single WordPress Install at Large Scale
Single WordPress Installation
- Single brand
- Consistent design across site
- One team of up to 50-60 people
WordPress Multisite Network
- Multiple departments / organizations
- Different design in different parts of site
- Large common denominator between designs
Multiple Independent WordPress Installations
- Multiple brands / different sites
- Different goals / design / functionality / plugins
- Different teams each developing functionality and adding code
Multisite or Independent
- Any of the above scenarios
- Built in automation, performance, scalability and reliability
- Managed service
If you only need one WordPress site, or even if you plan to do Multisite (Option #2) or several WP sites managed by automation (Option #3), you will need to make sure that each of your WP instances is built for large scale. This section discusses basic considerations and implementation strategies for WordPress at large scale.
Technical Requirements for Large Scale WordPress
A large-scale WordPress instance requires a horizontally-scalable architecture; reverse-proxy page caching and a persistent object cache to offload queries as much as possible from the database; database replication; an optimized search index; and automated dev/test/deployment workflows. In the following sections we explain a bit more about each of these requirements and how it is typically implemented in WordPress.
Elastic Architecture - Horizontal Scalability
An elastic architecture means that when traffic increases, you can provision more machines to run your website. When traffic calms down, you can save resources and turn off your extra capacity. This also provides high availability.
- Load Balancing across the available PHP App servers. Can be done with software tools (e.g. haproxy), dedicated hardware or a cloud service (e.g. Amazon ELB).
- Shared Media: You must find a way for uploads to be available to all PHP App servers. Can be done with open-source tools like GlusterFS, NFS, and Ceph or Amazon EFS on the cloud.
- Consistency: Servers and DBs need to have the same configuration, and changes to WP itself must be deployed consistently, to avoid complex issues as you scale. Requires an automated workflow using tools like Puppet, Chef, Ansible.
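To make the load-balancing idea concrete, here is a minimal Python sketch of round-robin distribution across app servers. It is an illustration only, not how haproxy or Amazon ELB are implemented; the class name, method names and server names are all hypothetical.

```python
class RoundRobinBalancer:
    """Hands out backends in rotation, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._position = 0

    def add(self, backend):
        """Scale out: register a newly provisioned app server."""
        if backend not in self.backends:
            self.backends.append(backend)
        self.healthy.add(backend)

    def mark_down(self, backend):
        """Stop routing to a server that failed its health check."""
        self.healthy.discard(backend)

    def next_backend(self):
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self.backends)):
            candidate = self.backends[self._position % len(self.backends)]
            self._position += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


balancer = RoundRobinBalancer(["app1", "app2", "app3"])
first_four = [balancer.next_backend() for _ in range(4)]
```

Because any server can receive any request, this only works if every server sees the same uploads and runs the same code and configuration, which is why shared media and consistency appear in the list above.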
Page Caching - the Key to Serving Internet-Scale Traffic
Caching is a key part of a WordPress performance and scalability strategy. At large scale, caching cannot be driven by the application itself. Joe Hoyle's charts comparing Batcache (driven by WordPress) with Varnish (which runs independently) make this point.
A well-established pattern for large-scale WordPress is a reverse proxy: requests and responses to and from WordPress flow through an intermediary service, which serves a cached copy of a response for a specified time. A reverse proxy can be up to 1000x more efficient than a PHP web server at delivering cached responses.
Popular reverse-proxy tools are Varnish and NGINX’s built-in caching. Another option is a CDN such as Akamai, Fastly or Cloudflare.
- Cache TTL and Expiration: You will have to devise a mechanism for clearing the cache without restarting the proxy.
- Cookies: Most proxies rely heavily on cookies to decide whether to serve a cached response or pass the request to WordPress. Both the proxy and WP need to be properly configured to avoid too many requests going to WP.
- Learning curve: For many developers, it is more complex to maintain and debug a system in which the browser does not talk “directly” to WordPress. Take the learning curve into account.
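The TTL, purge and cookie considerations above can be sketched in a few lines of Python. This is a simplified model of the decision a reverse proxy makes, not Varnish or NGINX configuration. The cookie prefixes follow common WordPress cookie names, but treat the exact list, along with the class and method names, as assumptions to adapt to your own site.

```python
import time

# Cookies that usually mean the response is personalized and must not be
# served from a shared cache. Assumed list; adjust for your plugins.
BYPASS_COOKIE_PREFIXES = ("wordpress_logged_in_", "wp-postpass_", "comment_author_")


class PageCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # url -> (expires_at, body)

    @staticmethod
    def should_bypass(method, cookies):
        """Only anonymous GET requests are safe to answer from the cache."""
        if method != "GET":
            return True
        return any(name.startswith(BYPASS_COOKIE_PREFIXES) for name in cookies)

    def fetch(self, method, url, cookies, origin):
        """Return (body, status), where status is HIT, MISS or BYPASS."""
        if self.should_bypass(method, cookies):
            return origin(url), "BYPASS"
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1], "HIT"
        body = origin(url)  # Cache miss: WordPress generates the page.
        self._store[url] = (now + self.ttl, body)
        return body, "MISS"

    def purge(self, url):
        """Drop one URL (e.g. after a post is updated) without a restart."""
        self._store.pop(url, None)
```

Note that an unrelated cookie (an analytics cookie, for instance) does not trigger a bypass here. A proxy that bypasses on any cookie at all would send far more traffic to WordPress than necessary.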
Object Caching - Speed Up Dynamic Pageviews
WordPress offers an internal object cache — a way of automatically storing data from the database (not just objects) in PHP memory to prevent unnecessary queries. Out of the box this is inefficient, but WordPress easily integrates with persistent/external storage backends like Redis or Memcached (or on the cloud, AWS ElastiCache or Azure Managed Cache). These backends persist objects between requests, speeding up execution while reducing DB load.
- Complexity: This adds a layer to the stack, and you need to make sure you have the same Object Caching solution present for development and testing environments.
- Invalidation: WordPress data might be updated frequently, but you need to intelligently decide when to purge your cache data. Frequent purging on every change will reduce performance.
- Eviction: Cache backends typically use an LRU (least recently used) strategy for “evicting” items from the cache when more room is needed. You need to account for unexpected cache expirations that might occur as a result of the cache running out of space.
- Optimization: You’ll need to smartly cache data based on an equation of how expensive it is to generate, how frequently it’s requested (aka likelihood of actually being served), and how much capacity you have in your persistent storage backend.
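The eviction and invalidation points are easier to reason about with a small model. The following Python sketch shows a least-recently-used cache with a fixed capacity and a "get or compute" read path. It is a teaching model, not the WordPress object cache API or a Redis/Memcached client; all names are hypothetical.

```python
from collections import OrderedDict


class LRUObjectCache:
    """Fixed-capacity cache that evicts the least recently used item."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()
        self.evictions = 0

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # Mark as most recently used.
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # Drop the oldest entry.
            self.evictions += 1

    def delete(self, key):
        """Targeted invalidation: purge one key when its data changes."""
        self._items.pop(key, None)


def get_or_compute(cache, key, compute):
    """Serve from cache if present; otherwise run the expensive query."""
    missing = object()
    value = cache.get(key, missing)
    if value is missing:
        value = compute()
        cache.set(key, value)
    return value
```

Two things follow from the model. A value can disappear at any time because the cache ran out of room, so callers must always be able to recompute it. And deleting a single key on change is much cheaper than flushing everything.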
Query Performance - The Database is the Bottleneck
Your website’s database is the ultimate bottleneck when scaling. An elastic architecture allows you to increase your capacity for read queries through replicas. The HyperDB plugin lets you scale out the DB while managing the use of the master and replicas. For example, administrative functionality runs only on the master, to provide guaranteed access and performance to site editors.
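The routing rule described here can be modeled in a few lines of Python. This is a sketch of the idea only, not HyperDB's code or configuration format; the class, its parameters and the server names are hypothetical.

```python
import itertools


class QueryRouter:
    """Sends writes and admin traffic to the master, other reads to replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql, is_admin=False):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        # Only plain SELECTs from non-admin requests may use a replica.
        # Everything else (INSERT, UPDATE, DELETE, DDL, ...) goes to the master.
        if is_admin or verb != "SELECT" or self._replicas is None:
            return self.master
        return next(self._replicas)


router = QueryRouter("db-master", ["db-replica-1", "db-replica-2"])
```

Defaulting to the master for anything that is not clearly a read is the conservative choice: a misrouted read costs some master capacity, while a misrouted write would fail or be lost.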
Avoiding "Queries of Death"
Scaling via database replication still assumes that your queries are generally performant. If your use case involves a large content footprint (hundreds of thousands or millions of posts), the default WordPress query builder (WP_Query) may generate “queries of death”: requests to the database that can take several seconds to compute.
These are called “queries of death” for a reason. They can suddenly and drastically affect site performance, even to the point of causing downtime. Long-running queries are intensive, often involving the creation of a whole temporary table to compute the result. They bog down database performance for all queries and tie up PHP application capacity, a lose-lose combination.
Slow queries block the PHP Application threads that kick them off. If they’re happening in high volume they can overwhelm even a horizontally scalable “elastic” infrastructure. Eventually all your PHP threads are waiting for slow queries to respond, at which point the site is effectively offline.
Even with a best-practice architecture, reviewing query performance is an important part of scalability hygiene. MySQL has a built-in capability called the slow query log, which allows you to collect and analyze data on your query times. You may also find value in application performance monitoring tools such as New Relic.
- Avoiding “queries of death”: default WordPress queries (e.g. those generated by WP_Query) can generate DB requests that take seconds to compute - if this happens often, it can bring even an elastic architecture to its knees.
- Query routing: Keep it simple to start with. In most cases it’s best not to customize HyperDB, so that you have fewer moving parts.
- Replication lag: Replication should be instantaneous, but sometimes it isn’t, so ensure that you can handle multi-second lag, and monitor for unacceptable levels.
- Debuggability: You will need a slow query log, and a way to reliably replicate the situation that caused a slow query. This requires a debugging environment with all the data needed to trigger the slow query.
- Regressions: For sites with large datasets, it is important to examine new queries and test them before using in production.
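As a starting point for the debuggability item, the Python sketch below pulls the slowest statements out of slow-query-log text. The sample is hand-written in the general shape of a MySQL slow query log; real logs vary by server version and settings, so treat the regular expression and field layout as assumptions to check against your own files. Dedicated log-analysis tools do this far more thoroughly.

```python
import re

# Assumed header layout: "# Query_time: N  Lock_time: N Rows_sent: N  Rows_examined: N"
HEADER = re.compile(
    r"^# Query_time: (?P<time>[\d.]+)\s+Lock_time: [\d.]+"
    r"\s+Rows_sent: \d+\s+Rows_examined: (?P<examined>\d+)"
)


def parse_slow_log(text):
    """Return one dict per logged statement: seconds, rows_examined, sql."""
    entries, current = [], None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {
                "seconds": float(match["time"]),
                "rows_examined": int(match["examined"]),
                "sql": "",
            }
            entries.append(current)
        elif line.startswith("#"):
            current = None  # Metadata line that belongs to the next entry.
        elif current is not None and not line.startswith(("SET timestamp=", "use ")):
            current["sql"] = (current["sql"] + " " + line.strip()).strip()
    return entries


def worst_queries(entries, threshold_seconds=1.0):
    """Statements at or above the threshold, slowest first."""
    slow = [e for e in entries if e["seconds"] >= threshold_seconds]
    return sorted(slow, key=lambda e: e["seconds"], reverse=True)


SAMPLE = """# Time: 2024-01-01T00:00:00.000000Z
# User@Host: wp[wp] @ localhost []  Id: 12
# Query_time: 3.214000  Lock_time: 0.000100 Rows_sent: 10  Rows_examined: 1250000
SET timestamp=1704067200;
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts
WHERE post_status = 'publish' ORDER BY post_date DESC LIMIT 0, 10;
# Time: 2024-01-01T00:00:01.000000Z
# User@Host: wp[wp] @ localhost []  Id: 13
# Query_time: 0.002000  Lock_time: 0.000010 Rows_sent: 1  Rows_examined: 1
SET timestamp=1704067201;
SELECT option_value FROM wp_options WHERE option_name = 'siteurl';
"""

entries = parse_slow_log(SAMPLE)
worst = worst_queries(entries)
```

A high rows-examined count next to a small rows-sent count is the usual signature of a query that scans far more data than it returns, which makes it a good first candidate for review.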
Scaling Site Search with an Index
WP’s built-in content search runs slowly if you have a large number of posts, and cannot rank results by relevance. A dedicated search index improves performance significantly and allows a richer user experience. It does this by bypassing the WP database for search-related queries and sending them to a high-performance subsystem. This makes it practical to run more queries, and more complex ones, which enables better search functionality (autocomplete, searching by metadata).
Common options are ElasticSearch, Apache Solr search index, and AWS’s CloudSearch.
- Overriding WP_Query: Most implementations of a search index backend involve overriding the built-in WP_Query object. Doing this for specific queries will require care and attention from a developer.
- Index Rebuilds: While not common, you may come across a situation that requires you to rebuild your content index. This means being able to “fall back” to the database, at least temporarily.
- Complexity: As with other dedicated subsystems, a search index is yet another piece of infrastructure to set up, monitor, and manage. While the payoffs are clearly worth it, this does become another ongoing responsibility to maintain.
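To show why an index is both faster and able to rank results, here is a toy inverted index in Python with a database-style fallback for when the index is unavailable. It is a conceptual sketch, not how ElasticSearch or Solr score documents; the scoring (raw term counts) and all names are simplifying assumptions.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Maps each term to the posts containing it, with occurrence counts."""

    def __init__(self):
        self._postings = defaultdict(Counter)  # term -> {post_id: count}

    def add(self, post_id, text):
        for term, count in Counter(tokenize(text)).items():
            self._postings[term][post_id] = count

    def search(self, query):
        """Post IDs ranked by how often the query terms occur in each post."""
        scores = Counter()
        for term in tokenize(query):
            for post_id, count in self._postings.get(term, {}).items():
                scores[post_id] += count
        return [post_id for post_id, _ in scores.most_common()]


def database_scan(posts, query):
    """Fallback: substring scan over every post, with no relevance ranking."""
    needle = query.lower()
    return [post_id for post_id, text in posts.items() if needle in text.lower()]


def search_posts(posts, query, index=None):
    """Use the index when it is available, otherwise fall back to the scan."""
    if index is not None:
        return index.search(query)
    return database_scan(posts, query)
```

The index answers a query by looking up only the terms involved, while the fallback has to read every post. Keeping the fallback path working is what lets you rebuild the index without taking search offline.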
A Real-World Scalable Architecture
Putting all these components together, the stack will include:
- Apache or Nginx
- Memcached or Redis
- ElasticSearch or Apache Solr
Don’t panic! This is why the discipline of DevOps exists. There are many professionals who are experienced with setting up this style of implementation, and many of the component pieces are now available as cloud services.
Also, you don’t have to run it yourself. This style of architecture is also available from many managed hosting and platform providers.
Important Considerations When Outsourcing Your Infrastructure
If you are looking to outsource your website infrastructure, an increasingly common choice, you should now be armed with sufficient knowledge to evaluate various providers:
- Do they provide load-balancing and reverse-proxy caching?
- Is their infrastructure truly elastic? What is the turnaround time for scaling horizontally?
- How do they handle the need for a network filesystem for uploads?
- Can they provide MySQL replication? Does it support HyperDB?
- What options do they offer for a Search Index?
Tips from Tim Wild of PTS - scaling WP on Amazon AWS:
- WordPress works great on Amazon Web Services. AWS provides scalable storage, compute and database, load balancing, a content distribution network, monitoring and backups, all the components needed to have WordPress work reliably at scale. Setup can be simplified using prebuilt AMIs, auto install systems such as Easy Engine, or other automation products / platforms.
- Automation is key when working in scalable environments. While the AWS cost per hour is relatively low it does add up over time, especially when multiple services are used. Alerts can help you keep on top of your AWS costs. Under AWS it can sometimes be cheaper and easier to scale vertically before adding the complexity and cost of scaling out horizontally, depending on factors such as the variability of your website load and your reliability requirements. For example, a load balancer costs as much as a t2.small instance, which in turn costs significantly less than a higher powered m4.xlarge instance. Using a load balancer gives you the ability to change compute capacity with no downtime, makes maintenance easier, and increases reliability.
- A caching layer such as Redis or Nginx page cache will hugely accelerate delivery of pages where users aren't logged in, which for many websites is the majority of requests. Generation of pages takes significant compute resources within WordPress, so any page caching can make a significant difference. Micro-caching can be used for rapidly changing websites - caching periods of even a few seconds can reduce the server load for busy websites. Keeping your database on RDS, your static resources on S3, with the CloudFront CDN, means your WordPress servers are always in sync serving the same content.
- For high scale WordPress websites an elastic load balancer, auto scaling, relational database service, S3, and the CloudFront CDN are the key AWS services to use. When deployed into a well architected, well designed system they will help you create a reliable, high performance website.
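The micro-caching tip is easy to check with back-of-the-envelope arithmetic. The Python sketch below assumes a single URL receiving a steady stream of anonymous requests, and a cache that regenerates the page at most once per TTL. Both are simplifying assumptions, and the traffic figures are made up for illustration.

```python
def origin_requests_per_second(request_rate, ttl_seconds):
    """Approximate load reaching WordPress for one cached URL."""
    if ttl_seconds <= 0:
        return float(request_rate)  # No caching: every request hits origin.
    # At most one regeneration per TTL window, never more than the traffic.
    return min(float(request_rate), 1.0 / ttl_seconds)


def offload_ratio(request_rate, ttl_seconds):
    """Fraction of requests answered from cache instead of WordPress."""
    if request_rate <= 0:
        return 0.0
    return 1.0 - origin_requests_per_second(request_rate, ttl_seconds) / request_rate


# Hypothetical example: 200 requests/second with a 5 second cache.
origin_load = origin_requests_per_second(200, 5)
saved = offload_ratio(200, 5)
```

Under these assumptions a five-second cache reduces 200 page generations per second to roughly one every five seconds, while visitors never see content more than a few seconds old.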
David Alexander of MazePress - value of basic caching and optimization tools:
- WordPress is a fairly server-intensive piece of software, and when combined with 10+ plugins that add other site-wide functionality, it generates more HTTP requests, more CSS files, scripts and so on. As a result, using caching tools like W3 Total Cache and optimization tools like Autoptimize allows you to use minification, compression and various forms of caching to reduce this burden and speed up your website.
- I haven't used Varnish and Redis, though am aware of them, but my database expert / full-stack developer has always said that our servers do enough and these tools don't add a lot more performance.
- Another point is to note the actual server hardware being used will dictate what kind of acceleration tools are worth using.
Tips from Brian Jackson of KeyCDN - CDN and Origin Shield caching:
- A Content Delivery Network (CDN) helps speed up the delivery of WordPress assets on a single installation by caching them on edge servers around the globe closer in proximity to the user. When dealing with WordPress on a large scale, you can take this one step further and take advantage of a feature CDNs call "Origin Shield".
- Origin Shield is an extra caching layer that sits between your origin server and the edge servers. Requests are typically delivered from cached content on the edge servers to the clients. When you have Origin Shield enabled, if a new request to an edge server doesn't have your cached content, it requests it from the cache on the shield server instead, eliminating an additional request to your origin server. Additional requests do not touch your origin server, until the cached content on the shield servers expires or you purge the cache on your zone.
- Origin Shield servers make use of collapsed forwarding to merge multiple requests for the same URL into a single request to your origin server. Keep-alives also avoid excessive TCP handshakes to your origin server. When working with WordPress on a bigger scale, especially large single installations, this extra layer of caching can be essential to reducing the load and traffic on your origin server as well as protecting your infrastructure from abuse or traffic spikes.
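Collapsed forwarding can be illustrated with a small threaded Python sketch: when several requests for the same URL arrive while a fetch is already in progress, only the first one goes to the origin and the rest wait for its result. This is a model of the technique, not KeyCDN's implementation; the class names are hypothetical.

```python
import threading


class _Pending:
    """State shared by everyone waiting on one in-flight origin fetch."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None
        self.waiters = 0  # Followers that joined this fetch.


class CollapsingFetcher:
    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._in_flight = {}  # url -> _Pending

    def get(self, url):
        with self._lock:
            pending = self._in_flight.get(url)
            leader = pending is None
            if leader:
                pending = _Pending()
                self._in_flight[url] = pending
            else:
                pending.waiters += 1
        if leader:
            # The first caller performs the single request to the origin.
            try:
                pending.result = self._fetch(url)
            except Exception as exc:
                pending.error = exc
            finally:
                with self._lock:
                    del self._in_flight[url]
                pending.done.set()
        else:
            # Later callers reuse the leader's response.
            pending.done.wait()
        if pending.error is not None:
            raise pending.error
        return pending.result
```

Combined with a cache in front of it, this means a traffic spike on an uncached URL produces one origin request rather than one per visitor.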
Want to Get it Out of the Box?
Want an enterprise-grade WP instance with scalability, performance and high availability, without working hard?
Want to get a highly scalable, resilient WordPress instance without working hard?
Secret Option - How to do it on Pantheon.