Pantheon's Site Disaster Recovery (DR) service is designed for mission-critical websites that need to ensure business continuity in the unlikely event of a zone failure. While Pantheon can’t prevent disasters from happening, we have architected a multi-zone, high-availability (HA) solution with intelligent failover at the Global CDN layer that can help minimize the effects of an incident.
Disaster Recovery is more than technology alone. It includes 24/7 support and a guaranteed 99.99% uptime Service Level Agreement (SLA), so that Pantheon works both to keep sites from going down and to respond actively to any incident that occurs.
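To put the 99.99% figure in concrete terms, the following minimal sketch converts an uptime percentage into a downtime budget. The measurement periods used here (30 and 365 days) are assumptions for illustration; the text above does not state how the SLA period is defined.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Return the downtime budget, in minutes, implied by an uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# Assumed periods for illustration only.
monthly = allowed_downtime_minutes(99.99, 30)
yearly = allowed_downtime_minutes(99.99, 365)

print(f"30-day budget: {monthly:.2f} minutes")
print(f"365-day budget: {yearly:.2f} minutes")
```

Under these assumptions, a 99.99% target leaves roughly 4.32 minutes of downtime over 30 days and 52.56 minutes over 365 days.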
With Disaster Recovery in place, Pantheon monitors datacenter health and proactively manages failover to avoid or minimize data loss.
Recovery Point Objective (RPO) is the maximum amount of data loss considered acceptable in the event of a major outage. In short, think “lost transactions.” The RPO is largely determined by replication lag, which can result from heavy transactional load on the site’s database. We rarely see replication lag, and when we do, we have automated tools that rebuild the site’s replica and alert our team if a site consistently experiences replication lag. This way, we can proactively notify you so you can work toward a solution. For sites with DR enabled, Pantheon’s RPO is 5 minutes.
Recovery Time Objective (RTO) is the target amount of time within which a business process must be restored after a disaster in order to avoid unacceptable consequences from a break in business continuity. In short, think “time down.” For sites with DR enabled, Pantheon’s RTO is 15 minutes.
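As a rough illustration of how the two objectives are read, the sketch below compares a hypothetical incident against the 5-minute RPO and 15-minute RTO stated above. The `Incident` type and its field names are assumptions made for this example, and treating replication lag at the moment of failure as the data-loss window is a simplification.

```python
from dataclasses import dataclass

RPO_SECONDS = 5 * 60   # from the text: RPO is 5 minutes
RTO_SECONDS = 15 * 60  # from the text: RTO is 15 minutes


@dataclass
class Incident:
    # Hypothetical measurements, named for this example only.
    replication_lag_seconds: float  # how far the replica trailed the primary at failure
    downtime_seconds: float         # how long the site was unavailable


def within_objectives(incident: Incident) -> dict:
    """Report whether an incident stayed inside the RPO ("lost transactions")
    and the RTO ("time down")."""
    return {
        "rpo_met": incident.replication_lag_seconds <= RPO_SECONDS,
        "rto_met": incident.downtime_seconds <= RTO_SECONDS,
    }


# A replica 12 seconds behind, with 20 minutes of downtime.
example = Incident(replication_lag_seconds=12, downtime_seconds=20 * 60)
result = within_objectives(example)
print(result)
```

In this made-up incident the data-loss window is well inside the RPO, while the downtime exceeds the RTO.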
Site Disaster Recovery is available for purchase as an add-on to all Elite site plans except Elite Starter. For more information, please contact Sales.
The Pantheon Platform has redundancy built in throughout our containerized infrastructure. In addition, sites with Disaster Recovery get a replicated database in an alternative availability zone for even higher availability.
Pantheon continuously runs infrastructure checks to ensure all systems are running smoothly. If we ever detect elevated error rates in any of our zones, we fail over to a backup zone.
In the event of a zone failure, we reroute your site to the backup zone and provision a new replica in a healthy zone.
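The general idea of failing over on elevated error rates can be sketched as follows. This is a toy model, not Pantheon's actual mechanism: the threshold value, the zone names, and the `choose_serving_zone` function are all hypothetical.

```python
from typing import Dict

ERROR_RATE_THRESHOLD = 0.05  # hypothetical: 5% of requests failing counts as "elevated"


def choose_serving_zone(
    current: str,
    error_rates: Dict[str, float],
    threshold: float = ERROR_RATE_THRESHOLD,
) -> str:
    """Stay in the current zone while it is healthy; otherwise move to the
    healthiest other zone below the threshold. If no zone qualifies, stay put."""
    # A zone with no reported error rate is treated as unhealthy.
    if error_rates.get(current, 1.0) < threshold:
        return current
    healthy = {
        zone: rate
        for zone, rate in error_rates.items()
        if zone != current and rate < threshold
    }
    if not healthy:
        return current
    return min(healthy, key=healthy.get)


# Hypothetical zone names and error rates.
rates = {"zone-a": 0.40, "zone-b": 0.01, "zone-c": 0.02}
print(choose_serving_zone("zone-a", rates))
```

A real system would also need to avoid flapping between zones and to account for replica freshness, which this sketch leaves out.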
Pantheon’s disaster response team will proactively notify you of a zone failure event and its impact on your site. Your dedicated team will keep clear and transparent lines of communication about any action taken to keep your site online, and is available 24/7 for all of your questions.
The Redis cache is not preserved after a site failover. Your site must therefore be able to handle having its cache dropped under regular visitor traffic in order to operate as expected in a failover scenario.
To test in a Test Environment, click the Clear Caches button in the upper-right corner of the Site Dashboard.
You can also connect to your Redis instance and run the flushall command to clear the Redis cache via the Redis CLI.
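What "handling a dropped cache" means in application code can be sketched with a cache-aside read path: every read falls back to the source of truth on a miss and repopulates the cache. The `FakeCache` class below is an in-memory stand-in written for this example, not a Redis client, and `render_from_database` is a hypothetical placeholder for whatever expensive work the cache normally saves.

```python
from typing import Callable, Dict, List, Optional


class FakeCache:
    """In-memory stand-in for a Redis cache (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def flushall(self) -> None:
        # Mirrors the effect of a dropped cache: every key disappears.
        self._data.clear()


def get_page(cache: FakeCache, key: str, rebuild: Callable[[str], str]) -> str:
    """Cache-aside read: use the cache if possible, otherwise rebuild and store."""
    value = cache.get(key)
    if value is None:
        value = rebuild(key)
        cache.set(key, value)
    return value


calls: List[str] = []


def render_from_database(key: str) -> str:
    calls.append(key)  # record each expensive rebuild
    return f"rendered:{key}"


cache = FakeCache()
first = get_page(cache, "home", render_from_database)   # miss: rebuilds
second = get_page(cache, "home", render_from_database)  # hit: served from cache
cache.flushall()                                        # simulate the cache being dropped
third = get_page(cache, "home", render_from_database)   # miss again: rebuilds
print(first, second, third, len(calls))
```

The site keeps returning correct results after the flush; the cost is an extra rebuild per key, which is why the load on a cold cache is worth testing under realistic traffic.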
If you rely on the Redis cache for locks (mutexes) or for storing other long-term data, you must move that data out of Redis and into the database to avoid issues when the Redis cache is dropped during failover.
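One way to keep a lock in the database rather than in Redis is a table with a unique key per lock name. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for the site's database; the `locks` table, its columns, and the function names are hypothetical, and a production site would use its own database layer or its CMS's lock API.

```python
import sqlite3
import time
from typing import Optional


def init_locks(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locks ("
        "name TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at REAL NOT NULL)"
    )
    conn.commit()


def acquire_lock(
    conn: sqlite3.Connection,
    name: str,
    owner: str,
    ttl_seconds: float = 30.0,
    now: Optional[float] = None,
) -> bool:
    """Try to take the lock; return True on success, False if someone holds it."""
    now = time.time() if now is None else now
    with conn:  # one transaction: commits on exit, rolls back on an uncaught error
        # Clear an expired lock so a crashed holder cannot block forever.
        conn.execute(
            "DELETE FROM locks WHERE name = ? AND expires_at <= ?", (name, now)
        )
        try:
            conn.execute(
                "INSERT INTO locks (name, owner, expires_at) VALUES (?, ?, ?)",
                (name, owner, now + ttl_seconds),
            )
        except sqlite3.IntegrityError:
            return False  # primary key conflict: the lock is already held
    return True


def release_lock(conn: sqlite3.Connection, name: str, owner: str) -> None:
    """Release the lock, but only if this owner holds it."""
    with conn:
        conn.execute("DELETE FROM locks WHERE name = ? AND owner = ?", (name, owner))


conn = sqlite3.connect(":memory:")
init_locks(conn)
got_a = acquire_lock(conn, "cron", "worker-a", now=1000.0)
got_b = acquire_lock(conn, "cron", "worker-b", now=1001.0)
release_lock(conn, "cron", "worker-a")
got_b_retry = acquire_lock(conn, "cron", "worker-b", now=1002.0)
print(got_a, got_b, got_b_retry)
```

Because the lock lives in a table that is part of the replicated database rather than in the cache, dropping the Redis cache does not release it.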
While Solr can be enabled on a site with Disaster Recovery, Pantheon's Solr service is not highly available and does not include failover. The RTO and RPO do not apply to Pantheon's Solr service.
If your site requires a highly available Solr service, do not use Pantheon's Solr service; please consider an alternative Solr service instead.
Pantheon Solr requires additional considerations when used on sites with Disaster Recovery. In its default state, if a site with Pantheon Solr and Disaster Recovery fails over to the backup, Solr will not automatically rebuild the search index. In that case,
You may also consider one of the two following options: