Websites are composed of many moving parts, and performance can be adversely affected by issues within the application, the infrastructure, or by internet-wide incidents or outages. There can also be "good-to-have" performance problems, such as when a website sees a spike in valid traffic. Additionally, when debugging intermittent application-level issues, pinpointing a root cause sometimes requires seeing the issue as it occurs. It’s good to be aware of a critical site’s state, and several services notify stakeholders of site-impacting events in real time, allowing for a fast and appropriate response.
Pantheon already performs thousands of checks per hour on multiple layers of the platform. Additional external monitoring not only provides alerting when a site is non-responsive, but also serves a valuable forensic purpose by correlating performance (represented by page load or more complex transactional response time) with live site deploys on Pantheon. Monitoring with additional tools, such as New Relic Pro, is a simple way to keep a historical record of site performance.
New Relic® Performance Monitoring provides far more than monitoring: it offers deep insight into an application’s performance. This is especially helpful for Drupal and WordPress sites, both of which involve heavy interaction between code and database and use multiple layers of caching. New Relic also helps identify performance issues at the PHP function level and surfaces slow MySQL queries that are candidates for optimization.
All sites on Pantheon include the upper-tier Professional version of New Relic. Application performance and uptime conditions can be configured to trigger email notifications, based on user preference. Because performance issues tend to snowball, these warning notifications often make it possible to respond before downtime occurs.
[BLOG] Getting Started with New Relic APM Pro
[DOC] How to Enable and Use New Relic APM Pro on Pantheon
[DOC] MySQL Troubleshooting with New Relic Pro
[DOC] Automatically Label Code Changes in New Relic using Quicksilver Hooks
Pingdom provides uptime monitoring as a service for applications. A site monitored by Pingdom is checked periodically based on user configuration. This check can simply verify that the website sends back the expected response, or it can be configured to execute a test transaction. This ensures not only that the home page is responding, but also that e-commerce and other critical functionality is performing as usual. Pingdom captures header information and response time, which is helpful when diagnosing errors or intermittent problems. It provides a mobile app, automated monthly reports, and webhooks for interacting with chat applications such as Slack and HipChat. Alerts can be prioritized based on urgency and sent to individual team members or larger groups.
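As a rough illustration of what a basic uptime check involves, the following Python sketch fetches a URL once, records the status code, headers, and response time, and classifies the result. It is not Pingdom's implementation or API. The names `CheckResult`, `run_check`, and `classify`, along with the 2-second `slow_after` default, are hypothetical choices made for this example.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    status_code: Optional[int]  # None when no HTTP response was received
    elapsed_seconds: float      # wall-clock time spent on the request
    headers: dict               # response headers, useful when diagnosing errors
    body: str                   # response body, used for the expected-text test


def classify(result, expected_status=200, expected_text=None, slow_after=2.0):
    """Return "down", "slow", or "up" for a single check result."""
    if result.status_code != expected_status:
        return "down"
    if expected_text is not None and expected_text not in result.body:
        return "down"
    if result.elapsed_seconds > slow_after:
        return "slow"
    return "up"


def run_check(url, timeout=10.0):
    """Fetch the URL once, recording status, headers, and response time."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            status, headers = response.status, dict(response.headers)
    except urllib.error.HTTPError as error:
        # The server answered, but with a 4xx or 5xx status.
        body, status, headers = "", error.code, dict(error.headers)
    except OSError:
        # No usable response: DNS failure, refused connection, or timeout.
        body, status, headers = "", None, {}
    return CheckResult(status, time.monotonic() - started, headers, body)
```

A scheduler could call `classify(run_check(url), expected_text="Add to cart")` at a fixed interval and send a notification whenever the result is not "up". Keeping the classification separate from the network call makes the alerting rule easy to adjust and to test without touching a live site.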
Pingdom can be set up on Pantheon sites to alert not just during downtime, but also when a site is encountering heavy traffic or increased workload. This can help isolate non-infrastructure-related downtime quickly so the team can respond accordingly, either by optimizing performance or by scaling up. With Pingdom’s transaction monitoring, notification rules can be created to send alerts if, for example, a test shopping cart purchase takes too long to complete. The development team can then see issues as they occur, rather than reviewing logs after the fact. Pingdom also offers Real User Monitoring, a service that works with Drupal and WordPress to provide user location data as well as application performance statistics.
Pingdom can also be used in conjunction with New Relic alerting as an additional monitoring layer for mission-critical applications, since it provides regional monitoring that can help identify internet-wide issues, such as DNS resolution problems.
[DOC] Pingdom Uptime Check
PagerDuty provides alerting and escalation services for a wide range of monitoring tools, as well as directly through an API. This allows for multiple layers of alerting across multiple time zones, including on-call schedules in which an alert is escalated until it is acknowledged. Each PagerDuty user can customize their alert policy to use a combination of email, SMS, or phone to build a personalized notification process.
PagerDuty, used in conjunction with New Relic and other applications, allows Drupal and WordPress site owners, support teams, and other stakeholders to create custom notification policies that ensure there are multiple levels of fallback in the event of an incident. This allows for faster response to incidents, as well as automatic escalation if an on-call team member is unavailable. Agencies or organizations that use helpdesks such as Zendesk can also use PagerDuty to be notified of end-user issues.
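The escalation behavior described above can be sketched as a small Python model: an alert moves through an ordered list of levels until someone acknowledges it. This is only a conceptual illustration under assumed names and does not use the PagerDuty API. `EscalationLevel`, `level_to_page`, the example responders, and the 15 and 30 minute delays are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EscalationLevel:
    responder: str             # who is paged at this level
    channels: tuple            # e.g. ("email", "sms", "phone")
    notify_after_minutes: int  # minutes after the alert triggers


# Hypothetical three-level policy with increasing delays.
EXAMPLE_POLICY = [
    EscalationLevel("primary on-call", ("sms", "phone"), 0),
    EscalationLevel("secondary on-call", ("phone",), 15),
    EscalationLevel("engineering manager", ("email", "phone"), 30),
]


def level_to_page(policy, minutes_since_trigger, acknowledged=False):
    """Return the level that should be paged now, or None once acknowledged."""
    if acknowledged:
        return None
    current = None
    # Walk the levels in order of delay and keep the last one whose delay has passed.
    for level in sorted(policy, key=lambda lvl: lvl.notify_after_minutes):
        if minutes_since_trigger >= level.notify_after_minutes:
            current = level
    return current
```

With this policy, an alert left unacknowledged for 20 minutes has escalated to the secondary on-call, and acknowledging it at any point stops further escalation.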