Bots and Indexing on Pantheon

Information on managing bots and indexing while avoiding performance degradation on your Pantheon WordPress or Drupal site.


Bots are part of every public-facing website's lifecycle. We wouldn't be able to find a thing on the internet without them! Bots perform the hard work taken for granted when browsing the multitudes of indexed search results from any given search engine. In the wrong hands, bots can become nagging nuisances slowing down or even taking down your site.

Bots in My Logs: Real World Scenarios and Identifiers

Bots don't browse like humans. Analyzing access patterns in the nginx log is one of the quickest ways to determine the presence of bots.

Rapid Fire Requests/Duplicates

In the log snippet below, multiple requests arrive for the same path in rapid-fire succession. The timestamps show several identical POST requests logged within the same second, each returning a 500 error. You should investigate requests like these.

- - [11/Nov/2013:19:05:24 +0000] "POST /index.php?q=comment/reply/545 HTTP/1.0" 500 588 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.3 (build 51720))" 0.848 ",, ::ffff:,::ffff:"
unix: - - [11/Nov/2013:19:05:24 +0000] "POST /index.php?q=comment/reply/545 HTTP/1.0" 500 588 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.3 (build 51720))" 1.059 ",, ::ffff:,::ffff:"
- - [11/Nov/2013:19:05:24 +0000] "POST /index.php?q=comment/reply/545 HTTP/1.0" 500 588 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.3 (build 51720))" 1.059 ",, ::ffff:,::ffff:"
unix: - - [11/Nov/2013:19:05:24 +0000] "POST /index.php?q=comment/reply/545 HTTP/1.0" 500 588 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.3 (build 51720))" 1.271 ",, ::ffff:,::ffff:"
- - [11/Nov/2013:19:05:24 +0000] "POST /index.php?q=comment/reply/545 HTTP/1.0" 500 588 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.3 (build 51720))" 1.271 ",, ::ffff:,::ffff:"
unix:\xC8\xFB\x7F - - [11/Nov/2013:19:05:24 +0000] "POST /index.php?q=comment/reply/545 HTTP/1.0" 500 588 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.3 (build 51720))" 1.481 ",, ::ffff:,::ffff:"
- - [11/Nov/2013:19:05:24 +0000] "POST /index.php?q=comment/reply/545 HTTP/1.0" 500 588 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.3 (build 51720))" 1.482 ",, ::ffff:,::ffff:"
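
One way to surface this pattern is to group log entries by timestamp, request line, and user agent, then flag any group that repeats. The following is a minimal Python sketch, assuming entries shaped like the ones in this section. The regular expression, the function name find_rapid_fire, and the default threshold of 3 are illustrative assumptions, not part of any Pantheon tooling.

```python
import re
from collections import Counter

# Pulls the timestamp, request line, status, and user agent out of an access
# log entry shaped like the samples in this section. The pattern is an
# assumption based on that layout and may need adjusting for other formats.
ENTRY = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
    r'\S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def find_rapid_fire(lines, threshold=3):
    """Return {(time, request, agent): count} for groups seen at least `threshold` times."""
    counts = Counter()
    for line in lines:
        match = ENTRY.search(line)
        if match:
            counts[(match["time"], match["request"], match["agent"])] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

Passing the lines of a downloaded nginx access log to find_rapid_fire returns only the groups that repeat within the same logged second. Those groups are a starting point for investigation rather than proof of bot traffic.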

Bots Converging on Erroring Pages

Some legitimate bots, crawlers, and proxies (such as BingBot or AdsBot-Google) identify themselves. Since search indexing is desirable for most sites, tread carefully to avoid wreaking havoc on a site's SEO. That said, crawlers and spiders sometimes converge on a page that is erroring out (502s in the example below). These repetitive requests can worsen page load problems by putting more load on the server. Investigate these errors immediately. Once the error is fixed, the bots and crawlers will no longer be hung up on the given path.

- - [26/Jul/2013:15:27:38 +0000] "GET /index.php?q=shop/kits/shebang-kit HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +" 14.188 ",,,"
unix: - - [26/Jul/2013:15:27:38 +0000] "GET /index.php?q=shop/kits/shebang-kit HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +" 14.476 ",,,"
- - [26/Jul/2013:15:27:38 +0000] "GET /index.php?q=shop/kits/shebang-kit HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +" 14.477 ",,,"
- - [26/Jul/2013:15:26:37 +0000] "GET /index.php?q=gush/content/name-pimp-november-2008&page=17 HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +" 14.722 ",,,"
- - [26/Jul/2013:15:26:37 +0000] "GET /gush/content/name-pimp-november-2008?page=17 HTTP/1.1" 502 166 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +" 14.999 ",,,"
unix: - - [26/Jul/2013:15:26:37 +0000] "GET /index.php?q=gush/content/name-pimp-november-2008&page=17 HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +" 14.998 ",,,"
- - [26/Jul/2013:15:31:03 +0000] "GET / HTTP/1.1" 500 109 "-" "" 0.126 ",,,"
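
To see which crawlers are piling onto which failing paths, it can help to tally 5xx responses per user agent and path. The following is a minimal Python sketch, assuming entries shaped like the ones in this section. The regular expression and the function name errors_by_agent_and_path are illustrative assumptions.

```python
import re
from collections import Counter

# Captures the request path, status code, and user agent from an access log
# entry shaped like the samples in this section (an assumed layout).
ENTRY = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
    r'\S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def errors_by_agent_and_path(lines):
    """Count 5xx responses per (user agent, path) pair."""
    counts = Counter()
    for line in lines:
        match = ENTRY.search(line)
        if match and match["status"].startswith("5"):
            counts[(match["agent"], match["path"])] += 1
    return counts
```

Calling most_common() on the returned Counter lists the pairs with the most errors first, which points to the paths to fix first.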

Indexing Your Pantheon Site

Each of your site environments has a robots.txt file, associated with the platform domain or custom Vanity domain, that contains the following:

# Pantheon's documentation on robots.txt:
User-agent: *
Disallow: /

User-agent: dotbot
User-agent: PetalBot
User-agent: PowerMapper
User-agent: RavenCrawler
User-agent: rogerbot
User-agent: SemrushBot
User-agent: SemrushBot-SA
User-agent: Swiftbot
Allow: /
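
To check how these two rule groups apply to a given crawler, you can feed them to a robots.txt parser. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name may_crawl is an illustrative assumption, and individual crawlers may interpret the rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules shown above, minus the leading comment line.
PLATFORM_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: dotbot
User-agent: PetalBot
User-agent: PowerMapper
User-agent: RavenCrawler
User-agent: rogerbot
User-agent: SemrushBot
User-agent: SemrushBot-SA
User-agent: Swiftbot
Allow: /
"""


def may_crawl(user_agent, path="/"):
    """Return True if the rules above let `user_agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(PLATFORM_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

With this parser, a crawler named in the second group (such as SemrushBot) is allowed, while any other crawler falls through to the catch-all group and is disallowed.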

Additionally, Pantheon's edge layer adds the X-Robots-Tag: noindex HTTP header when serving requests from platform domains. This instructs most bots and crawlers not to index the page and keeps it out of search results.
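
To confirm whether a response carries this header, you can inspect its headers directly. The following is a minimal Python sketch. The function name has_noindex is an illustrative assumption, and it handles only plain comma-separated directives, not user-agent-scoped values such as "googlebot: noindex".

```python
def has_noindex(headers):
    """Return True if any X-Robots-Tag header value includes the noindex directive.

    `headers` is an iterable of (name, value) pairs, for example the result of
    calling .items() on a response's headers object.
    """
    for name, value in headers:
        # Header names are case-insensitive.
        if name.lower() != "x-robots-tag":
            continue
        directives = [part.strip().lower() for part in value.split(",")]
        if "noindex" in directives:
            return True
    return False
```

With the standard library, you could open a page on your own platform domain with urllib.request.urlopen and pass response.headers.items() to has_noindex.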

Indexing Before You Launch

The platform domains are intended for development use and cannot be used for production. While Drupal and WordPress both generate their own robots.txt file by default, a custom or CMS-standard robots.txt only works on Live environments with a custom domain. Adding subdomains for Dev or Test removes only the X-Robots-Tag: noindex header; those environments still serve the Pantheon robots.txt from the platform domain.

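To support pre-launch SEO and site search testing, the platform robots.txt allows the bots named in its second rule group (dotbot, PetalBot, PowerMapper, RavenCrawler, rogerbot, SemrushBot, SemrushBot-SA, and Swiftbot) to access platform domains.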

Some tools (like Siteimprove or ScreamingFrog) can be set to ignore robots.txt when scanning. If you're testing links or SEO with other tools, you may request the addition of the tool to our robots.txt file by contacting support to create a feature request. Otherwise, you can connect a custom domain to the Live environment and test your links using that alternative domain.

If you run SEO toolsets locally, you can utilize an /etc/hosts file entry on your local development box to spoof your production domain on Pantheon:

Note that modifying the hosts file usually requires administrative privileges from the OS.

The location of the hosts file varies depending on your operating system:

  • macOS / Linux: /etc/hosts
  • Windows: C:\Windows\System32\Drivers\etc\hosts

Add one line per address to your operating system's hosts file. Each line consists of an IP address, followed by whitespace, followed by the domain that should resolve to it. Use the IP addresses provided by Pantheon and your own domains.

You can index your site under your production domain once it's added to the Live environment. There are many contrib module options available for creating sitemaps for Drupal, including XMLSiteMap and Site_Map. WordPress users can install the Google XML Sitemaps or Yoast SEO plugins, which will maintain sitemap updates automatically. It is up to you to configure the extensions to work as you desire. Pantheon does not offer support for Drupal modules or WordPress plugins.


Sitemaps Produce a White Screen of Death (WSOD)

Some modules or plugins are configured by default to fetch all URLs at once during sitemap generation, which can exceed PHP's memory limit and result in a blank white page (WSOD). To resolve this issue, adjust the plugin or module configuration so that URLs are fetched individually instead of all at once.

For example, if you have a Drupal site using the XMLSiteMap module, navigate to admin/config/search/xmlsitemap/settings and uncheck Prefetch URL aliases during sitemap generation. Save the configuration and clear caches for the Live environment on the Pantheon Dashboard or via Terminus, replacing <site> with your site name:

terminus env:clear-cache <site>.live

Props to Will Hall for highlighting this solution in a related blog post.

Legacy Sitemap Submissions Generating 404s

Sitemaps can (and should) be submitted directly to Google Search Console (formerly Google Webmaster Tools). However, if legacy submissions are still generating 404s, you'll need to redirect via PHP within wp-config.php or settings.php. For example, WordPress sites running the Yoast SEO plugin can use the following:

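// 301 Redirect from /sitemap.xml to /sitemap_index.xml
// Skip the redirect for command-line (CLI) invocations.
if (($_SERVER['REQUEST_URI'] == '/sitemap.xml') &&
  (php_sapi_name() != "cli")) {
  header('HTTP/1.0 301 Moved Permanently');
  header('Location: /sitemap_index.xml');

  // Stop execution so the redirect is sent without rendering the page.
  exit();
}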

For more examples of redirecting via PHP, see Configure Redirects.

Incorrect robots.txt Output in WordPress

In WordPress, do not enable Discourage search engines from indexing this site on Dev or Test environments. This option is set in Settings > Reading > Search Engine Visibility in the WordPress Admin Dashboard.

This setting creates a built-in robots.txt file that disallows or blocks crawlers. The file applied by the platform normally overrides it, but not when the URL pointing to robots.txt has a trailing slash.

As a workaround, you can override the output by creating a custom filter for robots_txt. You can add this as a custom plugin, or as an entry in your theme's functions.php file:

add_filter('robots_txt', 'custom_robots_txt', 10, 2);

// Replace the robots.txt output that WordPress generates.
function custom_robots_txt($output, $public) {

    $robots_txt  = "User-agent: *\n";
    $robots_txt .= "Sitemap: \n"; // Append your sitemap's full URL after "Sitemap: ".
    $robots_txt .= "Disallow: /secure/\n";
    // Add another $robots_txt .= line for each additional rule.

    return $robots_txt;
}