Avoiding Double-Escaped Output in Drupal 8

Greg Anderson, Open Source Contributor. Reading estimate: 6 minutes

It has long been understood that failing to escape user-generated content in a web application can lead to extremely serious security vulnerabilities. Unfortunately, even though the techniques for preventing these problems are widely known, web developers still occasionally fail to apply the necessary precautions in full. These omissions can be extremely difficult to notice by casual inspection.

 

How does this work? Imagine that you have some Drupal code, and you would like to display some lovely markup text, like this:

 

$markup_text = "<b>markup</b>";

 

If you just pass this text straight through without any processing, the browser will interpret the markup and display the result. The table below shows how this looks internally, and what the result is in the browser:

 

| Internal Representation | Rendered Result |
|---|---|
| <b>markup</b> | markup (rendered in bold) |

If, on the other hand, you have some data that was taken from user input, then you need to filter the result before displaying it.

$safe_text = check_plain("Joe's page");

 

 

| Internal Representation | Rendered Result |
|---|---|
| Joe&#039;s page | Joe’s page |

 

Don’t do it twice, though:

 

$double_escaped_text = check_plain($safe_text); 

 

| Internal Representation | Rendered Result |
|---|---|
| Joe&amp;#039;s page | Joe&#039;s page |
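The three tables above can be reproduced outside Drupal with a minimal sketch. It assumes that check_plain() behaves like PHP's htmlspecialchars() called with ENT_QUOTES and UTF-8; the escape_once() helper is a hypothetical stand-in used only for this illustration, not a Drupal function.

```php
<?php
// Hypothetical stand-in for check_plain(); assumes it escapes like
// htmlspecialchars() with ENT_QUOTES and UTF-8.
function escape_once(string $text): string {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$raw = "Joe's page";

// Escaped once: the apostrophe becomes a numeric entity.
$safe_text = escape_once($raw);

// Escaped twice: the ampersand of that entity is itself escaped,
// so the browser shows the entity text instead of an apostrophe.
$double_escaped_text = escape_once($safe_text);

echo $safe_text, "\n";            // Joe&#039;s page
echo $double_escaped_text, "\n";  // Joe&amp;#039;s page
```

Running the escape function over its own output is all it takes to produce the broken result, which is why it matters to know whether a given string has already been escaped.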

Although the examples above are fairly obvious, once you start working with a large body of code with many APIs that have different rules about whether their input values should be raw text or escaped text, it becomes much more likely that someone might use some unsafe text in the wrong place.

Because of this, Drupal 8 takes strong steps to make it easier for web developers to produce output that is safe by default. The basic premise is that Drupal is now much more aggressive about automatically escaping generated output. This policy is very beneficial for security, but the developer must still take extra care when handling output, because coding mistakes are now much more likely to lead to ugly, malformed output. That is far preferable to vulnerable, unescaped output, but it is still highly undesirable, and care must be taken to avoid it.

 

The good news is that in Drupal 8, if you stick to best practices and always use the APIs when producing markup, then your output will be correct every time. Most of the familiar functions from Drupal 7, such as t() and drupal_render(), are available, and work much as they always have. For example, to build an HTML unordered list with two elements, you could use an item_list with drupal_render(), as shown below:

 

     $raw_list = array($raw_user_data_1, $raw_user_data_2);
     $list_render_array = array(
       '#theme' => 'item_list',
       '#items' => $raw_list,
     );
     $safe_list_html = drupal_render($list_render_array);

 

  • User data 1
  • User data 2

As you can see, the Drupal API for building and using render arrays is really quite simple and easy to use. The result is clean, safe markup that you can return from a routing controller function (just wrap it in a ‘#markup’ item), or use in another context where safe markup is needed. For example, if you needed to include the list above in a message displayed to the user, you can pass it as a parameter to the translation function, t(), and use the result of that to display the message via drupal_set_message() — just as in Drupal 7.

     $raw_title = "Joe's \"special\" page";
     drupal_set_message($this->t('Let me tell you a few things about @title: !list',
           array('@title' => $raw_title, '!list' => $safe_list_html)), 'warning');

 

The t() function in Drupal still provides replacement patterns that start with either “@” or “%” for raw variables that need to be escaped, or “!” for safe content that has already been escaped.

 

The times that you need to be especially careful are those where you bypass the default mechanism and try to inject your own unescaped content into the output stream. The harder you fight against the template engine, the more likely it becomes that you might manage to put an invisible XSS vulnerability into your site. Fortunately, in Drupal 8, double-escaped output is the more common result, but the moral is the same: work with the APIs, not against them. Let’s examine one common thing that can go wrong in the example below.

 

     $raw_title = "Joe's \"special\" page";
     $trust_me_this_is_safe = 'abcd-1234';
     drupal_set_message($this->t('The identifier for @title is !id',
           array('@title' => $raw_title, '!id' => $trust_me_this_is_safe)), 'warning');

 

What just happened here? In the previous example, ‘@title’ was rendered correctly, but now it is double-escaped, even though nothing changed in the handling of that element. The difference in this example is that the contents of the second replacement placeholder, !id, were never passed through any Drupal text filtering or rendering function. In the case of !list, we provided data that was produced by a Drupal API function, drupal_render(). When we used !id in the t() method above, however, we ignored the Drupal API functions completely; we just sort of knew that our variable contents were safe, so we used !id on the assumption that nothing further was necessary. Whatever inner assurances made us think it was okay to skip output escaping, they failed to convince Drupal that these variable contents were free of potentially problematic characters.

In Drupal 8, any API that filters or escapes text explicitly marks the resulting string as being safe to output. In the case of the t() function, whether or not its output is marked safe depends on the types of placeholders used and on the values provided to replace them. The result is marked safe only if every placeholder is either escaped by t() itself (beginning with ‘@’ or ‘%’) or is an unescaped placeholder (beginning with ‘!’) whose value has already been marked safe. Here, the value supplied for !id was never marked safe, so the string returned by t() is not marked safe either.

Because of this, when the string passed to drupal_set_message() is finally rendered in the page content, Drupal sees that it has not been marked safe and escapes the entire string. This is why the title comes out double-escaped: once by the explicit escaping of ‘@title’ in the t() function, and once more at the end, when the template engine catches the error and escapes the whole string again.
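The mechanism described here can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not Drupal's implementation: SafeString, format_message(), and render_output() are hypothetical names, and wrapping safe text in an object is just one convenient way to model "marked safe".

```php
<?php
// Simplified model only. None of these names are Drupal APIs.

// Wrapper that models "this string has been marked safe".
final class SafeString {
    private $value;
    public function __construct(string $value) { $this->value = $value; }
    public function __toString(): string { return $this->value; }
}

// Models t(): '@' placeholders are escaped; '!' placeholders are inserted
// as-is. The result is marked safe only if every '!' value was already safe.
function format_message(string $pattern, array $args) {
    $all_safe = true;
    $replacements = array();
    foreach ($args as $key => $value) {
        if ($key[0] === '@') {
            $replacements[$key] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
        } else {
            $replacements[$key] = (string) $value;
            if (!($value instanceof SafeString)) {
                $all_safe = false;
            }
        }
    }
    $out = strtr($pattern, $replacements);
    return $all_safe ? new SafeString($out) : $out;
}

// Models the final rendering step: anything not marked safe is escaped.
function render_output($text): string {
    return $text instanceof SafeString
        ? (string) $text
        : htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$title = 'A & B';

// '!id' receives a plain string, so the whole message is escaped again.
$bad = render_output(format_message('The identifier for @title is !id',
    array('@title' => $title, '!id' => 'abcd-1234')));

// '@id' is escaped by the formatter, so the message is marked safe.
$good = render_output(format_message('The identifier for @title is @id',
    array('@title' => $title, '@id' => 'abcd-1234')));

echo $bad, "\n";   // The identifier for A &amp;amp; B is abcd-1234
echo $good, "\n";  // The identifier for A &amp; B is abcd-1234
```

In this model the identifier itself is printed identically in both cases; only the title reveals the problem, because it is the only value containing a character that needs escaping.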

 

The best way to fix this particular situation, where the replacement value is believed to be free of any characters that need escaping, is to simply escape the value anyway, by using ‘@id’ instead of ‘!id’ in the t() method. We believe that this string is free of any character combination that may cause problems, but using @id ensures that this is the case, so the t() function will mark the resulting string safe. This produces the correct output again, as we can see in the diagram below.

 

     $raw_title = "Joe's \"special\" page";
     $raw_id = 'abcd-1234';
     drupal_set_message($this->t('The identifier for @title is @id',
           array('@title' => $raw_title, '@id' => $raw_id)), 'notice');

 

It is helpful in Drupal 8 to stop thinking of !id as meaning “unescaped”, and to start thinking of it as “already safe”. In other words, if you use unsafe content in a context where it is supposed to be “already safe”, then Drupal will notice this mistake, and you will likely get the wrong result. The shift in thinking is to remove all of the places in the code where you are asking the API to “trust you” (e.g. with ‘!id’), and instead use the appropriate APIs to filter your output. Always pass strings through t(), use the @ and % markers for any content not already filtered by an API function, and only insert HTML markup into your output via the drupal_render() function.

Furthermore, it is best to avoid placing HTML markup directly into your render arrays using string literals, and instead ensure that all markup is contained in a Twig template. Most of the ordinary constructs you will need are already available in Drupal’s standard theme functions. A long list of the available theme templates can be found at the bottom of the documentation page Theme system overview. If you need to insert some custom markup into your output, you can create your own Twig template. For a run-down on how this is done, see the article Generating Safe Markup in Drupal 8, by Jonathan Patrick.

Finally, high-quality test data is essential to ensuring the correctness of your code. Double-escaped output cannot be detected when the input string contains no special characters. If we had used $raw_title = "Ordinary stuff"; in the example above, then the incorrect code would have produced output that was indistinguishable from the output produced by the correct code. Always use test values such as “A & B”, “My <b>inappropriately bold</b> example”, and similar strings, so that double-escaped output is apparent as soon as it happens. This practice will go a long way towards ensuring that subtle errors do not creep into your code.
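As a rough illustration of why the choice of test data matters, the sketch below escapes two sample titles twice and then checks the result for the telltale pattern of an escaped ampersand followed by the rest of an entity. The looks_double_escaped() helper is hypothetical and only a heuristic; it would also flag pages that intentionally display entity text.

```php
<?php
// Hypothetical heuristic: true when the text contains "&amp;" immediately
// followed by the remainder of an entity, e.g. "&amp;amp;" or "&amp;#039;".
function looks_double_escaped(string $html): bool {
    return preg_match('/&amp;(?:#\d+|#x[0-9a-f]+|[a-z][a-z0-9]*);/i', $html) === 1;
}

// Simulates the coding mistake by escaping the same value twice.
function escape_twice(string $text): string {
    $once = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return htmlspecialchars($once, ENT_QUOTES, 'UTF-8');
}

// A title without special characters hides the mistake completely.
$plain_result = looks_double_escaped(escape_twice('Ordinary stuff'));

// A title with an ampersand exposes it immediately.
$special_result = looks_double_escaped(escape_twice('A & B'));

var_dump($plain_result);    // bool(false)
var_dump($special_result);  // bool(true)
```

A check along these lines could sit in an automated test next to values such as “A & B”, though inspecting the rendered page by eye with those values works just as well.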
