Class Encode

java.lang.Object
org.owasp.encoder.Encode

public final class Encode extends Object
Encode -- fluent interface for contextual encoding. Example usage in a JSP:
     <input value="<%=Encode.forHtml(value)%>" />
 

There are two versions of each contextual encoding method. The first takes a String argument and returns the encoded version as a String. The second version writes the encoded version directly to a Writer.

Please make sure to read and understand the context that the method encodes for. Encoding for the incorrect context will likely lead to exposing a cross-site scripting vulnerability.

  • Method Details

    • forHtml

      public static String forHtml(String input)

      Encodes for (X)HTML text content and text attributes. Since this method encodes for both contexts, it may be slightly less efficient than the methods targeted at the specific contexts (forHtmlAttribute(String) and forHtmlContent(String)). In general this method should be preferred unless you are really concerned with saving a few bytes or are writing a framework that utilizes this package.

      Example JSP Usage
           <div><%=Encode.forHtml(unsafeData)%></div>
      
           <input value="<%=Encode.forHtml(unsafeData)%>" />
       
      Encoding Table
      Input Result
      & &amp;
      < &lt;
      > &gt;
      " &#34;
      ' &#39;

      Additional Notes

      • The encoding of the greater-than sign (>) is not strictly required, but is included for maximum compatibility.
      • Numeric encoding is used for the double-quote character (") as it is shorter than the also valid &quot;.
      • Carriage return (U+0D), line-feed (U+0A), horizontal tab (U+09) and space (U+20) are valid in unescaped form in both quoted attributes and block content.
      • Surrogate pairs are passed through only if valid.
      • Characters that are not valid according to the XML specification are replaced by a space character as they could lead to parsing errors. In particular only #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] are considered valid.
      Parameters:
      input - the data to encode
      Returns:
      the data encoded for html.
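
The encoding table and notes for forHtml(String) can be sketched with the standard library alone. This is a minimal illustration of the documented mapping, not the library's implementation; the class name HtmlEncodeSketch and its methods are hypothetical.

```java
final class HtmlEncodeSketch {
    private HtmlEncodeSketch() {}

    /** True for code points in the valid XML ranges listed in the notes above. */
    static boolean isValidXml(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Applies the five replacements from the table; invalid characters become a space. */
    static String encodeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); ) {
            // codePointAt returns the lone surrogate value for an unpaired
            // surrogate, which falls outside the valid ranges.
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;");  break;
                case '>':  out.append("&gt;");  break;
                case '"':  out.append("&#34;"); break;
                case '\'': out.append("&#39;"); break;
                default:
                    if (isValidXml(cp)) {
                        out.appendCodePoint(cp);
                    } else {
                        out.append(' ');
                    }
            }
        }
        return out.toString();
    }
}
```

Because both quote characters are replaced, the result of such an encoder can be placed in text content as well as in single- or double-quoted attribute values.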
    • forHtml

      public static void forHtml(Writer out, String input) throws IOException
      See forHtml(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
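
The relationship between the String and Writer forms can be illustrated as follows. This is a sketch under assumptions: WriterOverloadSketch and its methods are hypothetical names, only the five table replacements are handled (invalid-character replacement is omitted for brevity), and nothing here is taken from the library's source.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class WriterOverloadSketch {
    private WriterOverloadSketch() {}

    /** Streams the encoded form to the writer one character at a time. */
    static void encodeHtml(Writer out, String input) throws IOException {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.write("&amp;"); break;
                case '<':  out.write("&lt;");  break;
                case '>':  out.write("&gt;");  break;
                case '"':  out.write("&#34;"); break;
                case '\'': out.write("&#39;"); break;
                default:   out.write(c);
            }
        }
    }

    /** String form, shown here as a thin wrapper over the Writer form. */
    static String encodeHtml(String input) {
        StringWriter buffer = new StringWriter(input.length() + 16);
        try {
            encodeHtml(buffer, input);
        } catch (IOException e) {
            // StringWriter does not throw, but the signature requires handling.
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }
}
```

When the destination is already a Writer (for example a response writer), calling the Writer form avoids building the intermediate String.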
    • forHtmlContent

      public static String forHtmlContent(String input)

      This method encodes for HTML text content. It does not escape quotation characters and is thus unsafe for use with HTML attributes. Use either forHtml or forHtmlAttribute for those contexts.

      Example JSP Usage
           <div><%=Encode.forHtmlContent(unsafeData)%></div>
       
      Encoding Table
      Input Result
      & &amp;
      < &lt;
      > &gt;

      Additional Notes

      • Single-quote character (') and double-quote character (") do not require encoding in HTML blocks, unlike other HTML contexts.
      • The encoding of the greater-than sign (>) is not strictly required, but is included for maximum compatibility.
      • Carriage return (U+0D), line-feed (U+0A), horizontal tab (U+09) and space (U+20) are valid in unescaped form in both quoted attributes and block content.
      • Surrogate pairs are passed through only if valid.
      • Characters that are not valid according to the XML specification are replaced by a space character as they could lead to parsing errors. In particular only #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] are considered valid.
      Parameters:
      input - the input to encode
      Returns:
      the encoded result
    • forHtmlContent

      public static void forHtmlContent(Writer out, String input) throws IOException
      See forHtmlContent(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forHtmlAttribute

      public static String forHtmlAttribute(String input)

      This method encodes for HTML text attributes.

      Example JSP Usage
           <input value="<%=Encode.forHtmlAttribute(unsafeData)%>" />
       
      Encoding Table
      Input Result
      & &amp;
      < &lt;
      " &#34;
      ' &#39;

      Additional Notes

      • Both the single-quote character (') and the double-quote character (") are encoded so this is safe for HTML attributes with either enclosing character.
      • The encoding of the greater-than sign (>) is not required for attributes.
      • Numeric encoding is used for the double-quote character (") as it is shorter than the also valid &quot;.
      • Carriage return (U+0D), line-feed (U+0A), horizontal tab (U+09) and space (U+20) are valid in unescaped form in both quoted attributes and block content.
      • Surrogate pairs are passed through only if valid.
      • Characters that are not valid according to the XML specification are replaced by a space character as they could lead to parsing errors. In particular only #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] are considered valid.
      Parameters:
      input - the input to encode
      Returns:
      the encoded result
    • forHtmlAttribute

      public static void forHtmlAttribute(Writer out, String input) throws IOException
      See forHtmlAttribute(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forHtmlUnquotedAttribute

      public static String forHtmlUnquotedAttribute(String input)

      Encodes for unquoted HTML attribute values. forHtml(String) or forHtmlAttribute(String) should usually be preferred over this method as quoted attributes are XHTML compliant.

      When using this method, the caller is not required to provide quotes around the attribute (since it is encoded for such context). The caller should make sure that the attribute value does not abut unsafe characters--and thus should usually err on the side of including a space character after the value.

      Use of this method is discouraged as quoted attributes are generally more compatible and safer. Also note that no attempt has been made to optimize this encoding, though it is still probably faster than other encoding libraries.

      Example JSP Usage
           <input value=<%=Encode.forHtmlUnquotedAttribute(input)%> >
       
      Encoding Table
      Input Result
      U+0009 (horizontal tab) &#9;
      U+000A (line feed) &#10;
      U+000C (form feed) &#12;
      U+000D (carriage return) &#13;
      U+0020 (space) &#32;
      & &amp;
      < &lt;
      > &gt;
      " &#34;
      ' &#39;
      / &#47;
      = &#61;
      ` &#96;
      U+0085 (next line) &#133;
      U+2028 (line separator) &#8232;
      U+2029 (paragraph separator) &#8233;

      Additional Notes

      • The following characters are not encoded: 0-9, a-z, A-Z, !, #, $, %, (, ), *, +, ,, -, ., [, \, ], ^, _, }.
      • Surrogate pairs are passed through only if valid. Invalid surrogate pairs are replaced by a hyphen (-).
      • Characters in the C0 and C1 control blocks and not otherwise listed above are considered invalid and replaced by a hyphen (-) character.
      • Unicode "non-characters" are replaced by hyphens (-).
      Parameters:
      input - the attribute value to be encoded.
      Returns:
      the attribute value encoded for unquoted attribute context.
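
The unquoted-attribute table can be approximated as below. This is only a sketch of the documented mapping: UnquotedAttributeSketch is a hypothetical name, and the handling of surrogate pairs and Unicode non-characters described in the notes is deliberately left out.

```java
final class UnquotedAttributeSketch {
    private UnquotedAttributeSketch() {}

    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                // Characters the table maps to decimal numeric references.
                case '\t': case '\n': case '\f': case '\r': case ' ':
                case '"': case '\'': case '/': case '=': case '`':
                case 0x0085: case 0x2028: case 0x2029:
                    out.append("&#").append((int) c).append(';');
                    break;
                default:
                    // Remaining C0 (U+0000-U+001F) and C1 (U+0080-U+009F)
                    // controls are replaced by a hyphen.
                    if (c < 0x20 || (c >= 0x80 && c <= 0x9F)) {
                        out.append('-');
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

Encoding space, tab and the line terminators matters here because, without enclosing quotes, any of them would end the attribute value.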
    • forHtmlUnquotedAttribute

      public static void forHtmlUnquotedAttribute(Writer out, String input) throws IOException
      See forHtmlUnquotedAttribute(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forCssString

      public static String forCssString(String input)
      Encodes for CSS strings. The context must be surrounded by quotation characters. It is safe for use in both style blocks and attributes in HTML.

      Example JSP Usage
           <div style="background: url('<%=Encode.forCssString(...)%>');">
      
           <style type="text/css">
               background: url('<%=Encode.forCssString(...)%>');
           </style>
       
      Encoding Notes
      • The following characters are encoded using hexadecimal encodings: U+0000 - U+001f, ", ', \, <, &, (, ), /, >, U+007f, line separator (U+2028), paragraph separator (U+2029).
      • Any character requiring encoding is encoded as \xxx where xxx is the shortest hexadecimal representation of its Unicode code point (after decoding surrogate pairs if necessary). This encoding is never zero padded. Thus, for example, the tab character is encoded as \9, not \0009.
      • The encoder looks ahead 1 character in the input and appends a space to an encoding to avoid the next character becoming part of the hexadecimal encoded sequence. Thus “'1” is encoded as “\27 1”, and not as “\271”. If a space is not necessary, it is not included, thus “'x” is encoded as “\27x”, and not as “\27 x”.
      • Surrogate pairs are passed through only if valid. Invalid surrogate pairs are replaced by an underscore (_).
      • Unicode "non-characters" are replaced by underscores (_).
      Parameters:
      input - the input to encode
      Returns:
      the encoded result
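
The shortest-hex encoding with one-character look-ahead can be sketched as follows. CssStringSketch is a hypothetical name and this is not the library's code. As an assumption beyond the notes above, the sketch also appends the separator when the next character is a literal space, since CSS consumes one whitespace character after a hex escape. Surrogate and non-character handling are omitted.

```java
final class CssStringSketch {
    private CssStringSketch() {}

    /** Characters the notes list as requiring a hexadecimal escape. */
    static boolean needsEscape(char c) {
        return c <= 0x1F || c == 0x7F || c == 0x2028 || c == 0x2029
            || "\"'\\<&()/>".indexOf(c) >= 0;
    }

    static boolean isAsciiHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!needsEscape(c)) {
                out.append(c);
                continue;
            }
            // Shortest hexadecimal form, never zero padded.
            out.append('\\').append(Integer.toHexString(c));
            if (i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                // Separator so the next character is not read as part of the escape.
                // The literal-space case is an assumption of this sketch.
                if (isAsciiHexDigit(next) || next == ' ') {
                    out.append(' ');
                }
            }
        }
        return out.toString();
    }
}
```

With this sketch, the input '1 becomes \27 1 while 'x becomes \27x, matching the examples in the notes.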
    • forCssString

      public static void forCssString(Writer out, String input) throws IOException
      See forCssString(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forCssUrl

      public static String forCssUrl(String input)
      Encodes for CSS URL contexts. The context must be surrounded by "url(" and ")". It is safe for use in both style blocks and attributes in HTML. Note: this does not do any checking on the quality or safety of the URL itself. The caller should ensure that the URL is safe for embedding (e.g. input validation) by other means.

      Example JSP Usage
           <div style="background:url(<%=Encode.forCssUrl(...)%>);">
      
           <style type="text/css">
               background: url(<%=Encode.forCssUrl(...)%>);
           </style>
       
      Encoding Notes
      • The following characters are encoded using hexadecimal encodings: U+0000 - U+001f, ", ', \, <, &, /, >, U+007f, line separator (U+2028), paragraph separator (U+2029).
      • Any character requiring encoding is encoded as \xxx where xxx is the shortest hexadecimal representation of its Unicode code point (after decoding surrogate pairs if necessary). This encoding is never zero padded. Thus, for example, the tab character is encoded as \9, not \0009.
      • The encoder looks ahead 1 character in the input and appends a space to an encoding to avoid the next character becoming part of the hexadecimal encoded sequence. Thus “'1” is encoded as “\27 1”, and not as “\271”. If a space is not necessary, it is not included, thus “'x” is encoded as “\27x”, and not as “\27 x”.
      • Surrogate pairs are passed through only if valid. Invalid surrogate pairs are replaced by an underscore (_).
      • Unicode "non-characters" are replaced by underscores (_).
      Parameters:
      input - the input to encode
      Returns:
      the encoded result
    • forCssUrl

      public static void forCssUrl(Writer out, String input) throws IOException
      See forCssUrl(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forUri

      @Deprecated public static String forUri(String input)
      Deprecated.

      Performs percent-encoding of a URL according to RFC 3986. The provided URL is assumed to be a valid URL. This method does not do any checking on the quality or safety of the URL itself. In many applications it may be better to use java.net.URI instead. Note: this is a particularly dangerous context to put untrusted content in, as for example a "javascript:" URL provided by a malicious user would be "properly" escaped, and still execute.

      Encoding Table

      The following characters are not encoded:

       U+20:   !   # $   & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ;   =   ?
       U+40: @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [   ]   _
       U+60:   a b c d e f g h i j k l m n o p q r s t u v w x y z       ~
       
      Encoding Notes
      • The single-quote character(') is not encoded.
      • This encoding is not intended to be used standalone. The output should be encoded to the target context. For example: <a href="<%=Encode.forHtmlAttribute(Encode.forUri(uri))%>">...</a>. (Note, the single-quote character (') is not encoded.)
      • URL encoding is an encoding for bytes, not Unicode. The input string is thus first encoded as a sequence of UTF-8 bytes. The bytes are then encoded as %xx where xx is the two-digit hexadecimal representation of the byte. (The implementation does this as one step for performance.)
      • Surrogate pairs are first decoded to a Unicode code point before encoding as UTF-8.
      • Invalid characters (e.g. partial or invalid surrogate pairs) are replaced with a hyphen (-) character.
      Parameters:
      input - the input to encode
      Returns:
      the encoded result
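
Since escaping does not make a "javascript:" URL harmless, a caller may want to validate the scheme before embedding a complete URL. The following is one possible sketch using java.net.URI. The class name, the allowed-scheme set and the decision to reject relative URLs are all assumptions made for illustration, not part of this library.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class UriSchemeCheckSketch {
    private UriSchemeCheckSketch() {}

    // Hypothetical allowlist; adjust to the application's needs.
    private static final Set<String> ALLOWED_SCHEMES =
        new HashSet<>(Arrays.asList("http", "https"));

    /** Returns true only for syntactically valid URIs with an allowed scheme. */
    static boolean hasAllowedScheme(String candidate) {
        try {
            String scheme = new URI(candidate.trim()).getScheme();
            // A null scheme means a relative reference; this sketch rejects those.
            return scheme != null
                && ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

A check of this kind complements output encoding: the URL is first accepted or rejected, and only an accepted value is then encoded for the containing context.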
    • forUri

      @Deprecated public static void forUri(Writer out, String input) throws IOException
      Deprecated.
      There is never a need to encode a complete URI with this form of encoding.
      See forUri(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forUriComponent

      public static String forUriComponent(String input)
      Performs percent-encoding for a component of a URI, such as a query parameter name or value, path or query-string. In particular this method ensures that special characters in the component do not get interpreted as part of another component.
           <a href="http://www.owasp.org/<%=Encode.forUriComponent(...)%>?query#fragment">
      
           <a href="/search?value=<%=Encode.forUriComponent(...)%>&order=1#top">
       
      Encoding Table

      The following characters are not encoded:

       U+20:                           - .   0 1 2 3 4 5 6 7 8 9
       U+40: @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z         _
       U+60:   a b c d e f g h i j k l m n o p q r s t u v w x y z       ~
       
      Encoding Notes
      • Unlike forUri(String) this method is safe to be used in most containing contexts, including: HTML/XML, CSS, and JavaScript contexts.
      • URL encoding is an encoding for bytes, not Unicode. The input string is thus first encoded as a sequence of UTF-8 bytes. The bytes are then encoded as %xx where xx is the two-digit hexadecimal representation of the byte. (The implementation does this as one step for performance.)
      • Surrogate pairs are first decoded to a Unicode code point before encoding as UTF-8.
      • Invalid characters (e.g. partial or invalid surrogate pairs) are replaced with a hyphen (-) character.
      Parameters:
      input - the input to encode
      Returns:
      the encoded result
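
The byte-oriented nature of percent-encoding can be sketched as below. UriComponentSketch is a hypothetical name. The set of unencoded characters is copied from the table above, the uppercase hex digits are a choice of this sketch, and the hyphen replacement for invalid surrogates is not reproduced (String.getBytes substitutes its own replacement byte).

```java
import java.nio.charset.StandardCharsets;

final class UriComponentSketch {
    private UriComponentSketch() {}

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Characters the table above lists as not encoded. */
    static boolean isUnencoded(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
            || (b >= '0' && b <= '9')
            || b == '-' || b == '.' || b == '_' || b == '~' || b == '@';
    }

    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        // Encode to UTF-8 first; percent-encoding then works on each byte.
        for (byte raw : input.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            if (isUnencoded(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }
}
```

In this sketch a non-ASCII character such as é produces two escapes (%C3%A9), one per UTF-8 byte, and characters such as & and = are encoded so they cannot start a new query parameter.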
    • forUriComponent

      public static void forUriComponent(Writer out, String input) throws IOException
      See forUriComponent(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forXml

      public static String forXml(String input)
      Encoder for XML and XHTML. See forHtml(String) for a description of the encoding and context.
      Parameters:
      input - the input to encode
      Returns:
      the encoded result
    • forXml

      public static void forXml(Writer out, String input) throws IOException
      See forXml(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forXmlContent

      public static String forXmlContent(String input)
      Encoder for XML and XHTML text content. See forHtmlContent(String) for description of encoding and context.
      Parameters:
      input - the input to encode
      Returns:
      the encoded result
    • forXmlContent

      public static void forXmlContent(Writer out, String input) throws IOException
      See forXmlContent(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forXmlAttribute

      public static String forXmlAttribute(String input)
      Encoder for XML and XHTML attribute content. See forHtmlAttribute(String) for description of encoding and context.
      Parameters:
      input - the input to encode
      Returns:
      the encoded result
    • forXmlAttribute

      public static void forXmlAttribute(Writer out, String input) throws IOException
      See forXmlAttribute(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forXmlComment

      public static String forXmlComment(String input)
      Encoder for XML comments. NOT FOR USE WITH (X)HTML CONTEXTS. (X)HTML comments may be interpreted by browsers as something other than a comment, typically in vendor specific extensions (e.g. <!--[if IE]>). For (X)HTML it is recommended that unsafe content never be included in a comment.

      The caller must provide the comment start and end sequences.

      This method replaces all invalid XML characters with spaces, and replaces the "--" sequence (which is invalid in XML comments) with "-~" (hyphen-tilde). This encoding behavior may change in future releases. If the comments need to be decoded, the caller will need to come up with their own encode/decode system.

           out.println("<?xml version='1.0'?>");
           out.println("<data>");
           out.println("<!-- "+Encode.forXmlComment(comment)+" -->");
           out.println("</data>");
       
      Parameters:
      input - the input to encode
      Returns:
      the encoded result
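
The two replacements described for XML comments can be sketched as follows. This is an illustration only: XmlCommentSketch is a hypothetical name, the treatment of runs of three or more hyphens is an assumption of the sketch, and, as noted above, the library's actual behavior may change between releases.

```java
final class XmlCommentSketch {
    private XmlCommentSketch() {}

    /** True for code points in the valid XML character ranges. */
    static boolean isValidXml(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            boolean previousIsHyphen =
                out.length() > 0 && out.charAt(out.length() - 1) == '-';
            if (cp == '-' && previousIsHyphen) {
                // A hyphen directly after an emitted hyphen becomes a tilde,
                // so the output never contains "--".
                out.append('~');
            } else if (isValidXml(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(' ');
            }
        }
        return out.toString();
    }
}
```

The sketch does not treat a trailing hyphen specially, so the caller should keep a space before the closing sequence, as the example above does.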
    • forXmlComment

      public static void forXmlComment(Writer out, String input) throws IOException
      See forXmlComment(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forCDATA

      public static String forCDATA(String input)
      Encodes data for an XML CDATA section. On the chance that the input contains a terminating "]]>", it will be replaced by "]]>]]<![CDATA[>". As with all XML contexts, characters that are invalid according to the XML specification will be replaced by a space character. Caller must provide the CDATA section boundaries.
           <xml-data><![CDATA[<%=Encode.forCDATA(...)%>]]></xml-data>
       
      Parameters:
      input - the input to encode
      Returns:
      the encoded result
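
The CDATA replacement can be sketched with a plain string substitution. CdataSketch and its methods are hypothetical names. The replacement string is the one quoted in the description above and is not verified against the library's source.

```java
final class CdataSketch {
    private CdataSketch() {}

    /** True for code points in the valid XML character ranges. */
    static boolean isValidXml(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String encode(String input) {
        StringBuilder cleaned = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (isValidXml(cp)) {
                cleaned.appendCodePoint(cp);
            } else {
                cleaned.append(' ');
            }
        }
        // Close the section at "]]>", emit "]]" as ordinary text, then reopen
        // a new section that starts with ">".
        return cleaned.toString().replace("]]>", "]]>]]<![CDATA[>");
    }

    /** The caller supplies the section boundaries, as in the JSP example. */
    static String wrap(String input) {
        return "<![CDATA[" + encode(input) + "]]>";
    }
}
```

A consumer of such output sees several adjacent text and CDATA nodes instead of one, and concatenating them yields the original data.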
    • forCDATA

      public static void forCDATA(Writer out, String input) throws IOException
      See forCDATA(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forJava

      public static String forJava(String input)
      Encodes for a Java string. This method will use "\b", "\t", "\r", "\f", "\n", "\"", "\'", "\\", octal and unicode escapes. Valid surrogate pairing is not checked. The caller must provide the enclosing quotation characters. This method is useful when writing code generators and outputting debug messages.
           out.println("public class Hello {");
           out.println("    public static void main(String[] args) {");
           out.println("        System.out.println(\"" + Encode.forJava(message) + "\");");
           out.println("    }");
           out.println("}");
       
      Parameters:
      input - the input to encode
      Returns:
      the input encoded for Java strings.
    • forJava

      public static void forJava(Writer out, String input) throws IOException
      See forJava(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forJavaScript

      public static String forJavaScript(String input)

      Encodes for a JavaScript string. It is safe for use in HTML script attributes (such as onclick), script blocks, JSON files, and JavaScript source. The caller MUST provide the surrounding quotation characters for the string. Since this performs additional encoding so it can work in all of the JavaScript contexts listed, it may be slightly less efficient than using one of the methods targeted to a specific JavaScript context (forJavaScriptAttribute(String), forJavaScriptBlock(String), forJavaScriptSource(String)). Unless you are interested in saving a few bytes of output or are writing a framework on top of this library, it is recommended that you use this method over the others.

      Example JSP Usage:
          <button onclick="alert('<%=Encode.forJavaScript(data)%>');">
          <script type="text/javascript">
              var data = "<%=Encode.forJavaScript(data)%>";
          </script>
       
      Encoding Description
      Input Character    Encoded Result  Notes
      U+0008 (BS)        \b              Backspace character
      U+0009 (HT)        \t              Horizontal tab character
      U+000A (LF)        \n              Line feed character
      U+000C (FF)        \f              Form feed character
      U+000D (CR)        \r              Carriage return character
      U+0022 (")         \x22            The encoding \" is not used here because it is not safe for use in HTML attributes. (In HTML attributes, it would also be correct to use "\&quot;".)
      U+0026 (&)         \x26            Ampersand character
      U+0027 (')         \x27            The encoding \' is not used here because it is not safe for use in HTML attributes. (In HTML attributes, it would also be correct to use "\&#39;".)
      U+002F (/)         \/              This encoding is used to prevent an input sequence "</" from prematurely terminating a </script> block.
      U+005C (\)         \\
      U+0000 to U+001F   \x##            Hexadecimal encoding is used for characters in this range that were not already mentioned above.
      Parameters:
      input - the input string to encode
      Returns:
      the input encoded for JavaScript
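
The encoding table for forJavaScript(String) can be sketched as below. JavaScriptEncodeSketch is a hypothetical name, and the sketch covers only the rows shown in the table. It is not the library's implementation.

```java
final class JavaScriptEncodeSketch {
    private JavaScriptEncodeSketch() {}

    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\b': out.append("\\b"); break;
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\f': out.append("\\f"); break;
                case '\r': out.append("\\r"); break;
                // Hex escapes keep quotes and ampersands inert inside HTML attributes.
                case '"':  out.append("\\x22"); break;
                case '&':  out.append("\\x26"); break;
                case '\'': out.append("\\x27"); break;
                // Escaping the slash keeps "</" from closing a script block.
                case '/':  out.append("\\/"); break;
                case '\\': out.append("\\\\"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\x%02x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

With this mapping the input </script> becomes <\/script>, which a JavaScript parser reads back as the original text while the HTML parser no longer sees a closing tag.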
    • forJavaScript

      public static void forJavaScript(Writer out, String input) throws IOException
      See forJavaScript(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forJavaScriptAttribute

      public static String forJavaScriptAttribute(String input)

      This method encodes for JavaScript strings contained within HTML script attributes (such as onclick). It is NOT safe for use in script blocks. The caller MUST provide the surrounding quotation characters. This method performs the same encoding as forJavaScript(String) with the exception that / is not escaped.

      Unless you are interested in saving a few bytes of output or are writing a framework on top of this library, it is recommended that you use forJavaScript(String) over this method.

      Example JSP Usage:
          <button onclick="alert('<%=Encode.forJavaScriptAttribute(data)%>');">
       
      Parameters:
      input - the input string to encode
      Returns:
      the input encoded for JavaScript
    • forJavaScriptAttribute

      public static void forJavaScriptAttribute(Writer out, String input) throws IOException
      See forJavaScriptAttribute(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forJavaScriptBlock

      public static String forJavaScriptBlock(String input)

      This method encodes for JavaScript strings contained within HTML script blocks. It is NOT safe for use in script attributes (such as onclick). The caller must provide the surrounding quotation characters. This method performs the same encoding as forJavaScript(String) with the exception that " and ' are encoded as \" and \' respectively.

      Unless you are interested in saving a few bytes of output or are writing a framework on top of this library, it is recommended that you use forJavaScript(String) over this method.

      Example JSP Usage:
          <script type="text/javascript">
              var data = "<%=Encode.forJavaScriptBlock(data)%>";
          </script>
       
      Parameters:
      input - the input string to encode
      Returns:
      the input encoded for JavaScript
    • forJavaScriptBlock

      public static void forJavaScriptBlock(Writer out, String input) throws IOException
      See forJavaScriptBlock(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer
    • forJavaScriptSource

      public static String forJavaScriptSource(String input)

      This method encodes for JavaScript strings contained within a JavaScript or JSON file. This method is NOT safe for use in ANY context embedded in HTML. The caller must provide the surrounding quotation characters. This method performs the same encoding as forJavaScript(String) with the exception that / and & are not escaped and " and ' are encoded as \" and \' respectively.

      Unless you are interested in saving a few bytes of output or are writing a framework on top of this library, it is recommended that you use forJavaScript(String) over this method.

      Example JSP Usage: This example is serving up JavaScript source directly:
          <%@page contentType="text/javascript; charset=UTF-8"%>
          var data = "<%=Encode.forJavaScriptSource(data)%>";
       
      This example is serving up JSON data (users of this use-case are encouraged to read up on "JSON Hijacking"):
          <%@page contentType="application/json; charset=UTF-8"%>
          <% myapp.jsonHijackingPreventionMeasure(); %>
          {"data":"<%=Encode.forJavaScriptSource(data)%>"}
       
      Parameters:
      input - the input string to encode
      Returns:
      the input encoded for JavaScript
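
The differences between the four JavaScript encoders, as described in their respective entries, can be summarized in a single parameterized sketch. The class JavaScriptModesSketch and its Mode enum are hypothetical names introduced for illustration. The library exposes separate methods rather than a mode argument.

```java
final class JavaScriptModesSketch {
    private JavaScriptModesSketch() {}

    /** HTML stands for the general forJavaScript behavior. */
    enum Mode { HTML, ATTRIBUTE, BLOCK, SOURCE }

    static String encode(String input, Mode mode) {
        // Quotes use \x22 and \x27 wherever an HTML attribute is possible.
        boolean hexQuotes = mode == Mode.HTML || mode == Mode.ATTRIBUTE;
        // The slash is escaped wherever a script block is possible.
        boolean escapeSlash = mode == Mode.HTML || mode == Mode.BLOCK;
        // The ampersand is left alone only outside HTML.
        boolean escapeAmp = mode != Mode.SOURCE;

        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\b': out.append("\\b"); break;
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\f': out.append("\\f"); break;
                case '\r': out.append("\\r"); break;
                case '\\': out.append("\\\\"); break;
                case '"':  out.append(hexQuotes ? "\\x22" : "\\\""); break;
                case '\'': out.append(hexQuotes ? "\\x27" : "\\'"); break;
                case '/':  out.append(escapeSlash ? "\\/" : "/"); break;
                case '&':  out.append(escapeAmp ? "\\x26" : "&"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\x%02x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

Seen this way, the general mode is the union of the escapes needed by the attribute and block contexts, which is why it is the safe default.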
    • forJavaScriptSource

      public static void forJavaScriptSource(Writer out, String input) throws IOException
      See forJavaScriptSource(String) for description of encoding. This version writes directly to a Writer without an intervening string.
      Parameters:
      out - where to write encoded output
      input - the input string to encode
      Throws:
      IOException - if thrown by writer