Class Encode
<input value="<%=Encode.forHtml(value)%>" />
There are two versions of each contextual encoding method. The first takes a String argument and returns the encoded version as a String. The second version writes the encoded version directly to a Writer.
Please make sure to read and understand the context that the method encodes for. Encoding for the incorrect context will likely lead to exposing a cross-site scripting vulnerability.
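As a rough illustration of the String/Writer pairing described above, the following sketch shows one way such a pair of overloads can be arranged. The class TwoVersionSketch and the method encodeMarkup are hypothetical names invented for this example, the encoding covers only < and &, and nothing here is taken from the library's actual implementation.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical sketch of a String/Writer overload pair; not the library's code. */
final class TwoVersionSketch {

    /** Writer version: streams encoded output without building a String first. */
    static void encodeMarkup(Writer out, String input) throws IOException {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '<') {
                out.write("&lt;");
            } else if (c == '&') {
                out.write("&amp;");
            } else {
                out.write(c);
            }
        }
    }

    /** String version: in this sketch it simply delegates to the Writer version. */
    static String encodeMarkup(String input) {
        StringWriter buffer = new StringWriter(input.length());
        try {
            encodeMarkup(buffer, input);
        } catch (IOException e) {
            // StringWriter does not perform I/O, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeMarkup("1 < 2 & 3"));
    }
}
```

The Writer form is the natural fit when output is already being streamed (for example to a response writer), since no intermediate String is needed.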
-
Method Summary
Modifier and Type | Method | Description
static void | forCDATA(Writer out, String input) | See forCDATA(String) for description of encoding.
static String | forCDATA(String input) | Encodes data for an XML CDATA section.
static void | forCssString(Writer out, String input) | See forCssString(String) for description of encoding.
static String | forCssString(String input) | Encodes for CSS strings.
static void | forCssUrl(Writer out, String input) | See forCssUrl(String) for description of encoding.
static String | forCssUrl(String input) | Encodes for CSS URL contexts.
static void | forHtml(Writer out, String input) | See forHtml(String) for description of encoding.
static String | forHtml(String input) | Encodes for (X)HTML text content and text attributes.
static void | forHtmlAttribute(Writer out, String input) | See forHtmlAttribute(String) for description of encoding.
static String | forHtmlAttribute(String input) | This method encodes for HTML text attributes.
static void | forHtmlContent(Writer out, String input) | See forHtmlContent(String) for description of encoding.
static String | forHtmlContent(String input) | This method encodes for HTML text content.
static void | forHtmlUnquotedAttribute(Writer out, String input) | See forHtmlUnquotedAttribute(String) for description of encoding.
static String | forHtmlUnquotedAttribute(String input) | Encodes for unquoted HTML attribute values.
static void | forJava(Writer out, String input) | See forJava(String) for description of encoding.
static String | forJava(String input) | Encodes for a Java string.
static void | forJavaScript(Writer out, String input) | See forJavaScript(String) for description of encoding.
static String | forJavaScript(String input) | Encodes for a JavaScript string.
static void | forJavaScriptAttribute(Writer out, String input) | See forJavaScriptAttribute(String) for description of encoding.
static String | forJavaScriptAttribute(String input) | This method encodes for JavaScript strings contained within HTML script attributes (such as onclick).
static void | forJavaScriptBlock(Writer out, String input) | See forJavaScriptBlock(String) for description of encoding.
static String | forJavaScriptBlock(String input) | This method encodes for JavaScript strings contained within HTML script blocks.
static void | forJavaScriptSource(Writer out, String input) | See forJavaScriptSource(String) for description of encoding.
static String | forJavaScriptSource(String input) | This method encodes for JavaScript strings contained within a JavaScript or JSON file.
static void | forUri(Writer out, String input) | Deprecated. There is never a need to encode a complete URI with this form of encoding.
static String | forUri(String input) | Deprecated.
static void | forUriComponent(Writer out, String input) | See forUriComponent(String) for description of encoding.
static String | forUriComponent(String input) | Performs percent-encoding for a component of a URI, such as a query parameter name or value, path or query-string.
static void | forXml(Writer out, String input) | See forXml(String) for description of encoding.
static String | forXml(String input) | Encoder for XML and XHTML.
static void | forXmlAttribute(Writer out, String input) | See forXmlAttribute(String) for description of encoding.
static String | forXmlAttribute(String input) | Encoder for XML and XHTML attribute content.
static void | forXmlComment(Writer out, String input) | See forXmlComment(String) for description of encoding.
static String | forXmlComment(String input) | Encoder for XML comments.
static void | forXmlContent(Writer out, String input) | See forXmlContent(String) for description of encoding.
static String | forXmlContent(String input) | Encoder for XML and XHTML text content.
-
Method Details
-
forHtml
Encodes for (X)HTML text content and text attributes. Since this method encodes for both contexts, it may be slightly less efficient to use this method over the methods targeted towards the specific contexts (forHtmlAttribute(String) and forHtmlContent(String)). In general this method should be preferred unless you are really concerned with saving a few bytes or are writing a framework that utilizes this package.
Example JSP Usage
<div><%=Encode.forHtml(unsafeData)%></div>
<input value="<%=Encode.forHtml(unsafeData)%>" />
Encoding Table
Input | Result
& | &amp;
< | &lt;
> | &gt;
" | &#34;
' | &#39;
Additional Notes
- The encoding of the greater-than sign (>) is not strictly required, but is included for maximum compatibility.
- Numeric encoding is used for the double-quote character (") as it is shorter than the also valid &quot;.
- Carriage return (U+0D), line-feed (U+0A), horizontal tab (U+09) and space (U+20) are valid in unescaped form in quoted attributes and in blocks.
- Surrogate pairs are passed through only if valid.
- Characters that are not valid according to the XML specification are replaced by a space character as they could lead to parsing errors. In particular only #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] are considered valid.
Parameters:
input - the data to encode
Returns:
the data encoded for html.
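The encoding table above can be read as a simple character-by-character substitution. The sketch below applies only those five replacements; HtmlEncodeSketch is a hypothetical class written for illustration, and it deliberately omits the surrogate-pair validation and the replacement of invalid XML characters described in the notes, so it is not a substitute for Encode.forHtml.

```java
/** Hypothetical sketch of the five-row encoding table; not the library's code. */
final class HtmlEncodeSketch {

    /** Replaces &, <, >, " and ' with the entities listed in the table. */
    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;");  break;
                case '>':  out.append("&gt;");  break;
                case '"':  out.append("&#34;"); break; // numeric form, shorter than &quot;
                case '\'': out.append("&#39;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("<a href=\"x\">Tom & 'Jerry'</a>"));
    }
}
```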
-
forHtml
See forHtml(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forHtmlContent
This method encodes for HTML text content. It does not escape quotation characters and is thus unsafe for use with HTML attributes. Use either forHtml or forHtmlAttribute for those contexts.
Example JSP Usage
<div><%=Encode.forHtmlContent(unsafeData)%></div>
Encoding Table
Input | Result
& | &amp;
< | &lt;
> | &gt;
Additional Notes
- Single-quote character (') and double-quote character (") do not require encoding in HTML blocks, unlike other HTML contexts.
- The encoding of the greater-than sign (>) is not strictly required, but is included for maximum compatibility.
- Carriage return (U+0D), line-feed (U+0A), horizontal tab (U+09) and space (U+20) are valid in unescaped form in quoted attributes and in blocks.
- Surrogate pairs are passed through only if valid.
- Characters that are not valid according to the XML specification are replaced by a space character as they could lead to parsing errors. In particular only #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] are considered valid.
Parameters:
input - the input to encode
Returns:
the encoded result
-
forHtmlContent
See forHtmlContent(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forHtmlAttribute
This method encodes for HTML text attributes.
Example JSP Usage
<input value="<%=Encode.forHtmlAttribute(unsafeData)%>" />
Encoding Table
Input | Result
& | &amp;
< | &lt;
" | &#34;
' | &#39;
Additional Notes
- Both the single-quote character (') and the double-quote character (") are encoded so this is safe for HTML attributes with either enclosing character.
- The encoding of the greater-than sign (>) is not required for attributes.
- Numeric encoding is used for the double-quote character (") as it is shorter than the also valid &quot;.
- Carriage return (U+0D), line-feed (U+0A), horizontal tab (U+09) and space (U+20) are valid in unescaped form in quoted attributes and in blocks.
- Surrogate pairs are passed through only if valid.
- Characters that are not valid according to the XML specification are replaced by a space character as they could lead to parsing errors. In particular only #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] are considered valid.
Parameters:
input - the input to encode
Returns:
the encoded result
-
forHtmlAttribute
See forHtmlAttribute(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forHtmlUnquotedAttribute
Encodes for unquoted HTML attribute values. forHtml(String) or forHtmlAttribute(String) should usually be preferred over this method as quoted attributes are XHTML compliant.
When using this method, the caller is not required to provide quotes around the attribute (since it is encoded for such context). The caller should make sure that the attribute value does not abut unsafe characters, and thus should usually err on the side of including a space character after the value.
Use of this method is discouraged as quoted attributes are generally more compatible and safer. Also note that no attempt has been made to optimize this encoding, though it is still probably faster than other encoding libraries.
Example JSP Usage
<input value=<%=Encode.forHtmlUnquotedAttribute(input)%> >
Encoding Table
Input | Result
U+0009 (horizontal tab) | &#9;
U+000A (line feed) | &#10;
U+000C (form feed) | &#12;
U+000D (carriage return) | &#13;
U+0020 (space) | &#32;
& | &amp;
< | &lt;
> | &gt;
" | &#34;
' | &#39;
/ | &#47;
= | &#61;
` | &#96;
U+0085 (next line) | &#133;
U+2028 (line separator) | &#8232;
U+2029 (paragraph separator) | &#8233;
Additional Notes
- The following characters are not encoded: 0-9, a-z, A-Z, !, #, $, %, (, ), *, +, ,, -, ., [, \, ], ^, _
- Surrogate pairs are passed through only if valid. Invalid surrogate pairs are replaced by a hyphen (-).
- Characters in the C0 and C1 control blocks and not otherwise listed above are considered invalid and replaced by a hyphen (-) character.
- Unicode "non-characters" are replaced by hyphens (-).
Parameters:
input - the attribute value to be encoded.
Returns:
the attribute value encoded for unquoted attribute context.
-
forHtmlUnquotedAttribute
See forHtmlUnquotedAttribute(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forCssString
Encodes for CSS strings. The context must be surrounded by quotation characters. It is safe for use in both style blocks and attributes in HTML.
Example JSP Usage
<div style="background: url('<%=Encode.forCssString(...)%>');">
<style type="text/css"> background: url('<%=Encode.forCssString(...)%>'); </style>
Encoding Notes
- The following characters are encoded using hexadecimal encodings: U+0000 - U+001f, ", ', \, <, &, (, ), /, >, U+007f, line separator (U+2028), paragraph separator (U+2029).
- Any character requiring encoding is encoded as \xxx where xxx is the shortest hexadecimal representation of its Unicode code point (after decoding surrogate pairs if necessary). This encoding is never zero padded. Thus, for example, the tab character is encoded as \9, not \0009.
- The encoder looks ahead 1 character in the input and appends a space to an encoding to avoid the next character becoming part of the hexadecimal encoded sequence. Thus “'1” is encoded as “\27 1”, and not as “\271”. If a space is not necessary, it is not included, thus “'x” is encoded as “\27x”, and not as “\27 x”.
- Surrogate pairs are passed through only if valid. Invalid surrogate pairs are replaced by an underscore (_).
- Unicode "non-characters" are replaced by underscores (_).
Parameters:
input - the input to encode
Returns:
the encoded result
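The look-ahead rule in the notes above is easier to follow in code. The sketch below is a hypothetical, simplified re-implementation (CssStringSketch is not part of the library): it escapes the listed characters as unpadded hexadecimal and appends a separating space only when the next input character is a hexadecimal digit. Treating a following literal space the same way is an assumption based on CSS escape syntax, not something stated above. Surrogate-pair validation and non-character replacement are omitted.

```java
/** Hypothetical sketch of CSS string escaping with one-character look-ahead. */
final class CssStringSketch {

    /** Characters listed in the notes as requiring a hexadecimal escape. */
    private static boolean needsEscape(char c) {
        return c <= 0x1f || c == 0x7f || c == 0x2028 || c == 0x2029
                || "\"'\\<&()/>".indexOf(c) >= 0;
    }

    private static boolean isAsciiHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!needsEscape(c)) {
                out.append(c);
                continue;
            }
            // Shortest hexadecimal form, never zero padded (tab becomes \9).
            out.append('\\').append(Integer.toHexString(c));
            if (i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                // Separate the escape from a following hex digit; the space case is an assumption.
                if (isAsciiHexDigit(next) || next == ' ') {
                    out.append(' ');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("'1"));
        System.out.println(encode("'x"));
    }
}
```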
-
forCssString
See forCssString(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forCssUrl
Encodes for CSS URL contexts. The context must be surrounded by "url(" and ")". It is safe for use in both style blocks and attributes in HTML. Note: this does not do any checking on the quality or safety of the URL itself. The caller should ensure that the URL is safe for embedding (e.g. input validation) by other means.
Example JSP Usage
<div style="background:url(<%=Encode.forCssUrl(...)%>);">
<style type="text/css"> background: url(<%=Encode.forCssUrl(...)%>); </style>
Encoding Notes
- The following characters are encoded using hexadecimal encodings: U+0000 - U+001f, ", ', \, <, &, /, >, U+007f, line separator (U+2028), paragraph separator (U+2029).
- Any character requiring encoding is encoded as \xxx where xxx is the shortest hexadecimal representation of its Unicode code point (after decoding surrogate pairs if necessary). This encoding is never zero padded. Thus, for example, the tab character is encoded as \9, not \0009.
- The encoder looks ahead 1 character in the input and appends a space to an encoding to avoid the next character becoming part of the hexadecimal encoded sequence. Thus “'1” is encoded as “\27 1”, and not as “\271”. If a space is not necessary, it is not included, thus “'x” is encoded as “\27x”, and not as “\27 x”.
- Surrogate pairs are passed through only if valid. Invalid surrogate pairs are replaced by an underscore (_).
- Unicode "non-characters" are replaced by underscores (_).
Parameters:
input - the input to encode
Returns:
the encoded result
-
forCssUrl
See forCssUrl(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forUri
Deprecated.
Performs percent-encoding of a URL according to RFC 3986. The provided URL is assumed to be a valid URL. This method does not do any checking on the quality or safety of the URL itself. In many applications it may be better to use URI instead. Note: this is a particularly dangerous context to put untrusted content in, as for example a "javascript:" URL provided by a malicious user would be "properly" escaped, and still execute.
Encoding Table
The following characters are not encoded:
U+20: ! # $ & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; = ?
U+40: @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] _
U+60: a b c d e f g h i j k l m n o p q r s t u v w x y z ~
Encoding Notes
- The single-quote character (') is not encoded.
- This encoding is not intended to be used standalone. The output should be encoded to the target context. For example: <a href="<%=Encode.forHtmlAttribute(Encode.forUri(uri))%>">...</a>. (Note, the single-quote character (') is not encoded.)
- URL encoding is an encoding for bytes, not unicode. The input string is thus first encoded as a sequence of UTF-8 bytes. The bytes are then encoded as %xx where xx is the two-digit hexadecimal representation of the byte. (The implementation does this as one step for performance.)
- Surrogate pairs are first decoded to a Unicode code point before encoding as UTF-8.
- Invalid characters (e.g. partial or invalid surrogate pairs) are replaced with a hyphen (-) character.
Parameters:
input - the input to encode
Returns:
the encoded result
-
forUri
Deprecated. There is never a need to encode a complete URI with this form of encoding.
See forUri(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forUriComponent
Performs percent-encoding for a component of a URI, such as a query parameter name or value, path or query-string. In particular this method ensures that special characters in the component do not get interpreted as part of another component.
<a href="http://www.owasp.org/<%=Encode.forUriComponent(...)%>?query#fragment">
<a href="/search?value=<%=Encode.forUriComponent(...)%>&order=1#top">
Encoding Table
The following characters are not encoded:
U+20: - . 0 1 2 3 4 5 6 7 8 9
U+40: @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _
U+60: a b c d e f g h i j k l m n o p q r s t u v w x y z ~
Encoding Notes
- Unlike forUri(String) this method is safe to be used in most containing contexts, including: HTML/XML, CSS, and JavaScript contexts.
- URL encoding is an encoding for bytes, not unicode. The input string is thus first encoded as a sequence of UTF-8 bytes. The bytes are then encoded as %xx where xx is the two-digit hexadecimal representation of the byte. (The implementation does this as one step for performance.)
- Surrogate pairs are first decoded to a Unicode code point before encoding as UTF-8.
- Invalid characters (e.g. partial or invalid surrogate pairs) are replaced with a hyphen (-) character.
Parameters:
input - the input to encode
Returns:
the encoded result
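The "bytes, not unicode" point in the notes above can be sketched as a two-step process: convert to UTF-8, then percent-encode each byte that is not in the unencoded set. UriComponentSketch below is a hypothetical illustration, not the library's code. It leaves only letters, digits, '-', '.', '_' and '~' unencoded (the RFC 3986 unreserved set, an assumption of this sketch; the table above remains authoritative for the library), emits uppercase hex digits by choice, and relies on the JDK's default handling of malformed surrogates rather than the hyphen replacement described above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of UTF-8 based percent-encoding for a URI component. */
final class UriComponentSketch {

    private static final String HEX = "0123456789ABCDEF";

    /** RFC 3986 "unreserved" characters; assumed here as the unencoded set. */
    private static boolean isUnreserved(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() * 3);
        // Step 1: the string becomes a sequence of UTF-8 bytes.
        for (byte raw : input.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xff;
            if (isUnreserved(b)) {
                out.append((char) b);
            } else {
                // Step 2: every other byte becomes %xx with two hex digits.
                out.append('%').append(HEX.charAt(b >> 4)).append(HEX.charAt(b & 0x0f));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("/search?value=" + encode("a b&c=d"));
    }
}
```

Because reserved characters such as & and = are encoded, a value like "a b&c=d" cannot introduce an extra query parameter when it is placed after "value=".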
-
forUriComponent
See forUriComponent(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forXml
Encoder for XML and XHTML. See forHtml(String) for a description of the encoding and context.
Parameters:
input - the input to encode
Returns:
the encoded result
-
forXml
See forXml(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forXmlContent
Encoder for XML and XHTML text content. See forHtmlContent(String) for description of encoding and context.
Parameters:
input - the input to encode
Returns:
the encoded result
-
forXmlContent
See forXmlContent(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forXmlAttribute
Encoder for XML and XHTML attribute content. See forHtmlAttribute(String) for description of encoding and context.
Parameters:
input - the input to encode
Returns:
the encoded result
-
forXmlAttribute
See forXmlAttribute(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forXmlComment
Encoder for XML comments. NOT FOR USE WITH (X)HTML CONTEXTS. (X)HTML comments may be interpreted by browsers as something other than a comment, typically in vendor specific extensions (e.g. <--if[IE]-->). For (X)HTML it is recommended that unsafe content never be included in a comment.
The caller must provide the comment start and end sequences.
This method replaces all invalid XML characters with spaces, and replaces the "--" sequence (which is invalid in XML comments) with "-~" (hyphen-tilde). This encoding behavior may change in future releases. If the comments need to be decoded, the caller will need to come up with their own encode/decode system.
out.println("<?xml version='1.0'?>");
out.println("<data>");
out.println("<!-- "+Encode.forXmlComment(comment)+" -->");
out.println("</data>");
Parameters:
input - the input to encode
Returns:
the encoded result
-
forXmlComment
See forXmlComment(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forCDATA
Encodes data for an XML CDATA section. On the chance that the input contains a terminating "]]>", it will be replaced by "]]>]]<![CDATA[>". As with all XML contexts, characters that are invalid according to the XML specification will be replaced by a space character. Caller must provide the CDATA section boundaries.
<xml-data><![CDATA[<%=Encode.forCDATA(...)%>]]></xml-data>
Parameters:
input - the input to encode
Returns:
the encoded result
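The "]]>" replacement described above works by closing the current CDATA section, emitting "]]" as ordinary character data, and reopening a new section for the ">". The sketch below shows only that substitution; CdataSketch and its wrap helper are hypothetical names for illustration, and the replacement of invalid XML characters by spaces is not implemented here.

```java
/** Hypothetical sketch of the "]]>" splitting rule; not the library's code. */
final class CdataSketch {

    /** Splits any terminating sequence so that it cannot end the enclosing section. */
    static String encode(String input) {
        return input.replace("]]>", "]]>]]<![CDATA[>");
    }

    /** The caller supplies the CDATA boundaries, as the description requires. */
    static String wrap(String input) {
        return "<![CDATA[" + encode(input) + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("a]]>b"));
    }
}
```

An XML parser reading the wrapped result sees a CDATA section containing "a", the text "]]", and a second CDATA section containing ">b", which together yield the original "a]]>b".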
-
forCDATA
See forCDATA(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forJava
Encodes for a Java string. This method will use "\b", "\t", "\r", "\f", "\n", "\"", "\'", "\\", octal and unicode escapes. Valid surrogate pairing is not checked. The caller must provide the enclosing quotation characters. This method is useful when writing code generators and outputting debug messages.
out.println("public class Hello {");
out.println(" public static void main(String[] args) {");
out.println(" System.out.println(\"" + Encode.forJava(message) + "\");");
out.println(" }");
out.println("}");
Parameters:
input - the input to encode
Returns:
the input encoded for java strings.
-
forJava
See forJava(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forJavaScript
Encodes for a JavaScript string. It is safe for use in HTML script attributes (such as onclick), script blocks, JSON files, and JavaScript source. The caller MUST provide the surrounding quotation characters for the string. Since this performs additional encoding so it can work in all of the JavaScript contexts listed, it may be slightly less efficient than using one of the methods targeted to a specific JavaScript context (forJavaScriptAttribute(String), forJavaScriptBlock(java.lang.String), forJavaScriptSource(java.lang.String)). Unless you are interested in saving a few bytes of output or are writing a framework on top of this library, it is recommended that you use this method over the others.
Example JSP Usage:
<button onclick="alert('<%=Encode.forJavaScript(data)%>');">
<script type="text/javascript"> var data = "<%=Encode.forJavaScript(data)%>"; </script>
Encoding Description
Input Character | Encoded Result | Notes
U+0008 BS | \b | Backspace character
U+0009 HT | \t | Horizontal tab character
U+000A LF | \n | Line feed character
U+000C FF | \f | Form feed character
U+000D CR | \r | Carriage return character
U+0022 " | \x22 | The encoding \" is not used here because it is not safe for use in HTML attributes. (In HTML attributes, it would also be correct to use "\&quot;".)
U+0026 & | \x26 | Ampersand character
U+0027 ' | \x27 | The encoding \' is not used here because it is not safe for use in HTML attributes. (In HTML attributes, it would also be correct to use "\&#39;".)
U+002F / | \/ | This encoding is used to avoid an input sequence "</" from prematurely terminating a </script> block.
U+005C \ | \\ |
U+0000 to U+001F | \x## | Hexadecimal encoding is used for characters in this range that were not already mentioned above.
Parameters:
input - the input string to encode
Returns:
the input encoded for JavaScript
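To make the rows of the encoding description concrete, the sketch below applies them one character at a time. JavaScriptEncodeSketch is a hypothetical class for illustration only: it covers just the rows listed above, uses lowercase hex digits for the \x## form as an arbitrary choice, and should not be taken as the library's implementation or as a complete JavaScript encoder.

```java
/** Hypothetical sketch of the JavaScript encoding rows listed above. */
final class JavaScriptEncodeSketch {

    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\b': out.append("\\b"); break;
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\f': out.append("\\f"); break;
                case '\r': out.append("\\r"); break;
                case '"':  out.append("\\x22"); break; // hex form stays safe inside HTML attributes
                case '&':  out.append("\\x26"); break;
                case '\'': out.append("\\x27"); break;
                case '/':  out.append("\\/"); break;   // keeps "</" from closing a script block
                case '\\': out.append("\\\\"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the two-digit \x## form.
                        out.append(String.format("\\x%02x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("var data = \"" + encode("</script>\"hi\"") + "\";");
    }
}
```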
-
forJavaScript
See forJavaScript(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forJavaScriptAttribute
This method encodes for JavaScript strings contained within HTML script attributes (such as onclick). It is NOT safe for use in script blocks. The caller MUST provide the surrounding quotation characters. This method performs the same encoding as forJavaScript(String) with the exception that / is not escaped.
Unless you are interested in saving a few bytes of output or are writing a framework on top of this library, it is recommended that you use forJavaScript(String) over this method.
Example JSP Usage:
<button onclick="alert('<%=Encode.forJavaScriptAttribute(data)%>');">
Parameters:
input - the input string to encode
Returns:
the input encoded for JavaScript
-
forJavaScriptAttribute
See forJavaScriptAttribute(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forJavaScriptBlock
This method encodes for JavaScript strings contained within HTML script blocks. It is NOT safe for use in script attributes (such as onclick). The caller must provide the surrounding quotation characters. This method performs the same encoding as forJavaScript(String) with the exception that " and ' are encoded as \" and \' respectively.
Unless you are interested in saving a few bytes of output or are writing a framework on top of this library, it is recommended that you use forJavaScript(String) over this method.
Example JSP Usage:
<script type="text/javascript"> var data = "<%=Encode.forJavaScriptBlock(data)%>"; </script>
Parameters:
input - the input string to encode
Returns:
the input encoded for JavaScript
-
forJavaScriptBlock
See forJavaScriptBlock(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-
forJavaScriptSource
This method encodes for JavaScript strings contained within a JavaScript or JSON file. This method is NOT safe for use in ANY context embedded in HTML. The caller must provide the surrounding quotation characters. This method performs the same encoding as forJavaScript(String) with the exception that / and & are not escaped and " and ' are encoded as \" and \' respectively.
Unless you are interested in saving a few bytes of output or are writing a framework on top of this library, it is recommended that you use forJavaScript(String) over this method.
Example JSP Usage:
This example is serving up JavaScript source directly:
<%@page contentType="text/javascript; charset=UTF-8"%>
var data = "<%=Encode.forJavaScriptSource(data)%>";
This example is serving up JSON data (users of this use-case are encouraged to read up on "JSON Hijacking"):
<%@page contentType="application/json; charset=UTF-8"%>
<% myapp.jsonHijackingPreventionMeasure(); %>
{"data":"<%=Encode.forJavaScriptSource(data)%>"}
Parameters:
input - the input string to encode
Returns:
the input encoded for JavaScript
-
forJavaScriptSource
See forJavaScriptSource(String) for description of encoding. This version writes directly to a Writer without an intervening string.
Parameters:
out - where to write encoded output
input - the input string to encode
Throws:
IOException - if thrown by writer
-