public final class CmsEncoder extends java.lang.Object
The methods in this class are substitutes for java.net.URLEncoder.encode() and java.net.URLDecoder.decode(). Use the methods from this class in all OpenCms
core classes to ensure the encoding is always handled the same way.
Encoding and decoding use the same mechanism as JavaScript: special characters are
replaced with %hex, where hex is a two-digit hex number.
Note: On the client side (browser), always use the encodeURIComponent and decodeURIComponent
JavaScript functions instead of the deprecated escape and unescape functions.
Only these work properly with Unicode characters.
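As a rough illustration of the %hex scheme described above, the following sketch uses only the JDK classes java.net.URLEncoder and java.net.URLDecoder with UTF-8. It is not the CmsEncoder implementation, and the class name UrlCodecSketch is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class UrlCodecSketch {

    /** Percent-encodes the UTF-8 bytes of the text (the JDK encoder writes a blank as '+'). */
    static String encode(String source) {
        try {
            return URLEncoder.encode(source, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encode(String). */
    static String decode(String source) {
        try {
            return URLDecoder.decode(source, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "Gr\u00fc\u00dfe" is the German word "Gruesse" written with u-umlaut and sharp s
        String encoded = encode("Gr\u00fc\u00dfe & more");
        System.out.println(encoded); // Gr%C3%BC%C3%9Fe+%26+more
        System.out.println(decode(encoded).equals("Gr\u00fc\u00dfe & more")); // true
    }
}
```

Each non-ASCII character becomes one %hex group per UTF-8 byte, which is why a single umlaut yields two groups.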
Modifier and Type | Field and Description |
---|---|
static java.lang.String | BASE64_EXTRA: Non-alphanumeric characters used for Base64 encoding. |
static java.lang.String | BASE64_EXTRA_REPLACEMENTS: Characters used as replacements for non-alphanumeric Base64 characters when using Base64 for request parameters. |
static java.lang.String | ENCODING_ISO_8859_1: Constant for the standard ISO-8859-1 encoding. |
static java.lang.String | ENCODING_US_ASCII: Constant for the standard US-ASCII encoding. |
static java.lang.String | ENCODING_UTF_8: Constant for the standard UTF-8 encoding. |
Modifier and Type | Method and Description |
---|---|
static java.lang.String | adjustHtmlEncoding(java.lang.String input, java.lang.String encoding): Adjusts the given String by making sure all characters that can be displayed in the given charset are contained as chars, whereas all other non-displayable characters are converted to HTML entities. |
static byte[] | changeEncoding(byte[] input, java.lang.String oldEncoding, java.lang.String newEncoding): Changes the encoding of a byte array that represents a String. |
static java.lang.String | convertHostToPunycode(java.lang.String uriString): Converts the host of a URI to Punycode. |
static java.lang.String | createString(byte[] bytes, java.lang.String encoding): Creates a String out of a byte array with the specified encoding, falling back to the system default in case the encoding name is not valid. |
static java.lang.String | decode(java.lang.String source): Decodes a String using UTF-8 encoding, which is the standard for HTTP data transmission with GET and POST requests. |
static java.lang.String | decode(java.lang.String source, java.lang.String encoding): This method is a substitute for URLDecoder.decode(). |
static java.lang.String | decodeHtmlEntities(java.lang.String input, java.lang.String encoding): Decodes HTML entity references like &#8364; that are contained in the String to a regular character, but only if that character is contained in the given encoding's charset. |
static java.lang.String | decodeParameter(java.lang.String input): Decodes a string used as parameter in a URI in a way independent of other encodings/decodings applied before. |
static java.util.List<java.lang.String> | decodeStringsFromBase64Parameter(java.lang.String data): Decodes a parameter which has been encoded from a string list using encodeStringsAsBase64Parameter. |
static java.lang.String | encode(java.lang.String source): Encodes a String using UTF-8 encoding, which is the standard for HTTP data transmission with GET and POST requests. |
static java.lang.String | encode(java.lang.String source, java.lang.String encoding): This method is a substitute for URLEncoder.encode(). |
static java.lang.String | encodeHtmlEntities(java.lang.String input, java.lang.String encoding): Encodes all characters that are contained in the String which cannot be displayed in the given encoding's charset with HTML entity references like &#8364;. |
static java.lang.String | encodeJavaEntities(java.lang.String input, java.lang.String encoding): Encodes all characters that are contained in the String which cannot be displayed in the given encoding's charset with Java escaping like \u20ac. |
static java.lang.String | encodeParameter(java.lang.String input): Encodes a string used as parameter in a URI in a way independent of other encodings/decodings applied later. |
static java.lang.String | encodeStringsAsBase64Parameter(java.util.List<java.lang.String> strings): Encodes a list of strings as Base64 data to be used in a request parameter. |
static java.lang.String | escape(java.lang.String source): Encodes a String in a way similar to the JavaScript "encodeURIComponent" function, using "UTF-8" for character encoding. |
static java.lang.String | escape(java.lang.String source, java.lang.String encoding): Encodes a String in a way similar to the JavaScript "encodeURIComponent" function. |
static java.lang.String | escapeHtml(java.lang.String source): Escapes special characters in an HTML String with their number-based entity representation, for example & becomes &#38;. |
static java.lang.String | escapeNonAscii(java.lang.String source): Escapes non-ASCII characters in an HTML String with their number-based entity representation, for example the euro sign becomes &#8364;. |
static java.lang.String | escapeSql(java.lang.String source): A simple method to avoid SQL injection. |
static java.lang.String | escapeSqlLikePattern(java.lang.String pattern, char escapeChar): Escapes the wildcard characters in a string which will be used as the pattern for a SQL LIKE clause. |
static java.lang.String | escapeWBlanks(java.lang.String source, java.lang.String encoding): Encodes a String in a way similar to the JavaScript "encodeURIComponent" function. |
static java.lang.String | escapeXml(java.lang.String source): Escapes a String so it may be printed as text content or attribute value in an HTML page or an XML file. |
static java.lang.String | escapeXml(java.lang.String source, boolean doubleEscape): Escapes a String so it may be printed as text content or attribute value in an HTML page or an XML file. |
static java.lang.String | lookupEncoding(java.lang.String encoding, java.lang.String fallback): Checks if a given encoding name is actually supported and, if so, resolves it to its canonical name; if not, it returns the given fallback value. |
static java.lang.String | redecodeUriComponent(java.lang.String input): Re-decodes a String that has not been correctly decoded and thus has scrambled character bytes. |
static java.lang.String | unescape(java.lang.String source): Decodes a String in a way similar to the JavaScript "decodeURIComponent" function, using "UTF-8" for character encoding. |
static java.lang.String | unescape(java.lang.String source, java.lang.String encoding): Decodes a String in a way similar to the JavaScript "decodeURIComponent" function. |
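public static final java.lang.String BASE64_EXTRA
Non-alphanumeric characters used for Base64 encoding.
public static final java.lang.String BASE64_EXTRA_REPLACEMENTS
Characters used as replacements for non-alphanumeric Base64 characters when using Base64 for request parameters.
public static final java.lang.String ENCODING_ISO_8859_1
Constant for the standard ISO-8859-1 encoding.
public static final java.lang.String ENCODING_US_ASCII
Constant for the standard US-ASCII encoding.
public static final java.lang.String ENCODING_UTF_8
Constant for the standard UTF-8 encoding.
The default encoding for the JavaScript decodeURIComponent methods is UTF-8 by W3C standard.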
public static java.lang.String adjustHtmlEncoding(java.lang.String input, java.lang.String encoding)
Just calls decodeHtmlEntities(String, String) first and feeds the result to encodeHtmlEntities(String, String).
input - the input to adjust the HTML encoding for
encoding - the charset to encode the result with
public static byte[] changeEncoding(byte[] input, java.lang.String oldEncoding, java.lang.String newEncoding)
input - the byte array to convert
oldEncoding - the current encoding of the byte array
newEncoding - the new encoding of the byte array
public static java.lang.String convertHostToPunycode(java.lang.String uriString)
This is needed when we want to do redirects to hosts with host names containing international characters like umlauts.
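To make the conversion concrete, here is a minimal sketch built on the JDK class java.net.IDN. The class name PunycodeHostSketch and the regular expression used to locate the host are assumptions made for illustration and are not taken from CmsEncoder. The sketch does not handle user-info parts such as user@host.

```java
import java.net.IDN;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PunycodeHostSketch {

    // scheme://, then the host (everything up to the first '/', ':', '?' or '#'), then the rest
    private static final Pattern URI_PARTS =
        Pattern.compile("^([a-zA-Z][a-zA-Z0-9+.-]*://)([^/:?#]+)(.*)$");

    /** Returns the URI with its host converted to the ASCII (Punycode) form. */
    static String convertHost(String uriString) {
        Matcher m = URI_PARTS.matcher(uriString);
        if (!m.matches()) {
            // no scheme and host found, for example a relative path: leave unchanged
            return uriString;
        }
        return m.group(1) + IDN.toASCII(m.group(2)) + m.group(3);
    }

    public static void main(String[] args) {
        // "b\u00fccher" is "buecher" written with u-umlaut
        System.out.println(convertHost("http://b\u00fccher.example:8080/path?q=1"));
        // prints http://xn--bcher-kva.example:8080/path?q=1
    }
}
```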
uriString - the URI
public static java.lang.String createString(byte[] bytes, java.lang.String encoding)
Use this method as a replacement for new String(byte[], encoding) to avoid possible encoding problems.
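The fallback behaviour can be sketched with the JDK alone. The class name CreateStringSketch is hypothetical, and the code only illustrates the idea of falling back to the platform default charset; it is not the CmsEncoder source.

```java
import java.nio.charset.Charset;

final class CreateStringSketch {

    /** Decodes the bytes with the named charset, or with the platform default if the name is unusable. */
    static String createString(byte[] bytes, String encoding) {
        Charset charset;
        try {
            charset = Charset.forName(encoding);
        } catch (IllegalArgumentException e) {
            // thrown for null, syntactically illegal and unsupported charset names
            charset = Charset.defaultCharset();
        }
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA4}; // a-umlaut encoded as UTF-8
        System.out.println(createString(utf8, "UTF-8").equals("\u00e4")); // true
        System.out.println(createString(new byte[] {97, 98, 99}, "no-such-charset")); // abc
    }
}
```

Unlike new String(byte[], String), this variant never throws a checked UnsupportedEncodingException.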
bytes - the bytes to decode
encoding - the encoding scheme to use for decoding the bytes
public static java.lang.String decode(java.lang.String source)
source - the String to decode
public static java.lang.String decode(java.lang.String source, java.lang.String encoding)
This method is a substitute for URLDecoder.decode().
Use this in all OpenCms core classes to ensure the encoding is
always handled the same way.
In case you don't know what encoding to use, set the value of the encoding parameter to null.
This method will then default to UTF-8 encoding, which is probably the right one.
source - the String to decode
encoding - the encoding to use (if null, UTF-8 is used)
public static java.lang.String decodeHtmlEntities(java.lang.String input, java.lang.String encoding)
Decodes HTML entity references like &#8364; that are contained in the String to a regular character, but only if that character is contained in the given encoding's charset.
input - the input to decode the HTML entities in
encoding - the charset to decode the input for
See also: encodeHtmlEntities(String, String)
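The "only if the charset contains the character" rule can be sketched with java.nio.charset.CharsetEncoder.canEncode. This is an illustrative assumption about how such a check might look, limited to decimal numeric entities; DecodeEntitiesSketch is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DecodeEntitiesSketch {

    // decimal numeric entity references such as "&#8364;"
    private static final Pattern NUMERIC_ENTITY = Pattern.compile("&#(\\d{1,5});");

    /** Replaces each numeric entity by its character if the target charset can represent it. */
    static String decodeHtmlEntities(String input, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        Matcher matcher = NUMERIC_ENTITY.matcher(input);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            int code = Integer.parseInt(matcher.group(1));
            String replacement = matcher.group(); // keep the entity by default
            if (code <= Character.MAX_VALUE && encoder.canEncode((char) code)) {
                replacement = String.valueOf((char) code);
            }
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // ISO-8859-1 contains a-umlaut (228) but not the euro sign (8364)
        System.out.println(decodeHtmlEntities("&#8364; &#228;", "ISO-8859-1").startsWith("&#8364; ")); // true
    }
}
```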
public static java.lang.String decodeParameter(java.lang.String input)
input - the encoded parameter string
See also: encodeParameter(String)
public static java.util.List<java.lang.String> decodeStringsFromBase64Parameter(java.lang.String data)
data - the data to decode
public static java.lang.String encode(java.lang.String source)
source - the String to encode
public static java.lang.String encode(java.lang.String source, java.lang.String encoding)
This method is a substitute for URLEncoder.encode().
Use this in all OpenCms core classes to ensure the encoding is
always handled the same way.
In case you don't know what encoding to use, set the value of the encoding parameter to null.
This method will then default to UTF-8 encoding, which is probably the right one.
source - the String to encode
encoding - the encoding to use (if null, UTF-8 is used)
public static java.lang.String encodeHtmlEntities(java.lang.String input, java.lang.String encoding)
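Encodes all characters that are contained in the String which cannot be displayed in the given encoding's charset with HTML entity references like &#8364;.
This is required since a Java String is internally always stored as Unicode, meaning it can contain almost every character, but the HTML charset used might not support all such characters.
input - the input to encode for HTML
encoding - the charset to encode the result with
See also: decodeHtmlEntities(String, String)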
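A minimal sketch of this idea, assuming that "can be displayed" is approximated by CharsetEncoder.canEncode. EncodeEntitiesSketch is a hypothetical name and the code is not the CmsEncoder source.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class EncodeEntitiesSketch {

    /** Writes every code point the charset cannot represent as a decimal numeric entity. */
    static String encodeHtmlEntities(String input, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder result = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            if (encoder.canEncode(ch)) {
                result.append(ch);
            } else {
                result.append("&#").append(codePoint).append(';');
            }
            // step over both chars of a surrogate pair
            i += Character.charCount(codePoint);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // euro sign and a-umlaut, written for a page served as US-ASCII
        System.out.println(encodeHtmlEntities("\u20ac \u00e4", "US-ASCII")); // &#8364; &#228;
    }
}
```

Iterating by code point rather than by char keeps characters outside the Basic Multilingual Plane intact as a single entity.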
public static java.lang.String encodeJavaEntities(java.lang.String input, java.lang.String encoding)
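Encodes all characters that are contained in the String which cannot be displayed in the given encoding's charset with Java escaping like \u20ac.
This can be used to escape values used in Java property files.
input - the input to encode for Java
encoding - the charset to encode the result with
public static java.lang.String encodeParameter(java.lang.String input)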
Used to ensure that GET parameters are not wrecked by wrong or incompatible configuration settings. In order to ensure this, the String is first encoded with HTML entities for any character that cannot be encoded in US-ASCII; additionally, the plus sign is also encoded to avoid problems with the white-space replacer. Finally, the entity prefix is replaced with characters not used as delimiters in URLs.
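The three steps above can be sketched as follows. The replacement string "$$" for the entity prefix is an assumption chosen for illustration, since the text does not name the actual characters, and EncodeParameterSketch is a hypothetical class.

```java
final class EncodeParameterSketch {

    static final String ENTITY_PREFIX = "&#";
    // Assumption: any characters that are not URL delimiters would do; "$$" is only an example
    static final String ENTITY_REPLACEMENT = "$$";

    static String encodeParameter(String input) {
        StringBuilder result = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            if (codePoint > 127 || codePoint == '+') {
                // steps 1 and 2: non-US-ASCII characters and the plus sign become numeric entities
                result.append(ENTITY_PREFIX).append(codePoint).append(';');
            } else {
                result.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint);
        }
        // step 3: hide the entity prefix, because '&' and '#' are delimiters in URLs
        return result.toString().replace(ENTITY_PREFIX, ENTITY_REPLACEMENT);
    }

    public static void main(String[] args) {
        System.out.println(encodeParameter("a+b \u00e4")); // a$$43;b $$228;
    }
}
```

The result contains only US-ASCII characters and no '+', so a later URL encoding or decoding pass with a different charset cannot change its meaning.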
input - the parameter string
public static java.lang.String encodeStringsAsBase64Parameter(java.util.List<java.lang.String> strings)
strings - the strings to encode
public static java.lang.String escape(java.lang.String source)
JavaScript "decodeURIComponent" can decode Strings that have been encoded using this method.
Directly exposed for JSP EL, not through CmsJspElFunctions.
source - The text to be encoded
See also: escape(String, String)
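One way to get output that JavaScript "decodeURIComponent" can read is to run the JDK URL encoder and then write blanks as %20 instead of '+'. The sketch below assumes exactly that approach and uses the hypothetical class name EscapeSketch; it is not presented as the CmsEncoder implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class EscapeSketch {

    /** URL-encodes as UTF-8 and writes blanks as %20, as encodeURIComponent does. */
    static String escape(String source) {
        try {
            // a literal '+' in the input is already %2B here, so every remaining '+' is a blank
            return URLEncoder.encode(source, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Reverses escape(String). */
    static String unescape(String source) {
        try {
            return URLDecoder.decode(source, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("a b+c")); // a%20b%2Bc
        System.out.println(unescape(escape("1 \u20ac")).equals("1 \u20ac")); // true
    }
}
```

The result is similar to, not identical with, encodeURIComponent: the JDK encoder also escapes a few characters (for example '!' and '~') that the JavaScript function leaves alone, which does not affect decoding.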
public static java.lang.String escape(java.lang.String source, java.lang.String encoding)
JavaScript "decodeURIComponent" can decode Strings that have been encoded using this method, provided "UTF-8" has been used as encoding.
Directly exposed for JSP EL, not through CmsJspElFunctions.
source - The text to be encoded
encoding - the encoding type
public static java.lang.String escapeHtml(java.lang.String source)
A character with the numeric value ch is replaced with its number-based entity if
((ch != 32) && ((ch > 122) || (ch < 48) || (ch == 60) || (ch == 62)))
source - the String to escape
See also: escapeXml(String)
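The condition above translates directly into a loop. The following sketch applies only that condition; any additional handling the real method may perform is not shown, and EscapeHtmlSketch is a hypothetical name.

```java
final class EscapeHtmlSketch {

    /** Replaces every character matching the documented condition by "&#" + value + ";". */
    static String escapeHtml(String source) {
        StringBuilder result = new StringBuilder(source.length() * 2);
        for (int i = 0; i < source.length(); i++) {
            int ch = source.charAt(i);
            if ((ch != 32) && ((ch > 122) || (ch < 48) || (ch == 60) || (ch == 62))) {
                result.append("&#").append(ch).append(';');
            } else {
                // blanks, digits, letters and a few punctuation characters pass through
                result.append((char) ch);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("a < b & c")); // a &#60; b &#38; c
    }
}
```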
public static java.lang.String escapeNonAscii(java.lang.String source)
A character with the numeric value ch is replaced with its number-based entity if
(ch > 255)
source - the String to escape
See also: escapeXml(String)
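For comparison with escapeHtml, the same loop with the (ch > 255) condition looks like this. It is a sketch under the stated condition only; EscapeNonAsciiSketch is a hypothetical name.

```java
final class EscapeNonAsciiSketch {

    /** Replaces every character above 255 by its decimal numeric entity. */
    static String escapeNonAscii(String source) {
        StringBuilder result = new StringBuilder(source.length() * 2);
        for (int i = 0; i < source.length(); i++) {
            int ch = source.charAt(i);
            if (ch > 255) {
                result.append("&#").append(ch).append(';');
            } else {
                result.append((char) ch);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // the euro sign (8364) is replaced, '&' (38) is not
        System.out.println(escapeNonAscii("5 \u20ac & more")); // 5 &#8364; & more
    }
}
```

Note that with this threshold, characters from 128 to 255 such as a-umlaut are left unchanged even though they are not ASCII in the strict sense.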
public static java.lang.String escapeSql(java.lang.String source)
Replaces each single quote in the value parameter of the SQL statement with two single quotes.
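The quote doubling described above amounts to a single replace call. The sketch below shows only that step under the hypothetical class name EscapeSqlSketch.

```java
final class EscapeSqlSketch {

    /** Doubles each single quote so the value cannot terminate a SQL string literal. */
    static String escapeSql(String source) {
        return source.replace("'", "''");
    }

    public static void main(String[] args) {
        String name = escapeSql("O'Brien");
        // hypothetical table and column names, for illustration only
        System.out.println("SELECT * FROM USERS WHERE NAME = '" + name + "'");
        // prints SELECT * FROM USERS WHERE NAME = 'O''Brien'
    }
}
```

Where the JDBC driver allows it, bound parameters of a java.sql.PreparedStatement avoid the need for manual escaping altogether.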
source - the String to escape SQL from
public static java.lang.String escapeSqlLikePattern(java.lang.String pattern, char escapeChar)
pattern - the pattern
escapeChar - the character which should be used as the escape character
public static java.lang.String escapeWBlanks(java.lang.String source, java.lang.String encoding)
Each blank in a run of multiple blanks is encoded separately as %20.
source - The text to be encoded
encoding - the encoding type
public static java.lang.String escapeXml(java.lang.String source)
This method replaces the following characters in a String:
source - the string to escape
See also: escapeHtml(String)
public static java.lang.String escapeXml(java.lang.String source, boolean doubleEscape)
This method replaces the following characters in a String:
source - the string to escape
doubleEscape - if false, all entities that already are escaped are left untouched
See also: escapeHtml(String)
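The effect of the doubleEscape flag can be sketched as follows. The list of replaced characters is not shown above, so this sketch assumes the four usual ones (<, >, & and the double quote). The regular expression that recognizes an existing entity and the class name EscapeXmlSketch are likewise assumptions.

```java
import java.util.regex.Pattern;

final class EscapeXmlSketch {

    // an '&' that does NOT start something shaped like an entity, e.g. "&amp;" or "&#38;"
    private static final Pattern BARE_AMPERSAND = Pattern.compile("&(?!#?[A-Za-z0-9]+;)");

    static String escapeXml(String source, boolean doubleEscape) {
        String result = doubleEscape
            ? source.replace("&", "&amp;")                         // escape every '&'
            : BARE_AMPERSAND.matcher(source).replaceAll("&amp;");  // keep existing entities
        return result.replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("a < b &amp; c", false)); // a &lt; b &amp; c
        System.out.println(escapeXml("a < b &amp; c", true));  // a &lt; b &amp;amp; c
    }
}
```

The ampersand is handled first so that the '&' characters introduced by the later replacements are not escaped a second time.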
public static java.lang.String lookupEncoding(java.lang.String encoding, java.lang.String fallback)
Charsets have a set of aliases. For example, valid aliases for "UTF-8" are "UTF8", "utf-8" or "utf8". This method resolves any given valid charset name to its "canonical" form, so that simple String comparison can be used when checking charset names internally later.
Please see http://www.iana.org/assignments/character-sets for a list of valid charset alias names.
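Alias resolution of this kind is available in the JDK through java.nio.charset.Charset. The following sketch, with the hypothetical class name LookupEncodingSketch, shows how a lookup with fallback might be written; it is not the CmsEncoder source.

```java
import java.nio.charset.Charset;

final class LookupEncodingSketch {

    /** Returns the canonical name of a supported charset, or the fallback otherwise. */
    static String lookupEncoding(String encoding, String fallback) {
        try {
            // Charset.name() is the canonical name, whatever alias was passed in
            return Charset.forName(encoding).name();
        } catch (IllegalArgumentException e) {
            // null, syntactically illegal or unsupported charset name
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupEncoding("utf8", null));                    // UTF-8
        System.out.println(lookupEncoding("no-such-charset", "ISO-8859-1")); // ISO-8859-1
    }
}
```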
encoding - the encoding to check and resolve
fallback - the fallback encoding scheme
public static java.lang.String redecodeUriComponent(java.lang.String input)
This is an equivalent to the JavaScript "decodeURIComponent" function. It converts from the default "UTF-8" to the currently selected system encoding.
input - the String to convert
public static java.lang.String unescape(java.lang.String source)
This method can decode Strings that have been encoded in JavaScript with "encodeURIComponent".
Directly exposed for JSP EL, not through CmsJspElFunctions.
source - The String to be decoded
public static java.lang.String unescape(java.lang.String source, java.lang.String encoding)
This method can decode Strings that have been encoded in JavaScript with "encodeURIComponent", provided "UTF-8" is used as encoding.
Directly exposed for JSP EL, not through CmsJspElFunctions.
source - The String to be decoded
encoding - the encoding type