public class PercentEscaper extends UnicodeEscaper
A UnicodeEscaper that escapes some set of Java characters using the URI percent encoding scheme. The set of safe characters (those which remain unescaped) can be specified on construction.
For details on escaping URIs for use in web pages, see section 2.4 of RFC 3986.
In most cases this class should not need to be used directly. If you have no special requirements for escaping your URIs, you should use either CharEscapers#uriEscaper() or CharEscapers#uriEscaper(boolean).
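When encoding a String, the following rules apply:
- The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
- Any additionally specified safe characters remain the same.
- If plusForSpace was specified, the space character " " is converted into a plus sign "+".
- All other characters are converted into one or more bytes using UTF-8 encoding, and each byte is then represented by the 3-character string "%XY", where "XY" is the two-digit, uppercase, hexadecimal representation of the byte value.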
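The encoding rules can be illustrated with a minimal, JDK-only sketch. The class `PercentEncodingSketch` and its static `escape` method are hypothetical names chosen for illustration; they are not part of this library, and the sketch skips argument validation and surrogate checking. It assumes what this page states elsewhere: alphanumerics are always safe, the encoding is UTF-8, and hex digits are uppercase.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; not the library's PercentEscaper.
final class PercentEncodingSketch {
    private static final char[] UPPER_HEX = "0123456789ABCDEF".toCharArray();

    // safeChars lists extra characters left unescaped besides 0-9, a-z and A-Z.
    static String escape(String s, String safeChars, boolean plusForSpace) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (isAsciiAlphanumeric(cp) || safeChars.indexOf(cp) >= 0) {
                out.appendCodePoint(cp);          // safe: copied through unchanged
            } else if (cp == ' ' && plusForSpace) {
                out.append('+');                  // optional space handling
            } else {
                // Everything else: UTF-8 bytes, each written as %XY in uppercase hex.
                // Note: this sketch does not validate unpaired surrogates.
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('%').append(UPPER_HEX[(b >> 4) & 0xF]).append(UPPER_HEX[b & 0xF]);
                }
            }
        }
        return out.toString();
    }

    private static boolean isAsciiAlphanumeric(int cp) {
        return (cp >= '0' && cp <= '9') || (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
    }
}
```

With this sketch, the same input yields `+` or `%20` for a space depending on the `plusForSpace` flag, and a non-ASCII character expands to one `%XY` triplet per UTF-8 byte.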
RFC 2396 specifies the set of unreserved characters as "-", "_", ".", "!", "~", "*", "'", "(" and ")". It goes on to state:
Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.
For performance reasons the only currently supported character encoding of this class is UTF-8.
Note: This escaper produces uppercase hexadecimal sequences. From RFC 3986: "URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings."
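To make the UTF-8 and uppercase-hex conventions concrete, here is a small sketch that percent-encodes a single code point by computing its UTF-8 bytes by hand. `Utf8PercentSketch` and `escapeCodePoint` are hypothetical names used only for this example; the method always escapes, whereas a real escaper would first check whether the code point is safe.

```java
// Hypothetical helper for illustration; always escapes the given code point.
final class Utf8PercentSketch {
    private static final char[] UPPER_HEX = "0123456789ABCDEF".toCharArray();

    static char[] escapeCodePoint(int cp) {
        if (cp < 0 || cp > Character.MAX_CODE_POINT || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Not a valid Unicode scalar value: " + cp);
        }
        int[] bytes;
        if (cp <= 0x7F) {                 // 1 byte:  0xxxxxxx
            bytes = new int[] {cp};
        } else if (cp <= 0x7FF) {         // 2 bytes: 110xxxxx 10xxxxxx
            bytes = new int[] {0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)};
        } else if (cp <= 0xFFFF) {        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            bytes = new int[] {0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)};
        } else {                          // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            bytes = new int[] {0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                               0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)};
        }
        char[] out = new char[bytes.length * 3];
        for (int i = 0; i < bytes.length; i++) {
            out[3 * i] = '%';
            out[3 * i + 1] = UPPER_HEX[(bytes[i] >> 4) & 0xF];
            out[3 * i + 2] = UPPER_HEX[bytes[i] & 0xF];
        }
        return out;
    }
}
```

For example, U+00E9 occupies two UTF-8 bytes and therefore becomes two triplets, while a supplementary code point such as U+1F600 becomes four.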
Modifier and Type | Field and Description |
---|---|
static String | SAFECHARS_URLENCODER: A string of safe characters that mimics the behavior of URLEncoder. |
static String | SAFEPATHCHARS_URLENCODER: A string of characters that do not need to be encoded when used in URI path segments, as specified in RFC 3986. |
static String | SAFEQUERYSTRINGCHARS_URLENCODER: A string of characters that do not need to be encoded when used in URI query strings, as specified in RFC 3986. |
Constructor and Description |
---|
PercentEscaper(String safeChars, boolean plusForSpace): Constructs a URI escaper with the specified safe characters and optional handling of the space character. |
Modifier and Type | Method and Description |
---|---|
protected char[] | escape(int cp): Escapes the given Unicode code point in UTF-8. |
String | escape(String s): Returns the escaped form of a given literal string. |
protected int | nextEscapeIndex(CharSequence csq, int index, int end): Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping. |
Methods inherited from class UnicodeEscaper: codePointAt, escape, escapeSlow
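public static final String SAFECHARS_URLENCODER
A string of safe characters that mimics the behavior of URLEncoder.
public static final String SAFEPATHCHARS_URLENCODER
A string of characters that do not need to be encoded when used in URI path segments, as specified in RFC 3986.
public static final String SAFEQUERYSTRINGCHARS_URLENCODER
A string of characters that do not need to be encoded when used in URI query strings, as specified in RFC 3986.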
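For reference, the behavior of `java.net.URLEncoder` that SAFECHARS_URLENCODER is described as mimicking can be observed directly with the JDK. The wrapper class `UrlEncoderBaseline` below is a hypothetical name for this example; it does not show the actual contents of the constant, only the JDK behavior to compare against.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical wrapper for illustration; exposes the JDK's own URLEncoder behavior.
final class UrlEncoderBaseline {
    static String encode(String s) {
        try {
            // URLEncoder leaves alphanumerics and ".", "-", "*", "_" unchanged,
            // turns a space into "+", and percent-encodes everything else.
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java platform implementation.
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }
}
```

Note that `URLEncoder` escapes "~" even though RFC 2396 lists it as unreserved, which is one reason a path or query string may call for a different safe-character set.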
public PercentEscaper(String safeChars, boolean plusForSpace)
Constructs a URI escaper with the specified safe characters and optional handling of the space character.
Parameters:
safeChars - a non-null string specifying additional safe characters for this escaper (the ranges 0..9, a..z and A..Z are always safe and should not be specified here)
plusForSpace - true if ASCII space should be escaped to + rather than %20
Throws:
IllegalArgumentException - if any of the parameters were invalid
protected int nextEscapeIndex(CharSequence csq, int index, int end)
Description copied from class: UnicodeEscaper
Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping.
Note: When implementing an escaper, it is a good idea to override this method for efficiency. The base class implementation determines successive Unicode code points and invokes UnicodeEscaper.escape(int) for each of them. If the semantics of your escaper are such that code points in the supplementary range are either all escaped or all unescaped, this method can be implemented more efficiently using CharSequence.charAt(int).
Note however that if your escaper does not escape characters in the supplementary range, you should either continue to validate the correctness of any surrogate characters encountered or provide a clear warning to users that your escaper does not validate its input.
See PercentEscaper for an example.
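A charAt-based scan of the kind described in the note might look like the following sketch. `NextEscapeIndexSketch` and its `safeOctets` lookup table are hypothetical names for this example and are not taken from the library's source. Because every non-ASCII char is treated as unsafe here, all surrogates stop the scan and would be handed to the code-point escaping path, so no separate surrogate validation is needed in the loop.

```java
// Hypothetical sketch for illustration; not the library's implementation.
final class NextEscapeIndexSketch {
    // safeOctets[c] is true when char c may be copied through unescaped.
    private final boolean[] safeOctets;

    NextEscapeIndexSketch(String safeChars) {
        String all = safeChars
            + "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        int max = 0;
        for (char c : all.toCharArray()) {
            max = Math.max(max, c);
        }
        safeOctets = new boolean[max + 1];
        for (char c : all.toCharArray()) {
            safeOctets[c] = true;
        }
    }

    // Returns the index of the first char in [index, end) that needs escaping, or end.
    int nextEscapeIndex(CharSequence csq, int index, int end) {
        for (; index < end; index++) {
            char c = csq.charAt(index);
            if (c >= safeOctets.length || !safeOctets[c]) {
                break;
            }
        }
        return index;
    }
}
```

The lookup table keeps the per-character test to one bounds check and one array read, which is the efficiency gain the note refers to when it recommends avoiding code-point decoding.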
Overrides:
nextEscapeIndex in class UnicodeEscaper
Parameters:
csq - a sequence of characters
index - the index of the first character to be scanned
end - the index immediately after the last character to be scanned
public String escape(String s)
Description copied from class: UnicodeEscaper
Returns the escaped form of a given literal string.
If you are escaping input in arbitrary successive chunks, then it is not generally safe to use this method. If an input string ends with an unmatched high surrogate character, then this method will throw IllegalArgumentException. You should either ensure your input is valid UTF-16 before calling this method or use an escaped Appendable (as returned by UnicodeEscaper.escape(Appendable)) which can cope with arbitrarily split input.
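One way to check input before calling this method is to verify that every surrogate is properly paired. The sketch below uses only `java.lang.Character`; the class `SurrogateCheckSketch` and both method names are hypothetical and introduced for illustration.

```java
// Hypothetical helper for illustration; checks UTF-16 well-formedness with JDK calls only.
final class SurrogateCheckSketch {
    // True when the chunk ends in a high surrogate whose low surrogate has not arrived yet.
    static boolean endsWithUnmatchedHighSurrogate(CharSequence chunk) {
        int n = chunk.length();
        return n > 0 && Character.isHighSurrogate(chunk.charAt(n - 1));
    }

    // True when every high surrogate is immediately followed by a low surrogate
    // and no low surrogate appears on its own.
    static boolean isWellFormedUtf16(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low surrogate that completes the pair
            } else if (Character.isLowSurrogate(c)) {
                return false;
            }
        }
        return true;
    }
}
```

A caller that reads input in chunks could hold back the final char whenever `endsWithUnmatchedHighSurrogate` returns true and prepend it to the next chunk, so that each string passed to the escaper is valid on its own.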
Note: When implementing an escaper it is a good idea to override this method for efficiency by inlining the implementation of UnicodeEscaper.nextEscapeIndex(CharSequence, int, int) directly. Doing this for PercentEscaper more than doubled the performance for unescaped strings (as measured by CharEscaperBenchmark).
Specified by:
escape in interface Escaper
Overrides:
escape in class UnicodeEscaper
Parameters:
s - the literal string to be escaped
Returns:
the escaped form of the string
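protected char[] escape(int cp)
Escapes the given Unicode code point in UTF-8.
Overrides:
escape in class UnicodeEscaper
Parameters:
cp - the Unicode code point to escape if necessary
Returns:
the replacement characters, or null if no escaping was needed
Copyright © 2025 ChargeBee. All rights reserved.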