@ComponentSpecification public interface EncodingUtil
This is the interface for a collection of utility functions that help to deal
with encodings. An encoding defines a mapping of Characters of a Charset to
Bytes and vice versa.
EncodingUtilImpl
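To make the character-to-byte mapping concrete, here is a minimal sketch that uses only the standard `java.nio.charset` API. The class `EncodingRoundTrip` and its methods are hypothetical, and the literal names such as `"ISO-8859-1"` stand in for the constants of this interface on the assumption that their values are the usual charset names.

```java
import java.nio.charset.Charset;

class EncodingRoundTrip {

  /** Encodes the text into bytes using the named encoding. */
  static byte[] encode(String text, String encodingName) {
    return text.getBytes(Charset.forName(encodingName));
  }

  /** Decodes the bytes back into text using the named encoding. */
  static String decode(byte[] bytes, String encodingName) {
    return new String(bytes, Charset.forName(encodingName));
  }

  public static void main(String[] args) {
    String text = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
    // The same character maps to different byte sequences per encoding.
    System.out.println(encode(text, "ISO-8859-1").length); // 1 byte (0xE9)
    System.out.println(encode(text, "UTF-8").length);      // 2 bytes (0xC3 0xA9)
    // Decoding with the encoding that was used for encoding restores the text.
    System.out.println(decode(encode(text, "UTF-8"), "UTF-8").equals(text));
  }
}
```

Decoding with a different encoding than the one used for encoding generally does not restore the original text, which is why the encoding has to be known or detected.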
Field Summary | |
---|---|
static String | ENCODING_CP_437: The encoding CP437, also called DOS-US. |
static String | ENCODING_CP_737: The encoding CP737. |
static String | ENCODING_CP_850: The encoding CP850. |
static String | ENCODING_CP_852: The encoding CP852. |
static String | ENCODING_CP_855: The encoding CP855. |
static String | ENCODING_CP_857: The encoding CP857. |
static String | ENCODING_CP_858: The encoding CP858. |
static String | ENCODING_CP_860: The encoding CP860. |
static String | ENCODING_CP_861: The encoding CP861. |
static String | ENCODING_CP_863: The encoding CP863. |
static String | ENCODING_CP_865: The encoding CP865. |
static String | ENCODING_CP_866: The encoding CP866. |
static String | ENCODING_CP_869: The encoding CP869. |
static String | ENCODING_ISO_8859_1: The encoding ISO-8859-1, also called Latin-1. |
static String | ENCODING_ISO_8859_10: The encoding ISO-8859-10, also called Latin-6. |
static String | ENCODING_ISO_8859_11: The encoding ISO-8859-11. |
static String | ENCODING_ISO_8859_12: Deprecated. |
static String | ENCODING_ISO_8859_13: The encoding ISO-8859-13, also called Latin-7. |
static String | ENCODING_ISO_8859_14: The encoding ISO-8859-14, also called Latin-8. |
static String | ENCODING_ISO_8859_15: The encoding ISO-8859-15, also called Latin-9. |
static String | ENCODING_ISO_8859_16: The encoding ISO-8859-16, also called Latin-10. |
static String | ENCODING_ISO_8859_2: The encoding ISO-8859-2, also called Latin-2. |
static String | ENCODING_ISO_8859_3: The encoding ISO-8859-3, also called Latin-3. |
static String | ENCODING_ISO_8859_4: The encoding ISO-8859-4, also called Latin-4. |
static String | ENCODING_ISO_8859_5: The encoding ISO-8859-5. |
static String | ENCODING_ISO_8859_6: The encoding ISO-8859-6. |
static String | ENCODING_ISO_8859_7: The encoding ISO-8859-7. |
static String | ENCODING_ISO_8859_8: The encoding ISO-8859-8. |
static String | ENCODING_ISO_8859_9: The encoding ISO-8859-9, also called Latin-5. |
static String | ENCODING_KOI8_R: The encoding KOI8-R. |
static String | ENCODING_KOI8_U: The encoding KOI8-U. |
static String | ENCODING_US_ASCII: The encoding US-ASCII (American Standard Code for Information Interchange), also just called ASCII. |
static String | ENCODING_UTF_16: The encoding UTF-16. |
static String | ENCODING_UTF_16_BE: The encoding UTF-16, big-endian. |
static String | ENCODING_UTF_16_LE: The encoding UTF-16, little-endian. |
static String | ENCODING_UTF_32: The encoding UTF-32. |
static String | ENCODING_UTF_32_BE: The encoding UTF-32, big-endian. |
static String | ENCODING_UTF_32_LE: The encoding UTF-32, little-endian. |
static String | ENCODING_UTF_8: The encoding UTF-8. |
static String | ENCODING_WINDOWS_1250: The encoding CP1250, also called Windows-1250. |
static String | ENCODING_WINDOWS_1251: The encoding CP1251, also called Windows-1251. |
static String | ENCODING_WINDOWS_1252: The encoding CP1252, also called Windows-1252. |
static String | ENCODING_WINDOWS_1253: The encoding CP1253, also called Windows-1253. |
static String | ENCODING_WINDOWS_1254: The encoding CP1254, also called Windows-1254. |
static String | ENCODING_WINDOWS_1255: The encoding CP1255, also called Windows-1255. |
static String | ENCODING_WINDOWS_1256: The encoding CP1256, also called Windows-1256. |
static String | ENCODING_WINDOWS_1257: The encoding CP1257, also called Windows-1257. |
static String | ENCODING_WINDOWS_1258: The encoding CP1258, also called Windows-1258. |
static String | SYSTEM_DEFAULT_ENCODING: The default encoding used by this JVM as fallback if no explicit encoding is specified. |
Method Summary | |
---|---|
EncodingDetectionReader | createUtfDetectionReader(InputStream inputStream, String nonUtfEncoding): This method creates a new Reader for the given inputStream. |
Field Detail |
---|
static final String SYSTEM_DEFAULT_ENCODING
The default encoding used by this JVM as fallback if no explicit encoding is specified.
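How this constant is actually initialized is not shown on this page. As a rough sketch of the idea of a JVM default used as fallback, the hypothetical helper below resolves an optional explicit encoding against `Charset.defaultCharset()`; both the class name and the fallback rule are assumptions for illustration.

```java
import java.nio.charset.Charset;

class DefaultEncodingDemo {

  /** Returns the name of the charset this JVM uses by default. */
  static String jvmDefaultEncoding() {
    return Charset.defaultCharset().name();
  }

  /** Returns the explicit encoding if one is given, else the JVM default. */
  static String effectiveEncoding(String explicitEncoding) {
    if (explicitEncoding == null || explicitEncoding.isEmpty()) {
      return jvmDefaultEncoding();
    }
    return explicitEncoding;
  }

  public static void main(String[] args) {
    System.out.println(effectiveEncoding(null));    // platform dependent
    System.out.println(effectiveEncoding("UTF-8")); // UTF-8
  }
}
```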
static final String ENCODING_US_ASCII
The encoding US-ASCII (American Standard Code for Information Interchange), also just called ASCII. Location: lib/rt.jar.
static final String ENCODING_UTF_8
The encoding UTF-8. It is an 8-bit Unicode Transformation Format. Location: lib/rt.jar.
static final String ENCODING_UTF_16
The encoding UTF-16. It is a 16-bit Unicode Transformation Format. The byte order is determined by an optional ByteOrderMark. Location: lib/rt.jar.
static final String ENCODING_UTF_16_LE
The encoding UTF-16, little-endian. It is a 16-bit Unicode Transformation Format. Location: lib/rt.jar.
static final String ENCODING_UTF_16_BE
The encoding UTF-16, big-endian. It is a 16-bit Unicode Transformation Format. Location: lib/rt.jar.
static final String ENCODING_UTF_32
The encoding UTF-32. It is a 32-bit Unicode Transformation Format. The byte order is determined by an optional ByteOrderMark.
static final String ENCODING_UTF_32_LE
The encoding UTF-32, little-endian. It is a 32-bit Unicode Transformation Format.
static final String ENCODING_UTF_32_BE
The encoding UTF-32, big-endian. It is a 32-bit Unicode Transformation Format.
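The role of the byte order mark can be seen with the JDK's own charsets. The sketch below (hypothetical class `Utf16ByteOrderDemo`) encodes the letter `A` with the standard UTF-16 variants; in the JDK, the plain `UTF-16` encoder writes a big-endian byte order mark, while the explicit big-endian and little-endian variants write none.

```java
import java.nio.charset.StandardCharsets;

class Utf16ByteOrderDemo {

  /** Formats the bytes as space-separated upper-case hex pairs. */
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02X ", b & 0xFF));
    }
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    System.out.println(hex("A".getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
    System.out.println(hex("A".getBytes(StandardCharsets.UTF_16BE))); // 00 41
    System.out.println(hex("A".getBytes(StandardCharsets.UTF_16LE))); // 41 00

    // When decoding, the UTF-16 charset honors a leading byte order mark.
    byte[] littleEndianWithBom = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
    System.out.println(new String(littleEndianWithBom, StandardCharsets.UTF_16)); // A
  }
}
```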
static final String ENCODING_ISO_8859_1
The encoding ISO-8859-1, also called Latin-1. It covers most Western European languages. Location: lib/rt.jar.
static final String ENCODING_ISO_8859_2
The encoding ISO-8859-2, also called Latin-2. It covers the Central and Eastern European languages that use the Latin alphabet. Location: lib/rt.jar.
static final String ENCODING_ISO_8859_3
The encoding ISO-8859-3, also called Latin-3. It covers the South European languages. Location: lib/charsets.jar.
static final String ENCODING_ISO_8859_4
The encoding ISO-8859-4, also called Latin-4. It covers the North European languages. Location: lib/rt.jar.
static final String ENCODING_ISO_8859_5
The encoding ISO-8859-5. It covers mostly Slavic languages that use a Cyrillic alphabet. Location: lib/rt.jar.
static final String ENCODING_ISO_8859_6
The encoding ISO-8859-6. It covers common Arabic language characters. Location: lib/charsets.jar.
static final String ENCODING_ISO_8859_7
The encoding ISO-8859-7. It covers modern Greek. Location: lib/rt.jar.
static final String ENCODING_ISO_8859_8
The encoding ISO-8859-8. It covers modern Hebrew (used in Israel). Location: lib/charsets.jar.
static final String ENCODING_ISO_8859_9
The encoding ISO-8859-9, also called Latin-5. It covers Turkish and Kurdish. Location: lib/rt.jar.
static final String ENCODING_ISO_8859_10
The encoding ISO-8859-10, also called Latin-6. It is used for Nordic languages.
static final String ENCODING_ISO_8859_11
The encoding ISO-8859-11. The canonical name, however, is x-iso-8859-11. It covers common Thai language characters.
@Deprecated static final String ENCODING_ISO_8859_12
The encoding ISO-8859-12. The work on this encoding for Devanagari was stopped, so it does NOT exist at all.
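Since not every encoding named here is available in every Java runtime, and ISO-8859-12 exists in none, it can be worth checking availability before use. The following sketch relies on the standard `Charset.isSupported` method; the class `EncodingSupportCheck` is hypothetical and the literal names are assumed to match the constant values.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class EncodingSupportCheck {

  /** Returns true if the current JVM can provide a charset with this name. */
  static boolean isAvailable(String encodingName) {
    try {
      return Charset.isSupported(encodingName);
    } catch (IllegalCharsetNameException e) {
      // The name contains characters that are illegal in a charset name.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isAvailable("ISO-8859-1"));  // true on every Java platform
    System.out.println(isAvailable("ISO-8859-12")); // false, the standard was never completed
  }
}
```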
static final String ENCODING_ISO_8859_13
The encoding ISO-8859-13, also called Latin-7. It covers Baltic languages. Location: lib/rt.jar.
static final String ENCODING_ISO_8859_14
The encoding ISO-8859-14, also called Latin-8. It covers Celtic languages.
static final String ENCODING_ISO_8859_15
The encoding ISO-8859-15, also called Latin-9. It is very similar to Latin-1 but adds the euro sign and 7 other characters by replacing rarely used ones. Location: lib/rt.jar.
static final String ENCODING_ISO_8859_16
The encoding ISO-8859-16, also called Latin-10. It covers South-Eastern European languages and includes the euro sign.
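The practical difference between Latin-1 and Latin-9 shows up when encoding the euro sign. In this sketch (hypothetical class `EuroSignDemo`, standard JDK calls only), ISO-8859-15 maps the euro sign to byte `0xA4`, whereas ISO-8859-1 cannot encode it, so `String.getBytes` substitutes a question mark.

```java
import java.nio.charset.Charset;

class EuroSignDemo {

  private static final String EURO = "\u20ac";

  /** Returns the first encoded byte of the text as an unsigned value. */
  static int firstByte(String text, String encodingName) {
    return text.getBytes(Charset.forName(encodingName))[0] & 0xFF;
  }

  /** Returns true if the encoding has a byte sequence for the euro sign. */
  static boolean canEncodeEuro(String encodingName) {
    return Charset.forName(encodingName).newEncoder().canEncode(EURO);
  }

  public static void main(String[] args) {
    System.out.println(Integer.toHexString(firstByte(EURO, "ISO-8859-15"))); // a4
    System.out.println(Integer.toHexString(firstByte(EURO, "ISO-8859-1")));  // 3f ('?')
    System.out.println(canEncodeEuro("ISO-8859-1"));                         // false
  }
}
```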
static final String ENCODING_KOI8_R
The encoding KOI8-R. It covers Russian and Bulgarian. It is therefore related to ENCODING_ISO_8859_5 and ENCODING_WINDOWS_1251. Location: lib/rt.jar.
static final String ENCODING_KOI8_U
The encoding KOI8-U. It covers Ukrainian. It is related to ENCODING_KOI8_R, ENCODING_ISO_8859_5 and ENCODING_WINDOWS_1251.
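"Related" here means that the encodings cover the same alphabet, not that they are byte-compatible. As an illustration, this sketch (hypothetical class `CyrillicEncodingDemo`) encodes the Cyrillic capital letter A (U+0410) with three Cyrillic encodings and gets a different byte each time. The charset names are the standard ones and are assumed to match the constant values.

```java
import java.nio.charset.Charset;

class CyrillicEncodingDemo {

  /** Encodes a single character and returns its one byte as an unsigned value. */
  static int encodeSingle(char c, String encodingName) {
    byte[] bytes = String.valueOf(c).getBytes(Charset.forName(encodingName));
    if (bytes.length != 1) {
      throw new IllegalArgumentException("Not a single-byte mapping in " + encodingName);
    }
    return bytes[0] & 0xFF;
  }

  public static void main(String[] args) {
    char cyrillicA = '\u0410';
    System.out.println(Integer.toHexString(encodeSingle(cyrillicA, "KOI8-R")));       // e1
    System.out.println(Integer.toHexString(encodeSingle(cyrillicA, "ISO-8859-5")));   // b0
    System.out.println(Integer.toHexString(encodeSingle(cyrillicA, "windows-1251"))); // c0
  }
}
```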
static final String ENCODING_CP_437
The encoding CP437, also called DOS-US. It is used by MS-DOS and is based on ENCODING_US_ASCII but NOT completely compatible.
static final String ENCODING_CP_737
The encoding CP737. It is used by MS-DOS for Greek and is therefore related to ENCODING_CP_869 and ENCODING_ISO_8859_7.
static final String ENCODING_CP_850
The encoding CP850. It is used by MS-DOS for Western European languages and is therefore related to ENCODING_ISO_8859_1.
static final String ENCODING_CP_852
The encoding CP852. It is used by MS-DOS for Central European languages and is therefore related to ENCODING_ISO_8859_2.
static final String ENCODING_CP_855
The encoding CP855. It is used by MS-DOS for Cyrillic letters and is therefore related to ENCODING_ISO_8859_5.
static final String ENCODING_CP_857
The encoding CP857. It is used by MS-DOS for Turkish and is therefore related to ENCODING_ISO_8859_9.
static final String ENCODING_CP_858
The encoding CP858. It is used by MS-DOS for Western European languages and is like ENCODING_CP_850 but replaces one character with the euro sign. It is therefore related to ENCODING_ISO_8859_15.
static final String ENCODING_CP_860
The encoding CP860. It is used by MS-DOS for Portuguese and is therefore related to ENCODING_ISO_8859_1.
static final String ENCODING_CP_861
The encoding CP861. It is used by MS-DOS for Nordic languages, especially for Icelandic, and is therefore related to ENCODING_ISO_8859_10.
static final String ENCODING_CP_863
The encoding CP863. It is used by MS-DOS for French and is therefore related to ENCODING_ISO_8859_15.
static final String ENCODING_CP_865
The encoding CP865. It is used by MS-DOS for Nordic languages except Icelandic, for which ENCODING_CP_861 is used. It is therefore related to ENCODING_ISO_8859_10.
static final String ENCODING_CP_866
The encoding CP866. It is used by MS-DOS for Cyrillic letters and is therefore related to ENCODING_CP_855 and ENCODING_ISO_8859_5.
static final String ENCODING_CP_869
The encoding CP869. It is used by MS-DOS for Greek and is therefore related to ENCODING_CP_737 and ENCODING_ISO_8859_7.
static final String ENCODING_WINDOWS_1250
The encoding CP1250, also called Windows-1250. It is used by Microsoft Windows for Central European languages and is similar to ENCODING_ISO_8859_2. Location: lib/rt.jar.
static final String ENCODING_WINDOWS_1251
The encoding CP1251, also called Windows-1251. It is used by Microsoft Windows for Cyrillic letters and is similar to ENCODING_ISO_8859_5. Location: lib/rt.jar.
static final String ENCODING_WINDOWS_1252
The encoding CP1252, also called Windows-1252. It is used by Microsoft Windows for Western European languages and is similar to ENCODING_ISO_8859_1. Location: lib/rt.jar.
static final String ENCODING_WINDOWS_1253
The encoding CP1253, also called Windows-1253. It is used by Microsoft Windows for Greek and is similar to ENCODING_ISO_8859_7. Location: lib/rt.jar.
static final String ENCODING_WINDOWS_1254
The encoding CP1254, also called Windows-1254. It is used by Microsoft Windows for Turkish and is similar to ENCODING_ISO_8859_9. Location: lib/rt.jar.
static final String ENCODING_WINDOWS_1255
The encoding CP1255, also called Windows-1255. It is used by Microsoft Windows for Hebrew and is similar to ENCODING_ISO_8859_8.
static final String ENCODING_WINDOWS_1256
The encoding CP1256, also called Windows-1256. It is used by Microsoft Windows for Arabic and is similar to ENCODING_ISO_8859_6.
static final String ENCODING_WINDOWS_1257
The encoding CP1257, also called Windows-1257. It is used by Microsoft Windows for Baltic languages and is similar to ENCODING_ISO_8859_13. Location: lib/rt.jar.
static final String ENCODING_WINDOWS_1258
The encoding CP1258, also called Windows-1258. It is used by Microsoft Windows for Vietnamese and is similar to ENCODING_WINDOWS_1252.
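"Similar" does not mean identical. As a small illustration with Windows-1252 and ISO-8859-1, the sketch below (hypothetical class `Windows1252Demo`) decodes the same single byte with both encodings: they agree on `0xE9`, but Windows-1252 assigns the euro sign to `0x80`, which ISO-8859-1 decodes as the control character U+0080.

```java
import java.nio.charset.Charset;

class Windows1252Demo {

  /** Decodes one byte with the named encoding and returns the resulting character. */
  static char decodeSingle(int unsignedByte, String encodingName) {
    byte[] bytes = {(byte) unsignedByte};
    return new String(bytes, Charset.forName(encodingName)).charAt(0);
  }

  public static void main(String[] args) {
    // Same result in the shared range 0xA0 to 0xFF.
    System.out.println(decodeSingle(0xE9, "windows-1252") == decodeSingle(0xE9, "ISO-8859-1")); // true
    // Different result in the range 0x80 to 0x9F.
    System.out.println(decodeSingle(0x80, "windows-1252") == '\u20ac'); // true (euro sign)
    System.out.println(decodeSingle(0x80, "ISO-8859-1") == '\u0080');   // true (control character)
  }
}
```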
Method Detail |
---|
EncodingDetectionReader createUtfDetectionReader(InputStream inputStream, String nonUtfEncoding)
This method creates a new Reader for the given inputStream. The EncodingDetectionReader automatically detects UTF (Unicode Transformation Format) encodings. If the data provided by inputStream is NOT in such an encoding, it will use the given nonUtfEncoding as fallback.
The EncodingDetectionReader will behave like InputStreamReader but with an encoding that is automatically detected whilst reading. It will use a lookahead buffer to detect the encoding. As long as no UTF characteristic has been detected and only ASCII characters (<128) are hit, the encoding remains ENCODING_US_ASCII. As soon as a UTF sequence is detected (e.g. ENCODING_UTF_8 or ENCODING_UTF_16_BE), the encoding switches to that encoding. If a non-ASCII character is hit and no UTF encoding is detected, the EncodingDetectionReader switches to the given nonUtfEncoding.
Parameters:
inputStream - is the InputStream to decode and read.
nonUtfEncoding - is the encoding to use in case the data is NOT encoded in UTF (e.g. ENCODING_ISO_8859_15). It is pointless to use a UTF-based encoding or ENCODING_US_ASCII here.
Returns:
the EncodingDetectionReader that can be used to read the inputStream.
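The detection rules described for this method can be approximated in a few lines. The following is a simplified sketch and NOT the implementation of EncodingDetectionReader: it works on a fully buffered byte array instead of a streaming lookahead buffer, ignores UTF-32, and does not strip the byte order mark. The class `UtfDetectionSketch` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class UtfDetectionSketch {

  /** Returns true if data begins with the given unsigned byte values. */
  static boolean startsWith(byte[] data, int... prefix) {
    if (data.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if ((data[i] & 0xFF) != prefix[i]) {
        return false;
      }
    }
    return true;
  }

  /** Guesses the encoding: byte order mark, then pure ASCII, then valid UTF-8, else the fallback. */
  static Charset detect(byte[] data, Charset nonUtfEncoding) {
    if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
      return StandardCharsets.UTF_8;
    }
    if (startsWith(data, 0xFE, 0xFF)) {
      return StandardCharsets.UTF_16BE;
    }
    if (startsWith(data, 0xFF, 0xFE)) {
      return StandardCharsets.UTF_16LE;
    }
    boolean asciiOnly = true;
    for (byte b : data) {
      if (b < 0) { // bytes >= 128 are negative as signed Java bytes
        asciiOnly = false;
        break;
      }
    }
    if (asciiOnly) {
      return StandardCharsets.US_ASCII;
    }
    try {
      // A strict decoder throws instead of silently replacing malformed input.
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(data));
      return StandardCharsets.UTF_8;
    } catch (CharacterCodingException e) {
      return nonUtfEncoding;
    }
  }

  public static void main(String[] args) {
    Charset fallback = Charset.forName("ISO-8859-15");
    System.out.println(detect("abc".getBytes(StandardCharsets.US_ASCII), fallback)); // US-ASCII
    System.out.println(detect("\u00e9".getBytes(StandardCharsets.UTF_8), fallback)); // UTF-8
    System.out.println(detect(new byte[] {(byte) 0xE9}, fallback));                  // ISO-8859-15
  }
}
```

A lone byte `0xE9` is a valid Latin-9 character but an incomplete UTF-8 sequence, which is why the last call ends up with the fallback encoding.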