Package

org.apache.daffodil.processors

charset

package charset

Type Members

  1. trait BitsCharset extends Serializable

    Charset enhanced with features allowing it to work with Daffodil's Bit-wise DataInputStream and DataOutputStream.

    Daffodil uses BitsCharset as its primary abstraction for dealing with character sets, which enables it to support character sets where the code units are smaller than 1 byte.

    Note that BitsCharset is NOT derived from java.nio.charset.Charset, nor are BitsCharsetDecoder or BitsCharsetEncoder derived from java.nio.charset.CharsetDecoder or CharsetEncoder respectively. This is partly because these Java classes have many final methods that make it impossible for us to implement what we need by extending them. More importantly, we need much more low-level control over how characters are decoded and what kind of information is returned during decode operations. Getting that information within the limitations of the Java Charset API becomes an encumbrance. Replacing them with our own Charset decoders greatly simplifies the code and allows for future enhancements as needed.

  2. abstract class BitsCharsetDecoder extends AnyRef

  3. abstract class BitsCharsetDecoderByteSize extends BitsCharsetDecoder

    Base class for byte based decoders

    Provides methods to get a single byte. Also handles logic related to the encoding error policy and replacement characters. Implementing classes only need to use the provided methods to get one or more bytes, convert them to a char, and validate the code point.
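
    The division of labour described above can be sketched roughly as follows. This is a minimal illustration, not Daffodil's code: ByteDecoder, MalformedByte, toChar, and the replaceOnError flag are all hypothetical names, and the input is simplified to a plain byte array rather than a bit-wise input stream.

```scala
// Hypothetical sketch; none of these names are Daffodil's actual API.
object ByteSizeDecoderSketch {
  // Signals a byte that is not valid in the encoding.
  final class MalformedByte(val byte: Int) extends Exception

  abstract class ByteDecoder {
    // Character substituted when errors are replaced rather than reported.
    val replacement: Char = 0xFFFD.toChar

    // Subclasses convert one unsigned byte (0..255) to a char and
    // validate it, throwing MalformedByte when it is not allowed.
    protected def toChar(unsignedByte: Int): Char

    // The base class owns the error policy: replace or rethrow.
    final def decode(bytes: Array[Byte], replaceOnError: Boolean): String = {
      val sb = new StringBuilder
      bytes.foreach { b =>
        val u = b & 0xFF
        try sb.append(toChar(u))
        catch {
          case e: MalformedByte =>
            if (replaceOnError) sb.append(replacement) else throw e
        }
      }
      sb.toString
    }
  }

  object AsciiDecoder extends ByteDecoder {
    // US-ASCII only defines code points below 128.
    protected def toChar(u: Int): Char =
      if (u < 128) u.toChar else throw new MalformedByte(u)
  }

  object Latin1Decoder extends ByteDecoder {
    // Every byte value maps directly to an ISO-8859-1 code point.
    protected def toChar(u: Int): Char = u.toChar
  }
}
```

    With this shape, a concrete decoder contains only the byte-to-char conversion and validation, while the replace-or-report decision lives in one place.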

  4. abstract class BitsCharsetDecoderCreatesSurrogates extends BitsCharsetDecoderByteSize

    Some encodings need state, but only to store the low surrogate of a surrogate pair. This class encapsulates that logic. A class that extends it must implement decodeOneUnicodeChar, which should decode one char; if the result is a high/low surrogate pair, it should call setLowSurrogate on the low surrogate and return the high surrogate.
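
    A rough sketch of that contract is shown below. The method names decodeOneUnicodeChar and setLowSurrogate come from the description above, but their signatures here are assumptions; CreatesSurrogates, CodePointDecoder, decodeOne, hasPendingLow, and decodeAll are hypothetical, and the input is modelled as an iterator of already-assembled code points rather than a bit stream.

```scala
// Hypothetical sketch of the "stash the low surrogate" pattern.
object SurrogateSketch {
  abstract class CreatesSurrogates {
    private var lowSurrogate: Char = 0.toChar
    private var hasLow = false

    // True when a low surrogate is waiting to be returned.
    def hasPendingLow: Boolean = hasLow

    protected def setLowSurrogate(low: Char): Unit = {
      lowSurrogate = low
      hasLow = true
    }

    // Decode one char; for a surrogate pair, stash the low and return the high.
    protected def decodeOneUnicodeChar(in: Iterator[Int]): Char

    // Returns a pending low surrogate first, otherwise decodes a new char.
    final def decodeOne(in: Iterator[Int]): Char =
      if (hasLow) { hasLow = false; lowSurrogate }
      else decodeOneUnicodeChar(in)
  }

  final class CodePointDecoder extends CreatesSurrogates {
    protected def decodeOneUnicodeChar(in: Iterator[Int]): Char = {
      val cp = in.next()
      if (Character.isBmpCodePoint(cp)) cp.toChar
      else {
        setLowSurrogate(Character.lowSurrogate(cp))
        Character.highSurrogate(cp)
      }
    }
  }

  def decodeAll(codePoints: Seq[Int]): String = {
    val dec = new CodePointDecoder
    val in = codePoints.iterator
    val sb = new StringBuilder
    // Continue while input remains or a low surrogate is still pending.
    while (dec.hasPendingLow || in.hasNext) sb.append(dec.decodeOne(in))
    sb.toString
  }
}
```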

  5. class BitsCharsetDecoderIBM037 extends BitsCharsetDecoderByteSize

  6. class BitsCharsetDecoderISO88591 extends BitsCharsetDecoderByteSize

  7. class BitsCharsetDecoderMalformedException extends Exception with ThinThrowable

  8. trait BitsCharsetDecoderState extends AnyRef

  9. class BitsCharsetDecoderUSASCII extends BitsCharsetDecoderByteSize

  10. class BitsCharsetDecoderUTF16BE extends BitsCharsetDecoderCreatesSurrogates

  11. class BitsCharsetDecoderUTF16LE extends BitsCharsetDecoderCreatesSurrogates

  12. class BitsCharsetDecoderUTF32BE extends BitsCharsetDecoderCreatesSurrogates

  13. class BitsCharsetDecoderUTF32LE extends BitsCharsetDecoderCreatesSurrogates

  14. class BitsCharsetDecoderUTF8 extends BitsCharsetDecoderCreatesSurrogates

  15. abstract class BitsCharsetEncoder extends IsResetMixin

  16. trait BitsCharsetJava extends BitsCharset

    Implements BitsCharset based on encapsulation of a regular JavaCharset.

  17. trait BitsCharsetNonByteSize extends BitsCharset

    Some encodings are not byte-oriented.

    If we know the correspondence from integers to characters, and we can express that as a string, then everything else can be derived.

    This class is explicitly not a java.nio.charset.Charset. It is a BitsCharset, which is not a compatible type with a java.nio.charset.Charset on purpose so we don't confuse the two.

    The problem is that java.nio.charset.Charset is designed in such a way that one cannot implement a proxy class that redirects methods to another class. This is due to all the final methods on the class.

    So instead we do the opposite. We implement our own BitsCharset API, but implement the behavior in terms of a proxy JavaCharsetDecoder and proxy JavaCharsetEncoder that drive the decodeLoop and encodeLoop. This way we don't have to re-implement all the error handling and flush/end logic.
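
    The idea that a single string can define the whole integer-to-character correspondence can be sketched as follows. TableCharset and its members are hypothetical names, and the two tables shown are assumptions about what an octal or hex table might look like, not the tables Daffodil actually uses.

```scala
// Hypothetical sketch of a table-driven charset with sub-byte code units.
object StringTableCharsetSketch {
  // decodeString.charAt(i) is the character for code unit value i.
  final case class TableCharset(decodeString: String, bitWidth: Int) {
    require(decodeString.length == (1 << bitWidth),
      "the table must cover every possible code unit value")

    // Decoding is a direct lookup by code unit value.
    def decode(codeUnit: Int): Char = decodeString.charAt(codeUnit)

    // Encoding is the inverse lookup; None when the char is not in the table.
    def encode(c: Char): Option[Int] = {
      val i = decodeString.indexOf(c.toInt)
      if (i >= 0) Some(i) else None
    }
  }

  // Assumed tables for illustration only.
  val octal: TableCharset = TableCharset("01234567", 3)
  val hex: TableCharset = TableCharset("0123456789ABCDEF", 4)
}
```

    Both directions, plus the code unit width check, follow from the string and the bit width alone.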

  18. final class BitsCharsetNonByteSizeDecoder extends BitsCharsetDecoder

  19. final class BitsCharsetNonByteSizeEncoder extends BitsCharsetEncoder

  20. final class BitsCharsetWrappingJavaCharsetEncoder extends BitsCharsetEncoder

    Implements BitsCharsetEncoder by encapsulating a standard JavaCharsetEncoder

  21. class CharacterSetAlignmentError extends Exception

  22. sealed abstract class CoderInfo extends AnyRef

  23. case class DecoderInfo(coder: BitsCharsetDecoder, encodingMandatoryAlignmentInBitsArg: Int, maybeCharWidthInBitsArg: MaybeInt) extends CoderInfo with Product with Serializable

  24. trait EncoderDecoderMixin extends LocalBufferMixin

  25. case class EncoderInfo(coder: BitsCharsetEncoder, replacingCoder: BitsCharsetEncoder, reportingCoder: BitsCharsetEncoder, encodingMandatoryAlignmentInBitsArg: Int, maybeCharWidthInBitsArg: MaybeInt) extends CoderInfo with Product with Serializable

  26. trait IsResetMixin extends AnyRef

  27. final class ProxyJavaCharsetEncoder extends CharsetEncoder

    Hijack a JavaCharsetEncoder to drive the encodeLoop.

    This avoids us reimplementing all the error handling and flush/end logic.

    TODO: Similar to our decoders, we should create custom encoders. Then we wouldn't need all this complex code related to proxying java charsets.

    Attributes
    protected
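
    The general technique of subclassing java.nio.charset.CharsetEncoder so that only encodeLoop is written by hand can be sketched as below. This illustrates the standard Java API only; SevenBitEncoder and the encode helper are hypothetical and do not reflect how ProxyJavaCharsetEncoder is actually implemented.

```scala
import java.nio.{ByteBuffer, CharBuffer}
import java.nio.charset.{Charset, CharsetEncoder, CoderResult, CodingErrorAction, StandardCharsets}

// Hypothetical sketch: the base class supplies error handling and flush/end logic.
object EncodeLoopSketch {
  final class SevenBitEncoder(cs: Charset) extends CharsetEncoder(cs, 1.0f, 1.0f) {
    override protected def encodeLoop(in: CharBuffer, out: ByteBuffer): CoderResult = {
      while (in.hasRemaining) {
        if (!out.hasRemaining) return CoderResult.OVERFLOW
        val c = in.get()
        if (c > 127) {
          // Leave the offending char unconsumed and report it;
          // the base class then applies the configured error action.
          in.position(in.position() - 1)
          return CoderResult.unmappableForLength(1)
        }
        out.put(c.toByte)
      }
      CoderResult.UNDERFLOW
    }
  }

  def encode(s: String): Array[Byte] = {
    val enc = new SevenBitEncoder(StandardCharsets.US_ASCII)
    // Ask the base class to substitute its replacement bytes ('?' by default).
    enc.onUnmappableCharacter(CodingErrorAction.REPLACE)
    val bb = enc.encode(CharBuffer.wrap(s))
    val bytes = new Array[Byte](bb.remaining())
    bb.get(bytes)
    bytes
  }
}
```

    The subclass never writes the replacement or handles end-of-input itself; CharsetEncoder.encode does that around each call to encodeLoop.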

Value Members

  1. object BitsCharset5BitPackedLSBF extends BitsCharsetNonByteSize

    X-DFDL-5-BIT-PACKED-LSBF occupies only 5 bits with each code unit.

  2. object BitsCharset6BitDFI264DUI001 extends BitsCharsetNonByteSize

    X-DFDL-6-BIT-DFI-264-DUI-001, special 6 bit encoding

  3. object BitsCharsetAISPayloadArmoring extends BitsCharsetNonByteSize

    Special purpose. This is not used for decoding anything. The encoder converts strings made up of the allowed characters into binary data, using the AIS Payload Armoring described here:

    http://catb.org/gpsd/AIVDM.html#_aivdm_aivdo_payload_armoring

    Converting a string of length N bytes yields 6N bits.

    The decoder can be used for unit testing, but the point of this class is to make the encoder available for use in un-doing the AIS Payload armoring when parsing, and performing this armoring when unparsing.

    When encoding from an 8-bit encoding such as ASCII or ISO-8859-1, this can only encode characters that stay within the 64 allowed characters. dfdl:encodingErrorPolicy='error' would check this (once implemented); otherwise, wherever this is used, the checking needs to be done separately.
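
    The character-to-6-bit mapping of AIS payload armoring, as documented at the URL above, can be sketched like this: subtract 48 from the ASCII value and, if the result is greater than 40, subtract a further 8. The object and method names are hypothetical, and the bits are rendered as a '0'/'1' string purely for readability.

```scala
// Hypothetical sketch of AIS payload armoring arithmetic.
object AisArmoringSketch {
  // Maps a 6-bit value (0..63) to its armored ASCII character.
  def armor(sixBits: Int): Char = {
    require(sixBits >= 0 && sixBits < 64, "value must fit in 6 bits")
    (if (sixBits < 40) sixBits + 48 else sixBits + 56).toChar
  }

  // Inverse mapping; None for characters outside the 64 allowed ones.
  def dearmor(c: Char): Option[Int] = {
    val v = c.toInt - 48
    if (v >= 0 && v < 40) Some(v)            // '0' through 'W'
    else if (v >= 48 && v < 72) Some(v - 8)  // '`' through 'w'
    else None
  }

  // N payload characters yield 6N bits; None if any character is not allowed.
  def payloadBits(payload: String): Option[String] = {
    val units = payload.map(c => dearmor(c))
    if (units.forall(_.isDefined))
      Some(units.flatten.map { v =>
        (5 to 0 by -1).map(i => (v >> i) & 1).mkString
      }.mkString)
    else None
  }
}
```

    A character such as 'X' falls in the gap between the two allowed ranges, which is the case that separate checking would need to catch.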

  4. object BitsCharsetBinaryLSBF extends BitsCharsetNonByteSize

    X-DFDL-BITS-LSBF occupies only 1 bit with each code unit.

  5. object BitsCharsetBinaryMSBF extends BitsCharsetNonByteSize

    X-DFDL-BITS-MSBF occupies only 1 bit with each code unit.

  6. object BitsCharsetHexLSBF extends BitsCharsetNonByteSize

    X-DFDL-HEX-LSBF occupies only 4 bits with each code unit.

  7. object BitsCharsetHexMSBF extends BitsCharsetNonByteSize

    X-DFDL-HEX-MSBF occupies only 4 bits with each code unit.
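
    The difference between the MSBF and LSBF variants of a 4-bit charset can be sketched as follows. This assumes that most-significant-bit-first means the high nibble of each byte is the first code unit and that least-significant-bit-first means the low nibble comes first; the digit table and all names are hypothetical.

```scala
// Hypothetical sketch: reading two 4-bit code units per byte in either bit order.
object HexBitOrderSketch {
  // Assumed digit table, for illustration only.
  private val digits = "0123456789ABCDEF"

  def decode(bytes: Array[Byte], msbFirst: Boolean): String = {
    val sb = new StringBuilder
    bytes.foreach { b =>
      val high = (b >> 4) & 0xF
      val low = b & 0xF
      // The bit order decides which nibble is the first code unit.
      if (msbFirst) sb.append(digits.charAt(high)).append(digits.charAt(low))
      else sb.append(digits.charAt(low)).append(digits.charAt(high))
    }
    sb.toString
  }
}
```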

  8. object BitsCharsetIBM037 extends BitsCharsetJava

  9. object BitsCharsetISO88591 extends BitsCharsetJava

  10. object BitsCharsetOctalLSBF extends BitsCharsetNonByteSize

    X-DFDL-OCTAL-LSBF occupies only 3 bits with each code unit.

  11. object BitsCharsetOctalMSBF extends BitsCharsetNonByteSize

    X-DFDL-OCTAL-MSBF occupies only 3 bits with each code unit.

  12. object BitsCharsetUSASCII extends BitsCharsetJava

  13. object BitsCharsetUSASCII6BitPackedLSBF extends BitsCharsetNonByteSize

    X-DFDL-US-ASCII-6-BIT-PACKED occupies only 6 bits with each code unit.

  14. object BitsCharsetUSASCII6BitPackedMSBF extends BitsCharsetNonByteSize

  15. object BitsCharsetUSASCII7BitPacked extends BitsCharsetNonByteSize

    X-DFDL-US-ASCII-7-BIT-PACKED occupies only 7 bits with each code unit.

  16. object BitsCharsetUTF16BE extends BitsCharsetJava

  17. object BitsCharsetUTF16LE extends BitsCharsetJava

  18. object BitsCharsetUTF32BE extends BitsCharsetJava

  19. object BitsCharsetUTF32LE extends BitsCharsetJava

  20. object BitsCharsetUTF8 extends BitsCharsetJava

  21. object CharsetUtils

  22. object DaffodilCharsetProvider

  23. object StandardBitsCharsets

    Provides BitsCharset objects corresponding to the usual java charsets found in StandardCharsets.
