Class ChunkedOutputStream

  • All Implemented Interfaces:
    Closeable, Flushable, AutoCloseable

    @Beta
    public final class ChunkedOutputStream
    extends OutputStream
    An OutputStream implementation which collects data in a series of byte[] chunks, each of which has a fixed maximum size. This is generally preferable to ByteArrayOutputStream, as that can result in huge byte arrays -- which can create unnecessary pressure on the GC (as well as a lot of copying).
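    A minimal usage sketch follows. Because the class extends OutputStream, the write and close calls below follow the standard OutputStream contract; the two-argument constructor (an initial capacity hint and the maximum chunk size) is an assumption made purely for illustration and may not match the actual signature.

        import java.io.IOException;

        // Hedged usage sketch: the constructor arguments shown here are assumed
        // for illustration; only the OutputStream write/close contract is given.
        public final class ChunkedOutputStreamExample {
            public static void main(final String[] args) throws IOException {
                // Hypothetical parameters: initial capacity hint and maximum chunk size
                try (ChunkedOutputStream out = new ChunkedOutputStream(256, 1 << 20)) {
                    final byte[] payload = new byte[4096];
                    for (int i = 0; i < 1_000; i++) {
                        out.write(payload, 0, payload.length);
                    }
                }
                // After close() the stream is unmodifiable; the buffered data is held
                // as a series of byte[] chunks rather than one monolithic array.
            }
        }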

    This class takes a different approach: it recognizes that the result of buffering will be collected at some point, when the stream is already closed (and thus unmodifiable). It therefore splits the process into two steps:

    • Data acquisition, during which we start with an initial (power-of-two) size and proceed to fill it up. Once the buffer is full, we stash it, allocate a new buffer twice its size and repeat the process. Once we hit maxChunkSize, we do not grow subsequent buffers. We can also skip some intermediate sizes if data arrives in large chunks via write(byte[], int, int), as sketched after this list.
    • Buffer consolidation, which occurs when the stream is closed. At this point we construct the final collection of buffers.
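    The growth rule from the data acquisition step can be sketched in isolation as follows. This is an illustrative standalone method, not the class's actual implementation; the names nextBufferSize, currentSize, pendingBytes and maxChunkSize are chosen here for clarity.

        // Standalone sketch of the growth rule: buffer sizes double from a
        // power-of-two initial size until they reach maxChunkSize, and a large
        // write(byte[], int, int) may skip intermediate sizes.
        final class GrowthSketch {
            private GrowthSketch() {
                // utility class
            }

            static int nextBufferSize(final int currentSize, final int pendingBytes, final int maxChunkSize) {
                if (currentSize >= maxChunkSize) {
                    // Once we hit maxChunkSize, subsequent buffers stay at that size
                    return maxChunkSize;
                }
                int next = currentSize * 2;
                // A large incoming write lets us skip intermediate sizes
                while (next < pendingBytes && next < maxChunkSize) {
                    next *= 2;
                }
                return Math.min(next, maxChunkSize);
            }

            public static void main(final String[] args) {
                // Starting from a 16-byte buffer with a 4 KiB maximum chunk size,
                // this prints 32, 64, 128, ..., 4096
                int size = 16;
                while (size < 4096) {
                    size = nextBufferSize(size, 1, 4096);
                    System.out.println(size);
                }
            }
        }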

    The data acquisition strategy results in predictably-sized buffers, which grow exponentially in size until they hit the maximum size. An intrinsic property here is that the total capacity of the chunks created during the ramp-up is guaranteed to fit into maxChunkSize, hence they can readily be compacted into a single buffer, which replaces them (see the consolidation sketch after the list below). Combined with the requirement to trim the last buffer to an accurate length, this algorithm guarantees that the total number of internal copy operations is capped at 2 * maxChunkSize. The number of produced chunks is also well-controlled:

    • for slowly-built data, we will maintain perfect packing
    • for fast-startup data, we will be at most one chunk away from packing perfectly
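    The consolidation step can be sketched along similar lines. The method below is illustrative only, not the class's actual code, and rests on the assumptions stated above: fully-filled ramp-up chunks were allocated as doubling powers of two (so their combined size fits into a single buffer of at most maxChunkSize), full-sized chunks are reused as-is, and the final partially-filled buffer is trimmed to the number of bytes actually written. The copy work is thus bounded by merging at most maxChunkSize bytes plus trimming at most maxChunkSize bytes, matching the 2 * maxChunkSize cap above.

        import java.util.ArrayList;
        import java.util.Arrays;
        import java.util.List;

        // Standalone sketch of buffer consolidation at close(); not the class's
        // actual code.
        final class ConsolidationSketch {
            private ConsolidationSketch() {
                // utility class
            }

            static List<byte[]> consolidate(final List<byte[]> rampUpChunks, final List<byte[]> fullChunks,
                    final byte[] lastBuffer, final int lastBufferUsed) {
                final List<byte[]> result = new ArrayList<>();

                // Merge all fully-filled ramp-up chunks into a single buffer: by
                // construction their total size does not exceed maxChunkSize.
                int total = 0;
                for (byte[] chunk : rampUpChunks) {
                    total += chunk.length;
                }
                if (total > 0) {
                    final byte[] merged = new byte[total];
                    int offset = 0;
                    for (byte[] chunk : rampUpChunks) {
                        System.arraycopy(chunk, 0, merged, offset, chunk.length);
                        offset += chunk.length;
                    }
                    result.add(merged);
                }

                // Chunks that already have the maximum size are reused without copying.
                result.addAll(fullChunks);

                // Trim the last buffer so its length matches the bytes actually written.
                if (lastBufferUsed > 0) {
                    result.add(Arrays.copyOf(lastBuffer, lastBufferUsed));
                }
                return result;
            }
        }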
    Author:
    Robert Varga, Tomas Olvecky