Package org.apache.parquet.bytes
Class CapacityByteArrayOutputStream
- java.lang.Object
  - java.io.OutputStream
    - org.apache.parquet.bytes.CapacityByteArrayOutputStream
- All Implemented Interfaces: Closeable, Flushable, AutoCloseable
public class CapacityByteArrayOutputStream extends OutputStream
Similar to a ByteArrayOutputStream, but uses a growth strategy that does not involve copying. Where ByteArrayOutputStream is backed by a single array that "grows" by copying into a new, larger array, this output stream grows by allocating a new array (slab) and adding it to a list of previous arrays.
Each new slab is allocated to be the same size as all the previous slabs combined, so these allocations become exponentially less frequent, just like ByteArrayOutputStream, with one difference. This output stream accepts a max capacity hint, which describes the maximum amount of data expected to be written to this stream. As the total size of this stream nears this max, the stream starts to grow linearly instead of exponentially: new slabs are allocated to be 1/5th of the max capacity hint, instead of equal to the total previous size of all slabs. This prevents allocating roughly twice the needed space when a new slab is added just before the stream is done being used.
When this stream is reused, it adjusts the initial slab size based on the previous data size, aiming for fewer allocations, on the assumption that a similar amount of data will be written on reuse. See reset().
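
The growth strategy described above can be sketched in plain Java. This is a minimal illustration, not the Parquet implementation: the class and method names (SlabGrowthSketch, nextSlabSize, slabSizes) are hypothetical, and the point at which growth turns linear (once the allocated total exceeds one fifth of the hint) is an assumption, since the description only says "nears this max".

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is not the Parquet implementation.
class SlabGrowthSketch {

    /**
     * Picks the size of the next slab from the capacity allocated so far.
     * Assumption: "nearing the max" means the allocated total exceeds maxCapacityHint / 5.
     */
    static int nextSlabSize(int initialSlabSize, int maxCapacityHint, int totalAllocated) {
        if (totalAllocated == 0) {
            return initialSlabSize; // first slab
        }
        if (totalAllocated > maxCapacityHint / 5) {
            return maxCapacityHint / 5; // linear phase: fixed-size steps
        }
        return totalAllocated; // exponential phase: as big as everything allocated so far
    }

    /** Lists the slab sizes allocated until the total capacity reaches bytesNeeded. */
    static List<Integer> slabSizes(int initialSlabSize, int maxCapacityHint, int bytesNeeded) {
        if (initialSlabSize <= 0 || maxCapacityHint < 5) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        int total = 0;
        while (total < bytesNeeded) {
            int next = nextSlabSize(initialSlabSize, maxCapacityHint, total);
            sizes.add(next);
            total += next;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 1 KiB first slab, 1 MiB hint, 1 MiB of data to hold
        System.out.println(slabSizes(1024, 1024 * 1024, 1024 * 1024));
    }
}
```

With these inputs the sketch doubles the total capacity until it passes one fifth of the hint, after which every further slab is 209715 bytes, so the final allocation overshoots the data by a fraction of the hint rather than by the whole previous size.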
-
-
Constructor Summary
Constructors (Constructor: Description)
- CapacityByteArrayOutputStream(int initialSlabSize): Deprecated.
- CapacityByteArrayOutputStream(int initialSlabSize, int maxCapacityHint): Deprecated.
- CapacityByteArrayOutputStream(int initialSlabSize, int maxCapacityHint, ByteBufferAllocator allocator)
- CapacityByteArrayOutputStream(int initialSlabSize, ByteBufferAllocator allocator): Deprecated.
-
Method Summary
Methods (Modifier and Type, Method: Description)
- void close()
- int getCapacity()
- long getCurrentIndex()
- static int initialSlabSizeHeuristic(int minSlabSize, int targetCapacity, int targetNumSlabs): Return an initial slab size such that a CapacityByteArrayOutputStream constructed with it will end up allocating targetNumSlabs in order to reach targetCapacity.
- String memUsageString(String prefix)
- void reset(): When re-using an instance with reset, it will adjust slab size based on previous data size.
- void setByte(long index, byte value): Replace the byte stored at position index in this stream with value.
- long size()
- static CapacityByteArrayOutputStream withTargetNumSlabs(int minSlabSize, int maxCapacityHint, int targetNumSlabs)
- static CapacityByteArrayOutputStream withTargetNumSlabs(int minSlabSize, int maxCapacityHint, int targetNumSlabs, ByteBufferAllocator allocator): Construct a CapacityByteArrayOutputStream configured such that its initial slab size is determined by initialSlabSizeHeuristic(int, int, int), with targetCapacity == maxCapacityHint.
- void write(byte[] b, int off, int len)
- void write(int b)
- void writeTo(OutputStream out): Writes the complete contents of this buffer to the specified output stream argument.
Methods inherited from class java.io.OutputStream
flush, nullOutputStream, write
-
-
-
-
Constructor Detail
-
CapacityByteArrayOutputStream
@Deprecated public CapacityByteArrayOutputStream(int initialSlabSize)
Deprecated. Defaults maxCapacityHint to 1MB.
- Parameters:
initialSlabSize - an initial slab size
-
CapacityByteArrayOutputStream
@Deprecated public CapacityByteArrayOutputStream(int initialSlabSize, ByteBufferAllocator allocator)
Deprecated. Defaults maxCapacityHint to 1MB.
- Parameters:
initialSlabSize - an initial slab size
allocator - an allocator to use when creating byte buffers for slabs
-
CapacityByteArrayOutputStream
@Deprecated public CapacityByteArrayOutputStream(int initialSlabSize, int maxCapacityHint)
Deprecated.
- Parameters:
initialSlabSize - the size to make the first slab
maxCapacityHint - a hint (not a guarantee) of the max amount of data written to this stream
-
CapacityByteArrayOutputStream
public CapacityByteArrayOutputStream(int initialSlabSize, int maxCapacityHint, ByteBufferAllocator allocator)
- Parameters:
initialSlabSize - the size to make the first slab
maxCapacityHint - a hint (not a guarantee) of the max amount of data written to this stream
allocator - an allocator to use when creating byte buffers for slabs
-
-
Method Detail
-
initialSlabSizeHeuristic
public static int initialSlabSizeHeuristic(int minSlabSize, int targetCapacity, int targetNumSlabs)
Return an initial slab size such that a CapacityByteArrayOutputStream constructed with it will end up allocating targetNumSlabs in order to reach targetCapacity. This aims to balance the overhead of creating new slabs against the memory wasted by eagerly making initial slabs too big. Note that targetCapacity here need not match maxCapacityHint in the constructor of CapacityByteArrayOutputStream, though often that would make sense.
- Parameters:
minSlabSize - no matter what, slabs should never be made smaller than this
targetCapacity - how much capacity there should be after targetNumSlabs have been allocated
targetNumSlabs - how many slabs it should take to reach targetCapacity
- Returns:
- an initial slab size
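
One plausible way to compute such a starting size is sketched below. The formula is an assumption made for illustration (a first slab of targetCapacity / 2^targetNumSlabs, floored at minSlabSize, under a pure-doubling model), and the class and method names are hypothetical; the library's exact arithmetic may differ.

```java
// Illustrative sketch only; the formula is an assumption, not taken from the library.
class InitialSlabSizeSketch {

    /**
     * Assumes capacity roughly doubles with each new slab, so a first slab of
     * targetCapacity / 2^targetNumSlabs needs about targetNumSlabs doublings
     * to reach targetCapacity. The result is never smaller than minSlabSize.
     */
    static int initialSlabSize(int minSlabSize, int targetCapacity, int targetNumSlabs) {
        if (targetCapacity < 0 || targetNumSlabs < 0 || targetNumSlabs > 30) {
            throw new IllegalArgumentException("arguments out of range for this sketch");
        }
        int candidate = targetCapacity / (1 << targetNumSlabs);
        return Math.max(minSlabSize, candidate);
    }

    public static void main(String[] args) {
        System.out.println(initialSlabSize(64, 1024 * 1024, 10)); // 1 MiB target, 10 slabs
        System.out.println(initialSlabSize(64, 1024, 10));        // tiny target: the minimum wins
    }
}
```

The floor matters for small targets: without it, a 1 KiB target spread over 10 doublings would start from a 1-byte slab, and slab bookkeeping would dominate.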
-
withTargetNumSlabs
public static CapacityByteArrayOutputStream withTargetNumSlabs(int minSlabSize, int maxCapacityHint, int targetNumSlabs)
-
withTargetNumSlabs
public static CapacityByteArrayOutputStream withTargetNumSlabs(int minSlabSize, int maxCapacityHint, int targetNumSlabs, ByteBufferAllocator allocator)
Construct a CapacityByteArrayOutputStream configured such that its initial slab size is determined by initialSlabSizeHeuristic(int, int, int), with targetCapacity == maxCapacityHint.
- Parameters:
minSlabSize - a minimum slab size
maxCapacityHint - a hint for the maximum required capacity
targetNumSlabs - the target number of slabs
allocator - an allocator to use when creating byte buffers for slabs
- Returns:
- a CapacityByteArrayOutputStream
-
write
public void write(int b)
- Specified by: write in class OutputStream
-
write
public void write(byte[] b, int off, int len)
- Overrides: write in class OutputStream
-
writeTo
public void writeTo(OutputStream out) throws IOException
Writes the complete contents of this buffer to the specified output stream argument. The output stream's write method (out.write(slab, 0, slab.length)) will be called once per slab.
- Parameters:
out - the output stream to which to write the data.
- Throws:
IOException - if an I/O error occurs.
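
The once-per-slab behavior can be illustrated with a short sketch. It assumes slabs are modeled as plain byte arrays that hold only written bytes, which is a simplification for illustration; SlabWriteToSketch and its method are hypothetical names, not part of the library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; slabs are modeled as fully written byte arrays.
class SlabWriteToSketch {

    /** Copies every slab to out with one write call per slab and returns the number of calls. */
    static int writeTo(List<byte[]> slabs, OutputStream out) throws IOException {
        int calls = 0;
        for (byte[] slab : slabs) {
            out.write(slab, 0, slab.length);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> slabs = new ArrayList<>();
        slabs.add(new byte[] {1, 2});
        slabs.add(new byte[] {3, 4, 5});
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int calls = writeTo(slabs, out);
        System.out.println(calls + " write calls, " + out.size() + " bytes copied");
    }
}
```

Because the number of slabs grows slowly relative to the data size, the target stream sees a small number of large writes rather than many small ones.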
-
size
public long size()
- Returns:
- The total size in bytes of data written to this stream.
-
getCapacity
public int getCapacity()
- Returns:
- The total size in bytes currently allocated for this stream.
-
reset
public void reset()
When an instance is re-used via reset, it adjusts its slab size based on the previous data size. The intent is to reuse the same instance for the same type of data (for example, the same column), on the assumption that the amount of data in the buffer will be consistent across uses.
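
A rough sketch of this kind of adjustment follows. Everything in it is hypothetical: the class name, the helper methods, and in particular the divisor used to derive the next first-slab size from the previous data size, which this page does not specify.

```java
// Illustrative sketch only; names and the divisor are assumptions, not the library's values.
class ResetSizingSketch {

    // Hypothetical tuning constant: fraction of the previous data size used for the next first slab.
    static final int PREVIOUS_SIZE_DIVISOR = 7;

    private int initialSlabSize;
    private long bytesUsed;

    ResetSizingSketch(int initialSlabSize) {
        this.initialSlabSize = initialSlabSize;
    }

    /** Stands in for write(...): only the byte count matters for this sketch. */
    void recordWrite(int len) {
        bytesUsed += len;
    }

    /** Discards the data but sizes the next first slab from how much was written last time. */
    void reset() {
        int fromPrevious = (int) Math.min(Integer.MAX_VALUE, bytesUsed / PREVIOUS_SIZE_DIVISOR);
        initialSlabSize = Math.max(fromPrevious, initialSlabSize); // never shrinks in this sketch
        bytesUsed = 0;
    }

    int getInitialSlabSize() {
        return initialSlabSize;
    }
}
```

After a large write, the next use starts from a bigger first slab and therefore needs fewer allocations to hold a similar amount of data.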
-
getCurrentIndex
public long getCurrentIndex()
- Returns:
- the index of the last value written to this stream, which can be passed to setByte(long, byte) in order to change it
-
setByte
public void setByte(long index, byte value)
Replace the byte stored at position index in this stream with value.
- Parameters:
index - which byte to replace
value - the value to replace it with
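
Because the data lives in several slabs, replacing a byte at a stream-wide index means first finding the slab that contains it. The sketch below shows one way to do that, together with a getCurrentIndex-style helper. It models slabs as byte arrays holding only written bytes; the class SlabSetByteSketch and its append method are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; slabs are modeled as fully written byte arrays.
class SlabSetByteSketch {

    private final List<byte[]> slabs = new ArrayList<>();
    private long bytesUsed;

    /** Stands in for write(...): adds one already-filled slab. */
    void append(byte[] filledSlab) {
        slabs.add(filledSlab);
        bytesUsed += filledSlab.length;
    }

    /** Index of the last byte written, suitable for passing to setByte. */
    long getCurrentIndex() {
        if (bytesUsed == 0) {
            throw new IllegalStateException("nothing has been written yet");
        }
        return bytesUsed - 1;
    }

    /** Walks the slabs, tracking how many bytes precede each one, until index falls inside a slab. */
    void setByte(long index, byte value) {
        if (index < 0 || index >= bytesUsed) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + bytesUsed);
        }
        long seen = 0;
        for (byte[] slab : slabs) {
            if (index < seen + slab.length) {
                slab[(int) (index - seen)] = value;
                return;
            }
            seen += slab.length;
        }
    }
}
```

The lookup is linear in the number of slabs, which stays small because slab sizes grow with the amount of data written.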
-
memUsageString
public String memUsageString(String prefix)
- Parameters:
prefix - a prefix to be used for every new line in the string
- Returns:
- a text representation of the memory usage of this structure
-
close
public void close()
- Specified by: close in interface AutoCloseable
- Specified by: close in interface Closeable
- Overrides: close in class OutputStream
-
-