Class DictionaryValuesWriter
- java.lang.Object
-
- org.apache.parquet.column.values.ValuesWriter
-
- org.apache.parquet.column.values.dictionary.DictionaryValuesWriter
-
- All Implemented Interfaces:
RequiresFallback
- Direct Known Subclasses:
DictionaryValuesWriter.PlainBinaryDictionaryValuesWriter,
DictionaryValuesWriter.PlainDoubleDictionaryValuesWriter,
DictionaryValuesWriter.PlainFloatDictionaryValuesWriter,
DictionaryValuesWriter.PlainIntegerDictionaryValuesWriter,
DictionaryValuesWriter.PlainLongDictionaryValuesWriter
public abstract class DictionaryValuesWriter extends ValuesWriter implements RequiresFallback
Attempts to encode values using a dictionary and falls back to plain encoding if the dictionary gets too big.
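The behaviour described above can be illustrated with a small, self-contained sketch. This is not the Parquet implementation: the class `DictionaryFallbackSketch`, its string-only dictionary, and the byte-cost model (a 4-byte length prefix plus the UTF-8 payload) are assumptions made for illustration. Only the names `maxDictionaryByteSize`, `dictionaryByteSize`, `encodedValues`, `shouldFallBack()` and `getDictionarySize()` are borrowed from this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a dictionary writer; not a Parquet class. */
final class DictionaryFallbackSketch {
    private final int maxDictionaryByteSize;
    // Maps each distinct value to the id assigned on first sight.
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();
    // One dictionary id per written value, in write order.
    private final List<Integer> encodedValues = new ArrayList<>();
    private long dictionaryByteSize = 0;

    DictionaryFallbackSketch(int maxDictionaryByteSize) {
        this.maxDictionaryByteSize = maxDictionaryByteSize;
    }

    /** Records a value as a dictionary id, adding a new entry if needed. */
    void write(String value) {
        Integer id = dictionary.get(value);
        if (id == null) {
            id = dictionary.size();
            dictionary.put(value, id);
            // Assumed cost model: 4-byte length prefix plus UTF-8 bytes.
            dictionaryByteSize += 4 + value.getBytes(StandardCharsets.UTF_8).length;
        }
        encodedValues.add(id);
    }

    /** True once the dictionary has outgrown its byte budget. */
    boolean shouldFallBack() {
        return dictionaryByteSize > maxDictionaryByteSize;
    }

    int getDictionarySize() {
        return dictionary.size();
    }

    List<Integer> getEncodedValues() {
        return encodedValues;
    }
}
```

Repeated values cost only one id each, so the dictionary grows with the number of distinct values rather than with the number of values written. Once the budget is exceeded, a caller would be expected to switch to another encoding.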
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static class
DictionaryValuesWriter.PlainBinaryDictionaryValuesWriter
static class
DictionaryValuesWriter.PlainDoubleDictionaryValuesWriter
static class
DictionaryValuesWriter.PlainFixedLenArrayDictionaryValuesWriter
static class
DictionaryValuesWriter.PlainFloatDictionaryValuesWriter
static class
DictionaryValuesWriter.PlainIntegerDictionaryValuesWriter
static class
DictionaryValuesWriter.PlainLongDictionaryValuesWriter
-
Field Summary
Fields
Modifier and Type, Field, Description
protected org.apache.parquet.bytes.ByteBufferAllocator
allocator
protected long
dictionaryByteSize
protected boolean
dictionaryTooBig
protected IntList
encodedValues
protected Encoding
encodingForDictionaryPage
protected boolean
firstPage
indicates if this is the first page being processed
protected int
lastUsedDictionaryByteSize
protected int
lastUsedDictionarySize
protected int
maxDictionaryByteSize
-
Constructor Summary
Constructors
Modifier, Constructor, Description
protected
DictionaryValuesWriter(int maxDictionaryByteSize, Encoding encodingForDataPage, Encoding encodingForDictionaryPage, org.apache.parquet.bytes.ByteBufferAllocator allocator)
-
Method Summary
Modifier and Type, Method, Description
protected abstract void
clearDictionaryContent()
clear/free the underlying dictionary content
void
close()
Called to close the values writer.
protected DictionaryPage
dictPage(ValuesWriter dictPageWriter)
void
fallBackAllValuesTo(ValuesWriter writer)
When falling back to a different encoding, we must re-encode all the values seen so far.
protected abstract void
fallBackDictionaryEncodedData(ValuesWriter writer)
long
getAllocatedSize()
long
getBufferedSize()
used to decide whether to move on to the next page
org.apache.parquet.bytes.BytesInput
getBytes()
protected abstract int
getDictionarySize()
Encoding
getEncoding()
called after getBytes() and before reset()
boolean
isCompressionSatisfying(long rawSize, long encodedSize)
Before writing the first page, we verify that the encoding is worth it.
String
memUsageString(String prefix)
void
reset()
called after getBytes() to reset the current buffer and start writing the next page
void
resetDictionary()
reset the dictionary when a new block starts
boolean
shouldFallBack()
In the case of a dictionary-based encoding, we fall back if the dictionary becomes too big.
-
Methods inherited from class org.apache.parquet.column.values.ValuesWriter
toDictPageAndClose, writeBoolean, writeByte, writeBytes, writeDouble, writeFloat, writeInteger, writeLong
-
-
-
-
Field Detail
-
encodingForDictionaryPage
protected final Encoding encodingForDictionaryPage
-
maxDictionaryByteSize
protected final int maxDictionaryByteSize
-
dictionaryTooBig
protected boolean dictionaryTooBig
-
dictionaryByteSize
protected long dictionaryByteSize
-
lastUsedDictionaryByteSize
protected int lastUsedDictionaryByteSize
-
lastUsedDictionarySize
protected int lastUsedDictionarySize
-
encodedValues
protected IntList encodedValues
-
firstPage
protected boolean firstPage
indicates if this is the first page being processed
-
allocator
protected org.apache.parquet.bytes.ByteBufferAllocator allocator
-
-
Method Detail
-
dictPage
protected DictionaryPage dictPage(ValuesWriter dictPageWriter)
-
shouldFallBack
public boolean shouldFallBack()
Description copied from interface: RequiresFallback
In the case of a dictionary-based encoding, we fall back if the dictionary becomes too big.
- Specified by:
shouldFallBack
in interface RequiresFallback
- Returns:
- true to notify the parent that we should fall back to another encoding
-
isCompressionSatisfying
public boolean isCompressionSatisfying(long rawSize, long encodedSize)
Description copied from interface: RequiresFallback
Before writing the first page, we verify that the encoding is worth it and fall back if a simpler encoding would be better.
- Specified by:
isCompressionSatisfying
in interface RequiresFallback
- Parameters:
rawSize
- the size if encoded with plain
encodedSize
- the size as encoded by the current encoding
- Returns:
- true if we keep this encoding
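This page does not state the exact rule used to decide whether the encoding is "worth it". One plausible rule, sketched below as an assumption, is to count the dictionary's own bytes against the encoded size and keep the encoding only if the total is still smaller than the plain size. The class name `CompressionCheckSketch` and the extra `dictionaryByteSize` parameter are hypothetical; in the documented class, `dictionaryByteSize` is a field rather than an argument.

```java
/** Hypothetical helper; illustrates one possible "is it worth it" rule. */
final class CompressionCheckSketch {
    /**
     * Assumed rule: dictionary encoding is kept only when the encoded data
     * plus the dictionary itself is smaller than the plain-encoded data.
     */
    static boolean isCompressionSatisfying(long rawSize, long encodedSize, long dictionaryByteSize) {
        return encodedSize + dictionaryByteSize < rawSize;
    }
}
```

Under this assumed rule, a column with many distinct values tends to fail the check, because the dictionary is nearly as large as the plain data and the ids are pure overhead.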
-
fallBackAllValuesTo
public void fallBackAllValuesTo(ValuesWriter writer)
Description copied from interface: RequiresFallback
When falling back to a different encoding, we must re-encode all the values seen so far.
- Specified by:
fallBackAllValuesTo
in interface RequiresFallback
- Parameters:
writer
- the new encoder to write the current values to
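Re-encoding on fallback can be pictured as replaying the buffered dictionary ids through the dictionary and handing the original values to the new writer. The sketch below is a simplified model, not the Parquet code: `FallbackReencodeSketch`, the `dictionary` list indexed by id, and the `plainSink` list standing in for the target `ValuesWriter` are all hypothetical.

```java
import java.util.List;

/** Hypothetical model of re-encoding buffered ids on fallback. */
final class FallbackReencodeSketch {
    /**
     * Looks up each buffered id in the dictionary and appends the original
     * value to the sink, preserving write order.
     */
    static List<String> fallBackAllValuesTo(
            List<Integer> encodedValues, List<String> dictionary, List<String> plainSink) {
        for (int id : encodedValues) {
            plainSink.add(dictionary.get(id));
        }
        return plainSink;
    }
}
```

After such a replay, the dictionary and the buffered ids are no longer needed, which is presumably where `clearDictionaryContent()` comes in.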
-
fallBackDictionaryEncodedData
protected abstract void fallBackDictionaryEncodedData(ValuesWriter writer)
-
getBufferedSize
public long getBufferedSize()
Description copied from class: ValuesWriter
Used to decide whether to move on to the next page.
- Specified by:
getBufferedSize
in class ValuesWriter
- Returns:
- the size of the currently buffered data (in bytes)
-
getAllocatedSize
public long getAllocatedSize()
Description copied from class: ValuesWriter
- Specified by:
getAllocatedSize
in class ValuesWriter
- Returns:
- the allocated size of the buffer
-
getBytes
public org.apache.parquet.bytes.BytesInput getBytes()
- Specified by:
getBytes
in class ValuesWriter
- Returns:
- the bytes buffered so far to write to the current page
-
getEncoding
public Encoding getEncoding()
Description copied from class: ValuesWriter
called after getBytes() and before reset()
- Specified by:
getEncoding
in class ValuesWriter
- Returns:
- the encoding that was used to encode the bytes
-
reset
public void reset()
Description copied from class: ValuesWriter
called after getBytes() to reset the current buffer and start writing the next page
- Specified by:
reset
in class ValuesWriter
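The descriptions of getBufferedSize(), getBytes(), getEncoding() and reset() imply a per-page call order: check the buffered size, take the bytes, read the encoding, then reset. The toy writer below illustrates that order only. `PageCycleSketch`, its 4-byte integer layout, the `"PLAIN"` string and the `pageSizeThreshold` parameter are hypothetical and do not reflect how Parquet stores or flushes pages.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical toy writer used only to illustrate the call order. */
final class PageCycleSketch {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    void writeInteger(int v) {
        // 4-byte big-endian layout, chosen only for this sketch.
        buffer.write(v >>> 24);
        buffer.write(v >>> 16);
        buffer.write(v >>> 8);
        buffer.write(v);
    }

    long getBufferedSize() { return buffer.size(); }

    byte[] getBytes() { return buffer.toByteArray(); }

    String getEncoding() { return "PLAIN"; }

    void reset() { buffer.reset(); }

    /** Writes all values and returns the number of pages flushed. */
    static int writePages(int[] values, long pageSizeThreshold) {
        PageCycleSketch writer = new PageCycleSketch();
        int pages = 0;
        for (int v : values) {
            writer.writeInteger(v);
            // getBufferedSize() drives the decision to end the page.
            if (writer.getBufferedSize() >= pageSizeThreshold) {
                pages += flush(writer);
            }
        }
        if (writer.getBufferedSize() > 0) {
            pages += flush(writer); // final, partially filled page
        }
        return pages;
    }

    private static int flush(PageCycleSketch writer) {
        byte[] bytes = writer.getBytes();       // 1. take the page bytes
        String encoding = writer.getEncoding(); // 2. then ask for the encoding
        writer.reset();                         // 3. then clear for the next page
        // A real caller would hand bytes and encoding to a page sink here.
        return (bytes.length > 0 && encoding != null) ? 1 : 0;
    }
}
```

Reading the encoding only after getBytes() matters for a writer that can fall back, since the encoding actually used for a page may not be known until its bytes are produced.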
-
close
public void close()
Description copied from class: ValuesWriter
Called to close the values writer. Any output stream is closed and can no longer be used. All resources are released.
- Overrides:
close
in class ValuesWriter
-
resetDictionary
public void resetDictionary()
Description copied from class: ValuesWriter
reset the dictionary when a new block starts
- Overrides:
resetDictionary
in class ValuesWriter
-
clearDictionaryContent
protected abstract void clearDictionaryContent()
clear/free the underlying dictionary content
-
getDictionarySize
protected abstract int getDictionarySize()
- Returns:
- size in items
-
memUsageString
public String memUsageString(String prefix)
- Specified by:
memUsageString
in class ValuesWriter
-
-