Class OrcColumnVector
java.lang.Object
    org.apache.spark.sql.vectorized.ColumnVector
        org.apache.spark.sql.execution.datasources.orc.OrcColumnVector
All Implemented Interfaces:
AutoCloseable
public class OrcColumnVector extends ColumnVector
A column vector class wrapping Hive's ColumnVector. Because Spark's ColumnarBatch only accepts Spark's vectorized.ColumnVector, this column vector adapts a Hive ColumnVector to Spark's ColumnVector.
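A minimal sketch of the adapter idea, written so it compiles without Spark or Hive on the classpath. Every type below (HiveStyleLongColumn, SparkStyleLongReader, LongColumnAdapter) is hypothetical; the field names vector, isNull, noNulls and isRepeating are modeled on Hive's vectorized column classes, and the rule that a repeating vector answers every row from slot 0 is an assumption about how such a wrapper maps row IDs, not a copy of OrcColumnVector's source.

```java
// Hypothetical stand-in for a Hive-style long column: a value array plus
// per-row null flags, a noNulls shortcut and an isRepeating flag.
// The field names mimic Hive's vector classes; this is NOT Hive's API.
final class HiveStyleLongColumn {
    final long[] vector;
    final boolean[] isNull;
    boolean noNulls = true;
    boolean isRepeating = false;

    HiveStyleLongColumn(int capacity) {
        vector = new long[capacity];
        isNull = new boolean[capacity];
    }
}

// Hypothetical Spark-style read interface (a tiny subset of the kind of
// getters org.apache.spark.sql.vectorized.ColumnVector exposes).
interface SparkStyleLongReader {
    boolean isNullAt(int rowId);
    long getLong(int rowId);
}

// Adapter: translates Spark-style row lookups into reads on the Hive-style
// storage. Assumption: when isRepeating is set, slot 0 holds every row's value.
final class LongColumnAdapter implements SparkStyleLongReader {
    private final HiveStyleLongColumn base;

    LongColumnAdapter(HiveStyleLongColumn base) {
        this.base = base;
    }

    // Single place where a logical row ID becomes a physical slot index.
    private int rowIndex(int rowId) {
        return base.isRepeating ? 0 : rowId;
    }

    @Override
    public boolean isNullAt(int rowId) {
        return !base.noNulls && base.isNull[rowIndex(rowId)];
    }

    @Override
    public long getLong(int rowId) {
        return base.vector[rowIndex(rowId)];
    }

    public static void main(String[] args) {
        HiveStyleLongColumn col = new HiveStyleLongColumn(4);
        col.vector[0] = 10;
        col.vector[1] = 20;
        col.vector[2] = 30;
        col.noNulls = false;
        col.isNull[3] = true; // row 3 is null
        LongColumnAdapter adapter = new LongColumnAdapter(col);
        for (int i = 0; i < 4; i++) {
            System.out.println(adapter.isNullAt(i) ? "null" : String.valueOf(adapter.getLong(i)));
        }
    }
}
```

Keeping the row-ID translation in one private helper means every getter handles repeating vectors the same way.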
Field Summary
Fields inherited from class org.apache.spark.sql.vectorized.ColumnVector
type
Method Summary
void close()
    Cleans up memory for this column vector.
ColumnarArray getArray(int rowId)
    Returns the array type value for rowId.
byte[] getBinary(int rowId)
    Returns the binary type value for rowId.
boolean getBoolean(int rowId)
    Returns the boolean type value for rowId.
byte getByte(int rowId)
    Returns the byte type value for rowId.
ColumnVector getChild(int ordinal)
    Returns the child ColumnVector at the given ordinal.
org.apache.spark.sql.types.Decimal getDecimal(int rowId, int precision, int scale)
    Returns the decimal type value for rowId.
double getDouble(int rowId)
    Returns the double type value for rowId.
float getFloat(int rowId)
    Returns the float type value for rowId.
int getInt(int rowId)
    Returns the int type value for rowId.
long getLong(int rowId)
    Returns the long type value for rowId.
ColumnarMap getMap(int rowId)
    Returns the map type value for rowId.
short getShort(int rowId)
    Returns the short type value for rowId.
org.apache.spark.unsafe.types.UTF8String getUTF8String(int rowId)
    Returns the string type value for rowId.
boolean hasNull()
    Returns true if this column vector contains any null values.
boolean isNullAt(int rowId)
    Returns whether the value at rowId is NULL.
int numNulls()
    Returns the number of nulls in this column vector.
void setBatchSize(int batchSize)
Methods inherited from class org.apache.spark.sql.vectorized.ColumnVector
dataType, getBooleans, getBytes, getDoubles, getFloats, getInterval, getInts, getLongs, getShorts, getStruct
Method Detail
-
setBatchSize
public void setBatchSize(int batchSize)
-
close
public void close()
Description copied from class: ColumnVector
Cleans up memory for this column vector. The column vector is not usable after this. This overrides `AutoCloseable.close` to remove the `throws` clause, as the column vector is in-memory and no exception is expected during closing.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in class ColumnVector
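Because close() declares no checked exception, a column vector can be used in try-with-resources without a catch clause. The sketch below shows that language rule with hypothetical classes (InMemoryVector, DemoVector) rather than Spark's own types.

```java
// Hypothetical base class: narrows AutoCloseable.close() so that it declares
// no checked exception, mirroring what the ColumnVector documentation describes.
abstract class InMemoryVector implements AutoCloseable {
    @Override
    public abstract void close(); // no "throws Exception" clause
}

// Hypothetical concrete vector that only tracks whether it has been closed.
final class DemoVector extends InMemoryVector {
    private long[] data = new long[1024];

    boolean isClosed() {
        return data == null;
    }

    @Override
    public void close() {
        data = null; // release the buffer; the vector is unusable afterwards
    }

    static DemoVector useAndClose() {
        DemoVector v = new DemoVector();
        // close() throws no checked exception, so this compiles without
        // a catch clause or a throws declaration on the enclosing method.
        try (DemoVector scoped = v) {
            System.out.println("closed inside block? " + scoped.isClosed());
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println("closed after block? " + useAndClose().isClosed());
    }
}
```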
-
hasNull
public boolean hasNull()
Description copied from class: ColumnVector
Returns true if this column vector contains any null values.
- Specified by:
hasNull in class ColumnVector
-
numNulls
public int numNulls()
Description copied from class: ColumnVector
Returns the number of nulls in this column vector.
- Specified by:
numNulls in class ColumnVector
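setBatchSize has no description on this page. Assuming it records how many leading slots of the wrapped vector belong to the current batch, null statistics would be computed over that prefix only, as in this sketch. NullStats and its isRepeating handling are hypothetical illustrations, not OrcColumnVector's implementation.

```java
// Hypothetical null-tracking vector. Assumption: only the first batchSize
// slots belong to the current batch, so null statistics cover that prefix.
final class NullStats {
    private final boolean[] isNull;
    private final boolean isRepeating;
    private int batchSize;

    NullStats(boolean[] isNull, boolean isRepeating) {
        this.isNull = isNull;
        this.isRepeating = isRepeating;
    }

    void setBatchSize(int batchSize) {
        this.batchSize = batchSize;
    }

    int numNulls() {
        if (isRepeating) {
            // Slot 0 stands for every row in the batch.
            return isNull[0] ? batchSize : 0;
        }
        int count = 0;
        for (int i = 0; i < batchSize; i++) {
            if (isNull[i]) {
                count++;
            }
        }
        return count;
    }

    boolean hasNull() {
        return numNulls() > 0;
    }

    public static void main(String[] args) {
        NullStats s = new NullStats(new boolean[] {false, true, false, true}, false);
        s.setBatchSize(3); // the trailing slot is outside the current batch
        System.out.println(s.numNulls() + " " + s.hasNull());
    }
}
```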
-
isNullAt
public boolean isNullAt(int rowId)
Description copied from class: ColumnVector
Returns whether the value at rowId is NULL.
- Specified by:
isNullAt in class ColumnVector
-
getBoolean
public boolean getBoolean(int rowId)
Description copied from class: ColumnVector
Returns the boolean type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- Specified by:
getBoolean in class ColumnVector
-
getByte
public byte getByte(int rowId)
Description copied from class: ColumnVector
Returns the byte type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- Specified by:
getByte in class ColumnVector
-
getShort
public short getShort(int rowId)
Description copied from class: ColumnVector
Returns the short type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- Specified by:
getShort in class ColumnVector
-
getInt
public int getInt(int rowId)
Description copied from class: ColumnVector
Returns the int type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- Specified by:
getInt in class ColumnVector
-
getLong
public long getLong(int rowId)
Description copied from class: ColumnVector
Returns the long type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- Specified by:
getLong in class ColumnVector
-
getFloat
public float getFloat(int rowId)
Description copied from class: ColumnVector
Returns the float type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- Specified by:
getFloat in class ColumnVector
-
getDouble
public double getDouble(int rowId)
Description copied from class: ColumnVector
Returns the double type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- Specified by:
getDouble in class ColumnVector
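Since the primitive getters return an undefined value for a null slot, callers should test isNullAt first. The hypothetical LongColumn interface and NullSafeSum helper below sketch that pattern on plain arrays; the junk value stored in the null slot is never read.

```java
// Hypothetical minimal read interface mirroring the isNullAt/getLong pair.
interface LongColumn {
    boolean isNullAt(int rowId);
    long getLong(int rowId);
}

final class NullSafeSum {
    // Sums non-null values. getLong is only called after isNullAt returns
    // false, because a null slot may hold an arbitrary value.
    static long sumNonNull(LongColumn col, int numRows) {
        long sum = 0;
        for (int rowId = 0; rowId < numRows; rowId++) {
            if (!col.isNullAt(rowId)) {
                sum += col.getLong(rowId);
            }
        }
        return sum;
    }

    // Array-backed column; callers may put junk in null slots on purpose.
    static LongColumn fromArrays(long[] values, boolean[] nulls) {
        return new LongColumn() {
            @Override
            public boolean isNullAt(int rowId) {
                return nulls[rowId];
            }

            @Override
            public long getLong(int rowId) {
                return values[rowId];
            }
        };
    }

    public static void main(String[] args) {
        LongColumn col = fromArrays(
            new long[] {1, 999_999, 3},          // slot 1 is null; its value is meaningless
            new boolean[] {false, true, false});
        System.out.println(sumNonNull(col, 3));
    }
}
```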
-
getDecimal
public org.apache.spark.sql.types.Decimal getDecimal(int rowId, int precision, int scale)
Description copied from class: ColumnVector
Returns the decimal type value for rowId. If the slot for rowId is null, it should return null.
- Specified by:
getDecimal in class ColumnVector
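A sketch of the getDecimal contract, assuming decimals are stored as unscaled longs (a hypothetical layout chosen for illustration). java.math.BigDecimal stands in for org.apache.spark.sql.types.Decimal, and rejecting values whose digits exceed precision is an assumed policy, not documented behavior.

```java
import java.math.BigDecimal;

// Hypothetical decimal column storing unscaled longs. BigDecimal stands in
// for Spark's Decimal type purely for illustration.
final class DecimalColumnSketch {
    private final long[] unscaled;
    private final boolean[] isNull;

    DecimalColumnSketch(long[] unscaled, boolean[] isNull) {
        this.unscaled = unscaled;
        this.isNull = isNull;
    }

    BigDecimal getDecimal(int rowId, int precision, int scale) {
        if (isNull[rowId]) {
            return null; // null slot -> null, as the contract requires
        }
        // Interpret the stored long as unscaledValue * 10^-scale.
        BigDecimal value = BigDecimal.valueOf(unscaled[rowId], scale);
        // Assumed policy: refuse values that need more digits than precision.
        if (value.precision() > precision) {
            throw new IllegalArgumentException(
                value + " does not fit DECIMAL(" + precision + "," + scale + ")");
        }
        return value;
    }

    public static void main(String[] args) {
        DecimalColumnSketch col = new DecimalColumnSketch(
            new long[] {12345, 0}, new boolean[] {false, true});
        System.out.println(col.getDecimal(0, 5, 2)); // 123.45
        System.out.println(col.getDecimal(1, 5, 2)); // null
    }
}
```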
-
getUTF8String
public org.apache.spark.unsafe.types.UTF8String getUTF8String(int rowId)
Description copied from class: ColumnVector
Returns the string type value for rowId. If the slot for rowId is null, it should return null. Note that the returned UTF8String may point to the data of this column vector; copy it if you need to keep it after this column vector is freed.
- Specified by:
getUTF8String in class ColumnVector
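The copy-before-keeping advice can be illustrated with a hypothetical Utf8Slice type that, like a zero-copy string, refers to bytes inside a shared buffer. When the buffer is reused for the next batch, the view changes but the copy does not. Utf8Slice is not Spark's UTF8String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical zero-copy string view over a shared byte buffer, standing in
// for a string value that points at a column vector's data.
final class Utf8Slice {
    private final byte[] base;
    private final int offset;
    private final int length;

    Utf8Slice(byte[] base, int offset, int length) {
        this.base = base;
        this.offset = offset;
        this.length = length;
    }

    // Deep copy: the result owns its bytes and survives buffer reuse.
    Utf8Slice copy() {
        return new Utf8Slice(Arrays.copyOfRange(base, offset, offset + length), 0, length);
    }

    @Override
    public String toString() {
        return new String(base, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] buffer = "spark".getBytes(StandardCharsets.UTF_8);
        Utf8Slice view = new Utf8Slice(buffer, 0, buffer.length);
        Utf8Slice kept = view.copy();
        // Simulate the vector overwriting its buffer for the next batch.
        byte[] next = "hives".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(next, 0, buffer, 0, next.length);
        System.out.println(view + " " + kept); // hives spark
    }
}
```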
-
getBinary
public byte[] getBinary(int rowId)
Description copied from class: ColumnVector
Returns the binary type value for rowId. If the slot for rowId is null, it should return null.
- Specified by:
getBinary in class ColumnVector
-
getArray
public ColumnarArray getArray(int rowId)
Description copied from class: ColumnVector
Returns the array type value for rowId. If the slot for rowId is null, it should return null. To support the array type, implementations must construct a ColumnarArray and return it in this method. ColumnarArray requires a ColumnVector that stores the data of all the elements of all the arrays in this vector, plus an offset and length that point to a range in that ColumnVector; the range represents the array for rowId. Implementations are free to decide where to put the data vector, offsets, and lengths. For example, the first child vector can serve as the data vector, with offsets and lengths stored in two int arrays in this vector.
- Specified by:
getArray in class ColumnVector
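The offset-and-length layout described above can be sketched with plain arrays. ArrayColumnSketch is hypothetical; for readability its getArray returns a copied int[], whereas a ColumnarArray, per the description, wraps the data vector together with an offset and length.

```java
import java.util.Arrays;

// Hypothetical array column: one child data array holds every element of
// every row's array, and per-row offset/length pairs select each slice.
final class ArrayColumnSketch {
    private final int[] childData; // all elements of all arrays, back to back
    private final int[] offsets;   // start of row i's array in childData
    private final int[] lengths;   // number of elements in row i's array
    private final boolean[] isNull;

    ArrayColumnSketch(int[] childData, int[] offsets, int[] lengths, boolean[] isNull) {
        this.childData = childData;
        this.offsets = offsets;
        this.lengths = lengths;
        this.isNull = isNull;
    }

    // Returns a copy of the slice for convenience; a view type would keep
    // (childData, offset, length) instead of copying.
    int[] getArray(int rowId) {
        if (isNull[rowId]) {
            return null;
        }
        int start = offsets[rowId];
        return Arrays.copyOfRange(childData, start, start + lengths[rowId]);
    }

    public static void main(String[] args) {
        // Rows: [1, 2], null, [], [3, 4, 5]
        ArrayColumnSketch col = new ArrayColumnSketch(
            new int[] {1, 2, 3, 4, 5},
            new int[] {0, 2, 2, 2},
            new int[] {2, 0, 0, 3},
            new boolean[] {false, true, false, false});
        for (int row = 0; row < 4; row++) {
            int[] value = col.getArray(row);
            System.out.println(value == null ? "null" : Arrays.toString(value));
        }
    }
}
```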
-
getMap
public ColumnarMap getMap(int rowId)
Description copied from class: ColumnVector
Returns the map type value for rowId. If the slot for rowId is null, it should return null. In Spark, a map type value is essentially a key data array and a value data array. A key from the key array at a given index and the value from the value array at the same index form one entry of the map. To support the map type, implementations must construct a ColumnarMap and return it in this method. ColumnarMap requires a ColumnVector that stores the data of all the keys of all the maps in this vector, another ColumnVector that stores the data of all the values of all the maps in this vector, and a pair of offset and length that specifies the range of the key/value arrays belonging to the map type value at rowId.
- Specified by:
getMap in class ColumnVector
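A similar sketch for maps: parallel key and value arrays, plus one offset/length pair per row. MapColumnSketch is hypothetical, and it materializes each entry range into a java.util.LinkedHashMap for readability, whereas a ColumnarMap, per the description, refers to the key and value vectors with an offset and length.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical map column: entry j of a row pairs keys[offset + j] with
// values[offset + j], where offset/length come from that row's slot.
final class MapColumnSketch {
    private final String[] keys;   // all keys of all maps, back to back
    private final int[] values;    // all values, aligned index-for-index with keys
    private final int[] offsets;
    private final int[] lengths;
    private final boolean[] isNull;

    MapColumnSketch(String[] keys, int[] values, int[] offsets, int[] lengths, boolean[] isNull) {
        this.keys = keys;
        this.values = values;
        this.offsets = offsets;
        this.lengths = lengths;
        this.isNull = isNull;
    }

    Map<String, Integer> getMap(int rowId) {
        if (isNull[rowId]) {
            return null;
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        int start = offsets[rowId];
        for (int j = start; j < start + lengths[rowId]; j++) {
            result.put(keys[j], values[j]); // same index -> one map entry
        }
        return result;
    }

    public static void main(String[] args) {
        // Rows: {a=1, b=2}, null, {c=3}
        MapColumnSketch col = new MapColumnSketch(
            new String[] {"a", "b", "c"},
            new int[] {1, 2, 3},
            new int[] {0, 2, 2},
            new int[] {2, 0, 1},
            new boolean[] {false, true, false});
        for (int row = 0; row < 3; row++) {
            System.out.println(col.getMap(row));
        }
    }
}
```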
-
getChild
public ColumnVector getChild(int ordinal)
- Specified by:
getChild in class ColumnVector
- Returns:
the child ColumnVector at the given ordinal.