Helper class for scanning tables stored in Hadoop.
Helper class for scanning tables stored in Spark's block manager.
LateralViewJoin is used only for LATERAL VIEW explode, which emits one output row per element of the exploded array.
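The row-multiplying behaviour of LATERAL VIEW explode can be sketched on plain Scala collections. This is a minimal sketch, not Shark's LateralViewJoin: the `Row` case class and the `lateralViewExplode` function are hypothetical, and it models the non-OUTER form, where a row whose array is empty produces no output.

```scala
object LateralViewSketch {
  // Hypothetical input row: a scalar column plus an array column to explode.
  final case class Row(id: Int, tags: Seq[String])

  // Emit one (id, tag) pair per array element, in the spirit of
  // SELECT id, tag FROM t LATERAL VIEW explode(tags) v AS tag.
  // A row whose array is empty contributes nothing (no OUTER semantics).
  def lateralViewExplode(rows: Seq[Row]): Seq[(Int, String)] =
    rows.flatMap(r => r.tags.map(tag => (r.id, tag)))
}
```

For example, rows `(1, [a, b])` and `(2, [])` yield `(1, a)` and `(1, b)`.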
A join operator optimized for joining a large table with a number of small tables that fit in memory.
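The idea behind such a map-side join can be sketched as follows: each small table is loaded into an in-memory hash map, and the large table is streamed past them, so the large side never has to be shuffled. This is an illustrative sketch, not Shark's operator. `mapJoin` is a hypothetical name, and it assumes an inner join with unique keys in every small table.

```scala
object MapJoinSketch {
  // Inner-join a large, streamed table against several small tables, each
  // already materialized as an in-memory Map keyed on the join key.
  // Assumes join keys are unique within each small table.
  def mapJoin[K, L, S](
      large: Iterator[(K, L)],
      small: Seq[Map[K, S]]): Iterator[(K, L, Seq[S])] =
    large.flatMap { case (k, l) =>
      val matches = small.map(_.get(k))
      // Keep the large-table row only if every small table has a match.
      if (matches.forall(_.isDefined)) Iterator((k, l, matches.map(_.get)))
      else Iterator.empty
    }
}
```

Because the large side is consumed as an `Iterator`, only the small tables need to fit in memory, which is the trade-off the operator description relies on.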
Cache the RDD and force evaluate it (so the cache is filled).
A base operator class that has many parents and one child.
Helper class for scanning tables stored off-heap.
A data structure used for shuffling data that supports comparison.
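A comparable shuffle key over serialized bytes might look like the sketch below. It is an assumption about the general technique, not Shark's ReduceKey: the class name `ByteKey` is hypothetical, and it orders keys lexicographically, treating each byte as unsigned so that `0xff` sorts after `0x01`.

```scala
object ReduceKeySketch {
  // Hypothetical shuffle key wrapping serialized key bytes.
  final class ByteKey(val bytes: Array[Byte]) extends Ordered[ByteKey] {
    // Lexicographic comparison with bytes interpreted as unsigned (0..255);
    // when one key is a prefix of the other, the shorter key sorts first.
    override def compare(that: ByteKey): Int = {
      val n = math.min(bytes.length, that.bytes.length)
      var i = 0
      while (i < n) {
        val c = (bytes(i) & 0xff) - (that.bytes(i) & 0xff)
        if (c != 0) return c
        i += 1
      }
      bytes.length - that.bytes.length
    }
  }
}
```

Comparing raw bytes avoids deserializing keys during a sort. The unsigned mask matters because Scala's `Byte` is signed.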
A special Spark partitioner that allows hash partitioning of data based on the partitionCode field in ReduceKey.
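The core of such a partitioner is mapping a precomputed `partitionCode` to a partition index. The minimal sketch below shows only that arithmetic. The helper name `getPartition` is a stand-alone illustration rather than Shark's code. Java's `%` can return a negative result for a negative code, so the value is shifted back into `[0, numPartitions)`, much as Spark's `HashPartitioner` does with key hash codes.

```scala
object PartitionCodeSketch {
  // Map a precomputed partition code to a partition index in
  // [0, numPartitions), correcting the negative remainders that % can yield.
  def getPartition(partitionCode: Int, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    val mod = partitionCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }
}
```

In Spark itself this logic would sit inside a subclass of `org.apache.spark.Partitioner`, which declares `numPartitions` and `getPartition(key: Any)`. The key would be cast to the reduce-key type to read its code.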
Converts a collection of rows into key-value pairs.
An operator that runs an external script.
An operator that does projection.
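Projection in its simplest form keeps a chosen subset of columns, in a chosen order, from every row. The sketch below makes some simplifying assumptions: a row is modelled as an `IndexedSeq[Any]`, and only plain column references are handled, with no computed expressions. `project` is a hypothetical name.

```scala
object ProjectionSketch {
  // Simplifying assumption: a row is an indexed sequence of column values.
  type Row = IndexedSeq[Any]

  // Keep only the requested column indices, in the requested order.
  // `row` is itself a function Int => Any, so it can be passed to map directly.
  def project(rows: Iterator[Row], columns: IndexedSeq[Int]): Iterator[Row] =
    rows.map(row => columns.map(row))
}
```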
SharkExplainTask executes EXPLAIN for RDD operators.
Collect the output as a TableRDD.
A trait for subclasses that handle table scans.
The TableScanOperator is used for scanning any type of Shark or Hive table.
File sink operator.
A base operator class that has at most one parent.
A union operator.
Unlike in Hive, GROUP BY in Shark is split into two operators: GroupByPreShuffleOperator and GroupByPostShuffleOperator.
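Splitting GROUP BY around a shuffle is the classic partial-aggregation pattern. Each input partition aggregates its own records before the shuffle, and the partial results for each key are merged afterwards. The sketch below illustrates that pattern for `SUM` using plain collections. The operator names above suggest this split, but the sketch is not Shark's implementation, and `preShuffle`/`postShuffle` are hypothetical names.

```scala
object TwoPhaseGroupBySketch {
  // Pre-shuffle: within one input partition, combine values per key so that
  // fewer records cross the network.
  def preShuffle(partition: Seq[(String, Long)]): Map[String, Long] =
    partition.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }

  // Post-shuffle: merge the partial sums for each key received from all
  // upstream partitions.
  def postShuffle(partials: Seq[Map[String, Long]]): Map[String, Long] =
    partials.flatten.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }
}
```

This works directly for `SUM` because addition is associative. An aggregate like `AVG` would need to carry a `(sum, count)` pair through the shuffle and divide only in the post-shuffle step.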
Use Kryo to serialize the udtfOp and selOp ObjectInspectors, then convert the resulting Array[Byte] to a String, since XML serialization of byte arrays (exposed through the @BeanProperty annotation) is inefficient.
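The bytes-to-String step can be sketched as follows. The source does not say which encoding Shark uses; Base64 is shown here as one lossless, XML-safe choice. A naive `new String(bytes, "UTF-8")` can corrupt arbitrary binary data. To keep the sketch dependency-free, Java serialization stands in for Kryo, and all names are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Base64

object BytesAsStringSketch {
  // Serialize an object to bytes (Java serialization substitutes for Kryo here).
  def toBytes(obj: java.io.Serializable): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Reverse of toBytes.
  def fromBytes(bytes: Array[Byte]): AnyRef = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject() finally in.close()
  }

  // Base64 turns arbitrary bytes into an ASCII String that can be stored in a
  // String-typed bean property and decoded back without loss.
  def encode(bytes: Array[Byte]): String = Base64.getEncoder.encodeToString(bytes)
  def decode(s: String): Array[Byte] = Base64.getDecoder.decode(s)
}
```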
Given a Hive plan, OperatorFactory creates the corresponding Shark plan.
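A plan-translation factory of this kind is often a recursive pattern match that rebuilds the tree node by node. The sketch below shows that shape. Every class in it is a hypothetical stand-in, not a Hive or Shark type. It also assumes a tree-shaped plan, whereas real query plans can have operators with several parents.

```scala
object OperatorFactorySketch {
  // Hypothetical stand-ins for Hive plan nodes.
  sealed trait HiveOp
  final case class HiveTableScan(table: String) extends HiveOp
  final case class HiveFilter(predicate: String, child: HiveOp) extends HiveOp
  final case class HiveSelect(columns: Seq[String], child: HiveOp) extends HiveOp

  // Hypothetical Shark counterparts.
  sealed trait SharkOp
  final case class SharkTableScan(table: String) extends SharkOp
  final case class SharkFilter(predicate: String, child: SharkOp) extends SharkOp
  final case class SharkSelect(columns: Seq[String], child: SharkOp) extends SharkOp

  // Translate each Hive node into its Shark equivalent, recursing into children.
  // The sealed traits let the compiler warn if a node type is left unhandled.
  def toShark(op: HiveOp): SharkOp = op match {
    case HiveTableScan(t)    => SharkTableScan(t)
    case HiveFilter(p, c)    => SharkFilter(p, toShark(c))
    case HiveSelect(cols, c) => SharkSelect(cols, toShark(c))
  }
}
```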
A set of RDD-related functions that provide some handy features in addition to Spark's built-in abstractions.