Base trait for iterators that are capable of reading and returning the entire set of columns of a column batch. These can be local region iterators or those fetching entries from remote nodes.
A RowEncoder implementation for ColumnFormatValue and child classes.
A customized iterator for column store tables that projects out the required columns and returns first those column batches that have all their columns in memory. Further, this will use DiskBlockSortManager to allow concurrent partition iterators to do cross-partition disk block sorting and fault-in for best disk read performance (SNAP-2012).
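The ordering described above can be sketched in plain Scala. This is an illustrative model only: ColumnBatchRef, inMemoryColumns and totalColumns are hypothetical names, not SnappyData's actual internals.

```scala
// Hypothetical stand-in for a column batch reference; the real iterator
// works over region entries, not a simple case class like this.
case class ColumnBatchRef(batchId: Int, totalColumns: Int, inMemoryColumns: Int) {
  // a batch qualifies for the fast path when every projected column is in memory
  def fullyInMemory: Boolean = inMemoryColumns == totalColumns
}

// Return fully-in-memory batches first; the remaining on-disk batches can
// then be sorted by disk offset for sequential fault-in (the role that
// DiskBlockSortManager plays in the real implementation).
def orderForScan(batches: Seq[ColumnBatchRef]): Seq[ColumnBatchRef] = {
  val (inMem, onDisk) = batches.partition(_.fullyInMemory)
  inMem ++ onDisk
}
```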
Result of compaction of a column batch added to transaction pre-commit results.
NOTE: if the layout of this class or ColumnFormatKey changes, then update the regex pattern in SnapshotConnectionListener.parseCompactionResult that parses the toString() of this class.
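The coupling between toString() and the parsing regex can be illustrated with a small self-contained sketch. The field names, the toString layout and the pattern below are all hypothetical; the real ones live in the SnappyData classes named above.

```scala
import scala.util.matching.Regex

// Hypothetical compaction result: the real class carries different fields,
// and the real pattern is defined in SnapshotConnectionListener.
case class CompactionResult(uuid: Long, bucketId: Int, success: Boolean) {
  // the string form doubles as a wire format, so the regex below must
  // be updated in lockstep with any change here
  override def toString: String =
    s"CompactionResult(uuid=$uuid,bucket=$bucketId,success=$success)"
}

val compactionPattern: Regex =
  """CompactionResult\(uuid=(\d+),bucket=(\d+),success=(true|false)\)""".r

def parseCompactionResult(s: String): Option[CompactionResult] = s match {
  case compactionPattern(uuid, bucket, ok) =>
    Some(CompactionResult(uuid.toLong, bucket.toInt, ok.toBoolean))
  case _ => None
}
```

Round-tripping an instance through toString() and the parser is a cheap way to keep the two in sync under test.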
Column tables don't support any extensions over regular Spark schema syntax, but the support for ExternalSchemaRelationProvider has been added as a workaround to allow for specifying schema in a CREATE TABLE AS SELECT statement.
Normally Spark does not allow specifying a schema in a CTAS statement for DataSources (except its special "hive" provider), so the schema is passed here as a string which is parsed locally in the CreatableRelationProvider implementation.
Currently this is the same as ColumnFormatRelation, but it has been kept as a separate class to allow adding any index-specific functionality in the future.
Column Store implementation for GemFireXD.
A ClusteredColumnIterator that fetches entries from a remote bucket.
TODO: PERF: instead of fetching using getAll, this should instead open a named ColumnFormatIterator on the remote node hosting the bucket, then step through the iterator to fetch a batch (or batches) at a time using Function/GfxdFunctionMessage invocations. As of now, the getAll invocation does not honour ordered disk reads, proper fault-in etc.
Provides a ColumnBatchIterator over a single column batch for ColumnTableScan.
The type of the generated class used by column stats check for a column batch.
Compact column batches, if required, and insert the new compacted column batches, or, if they are too small, push them into the row delta buffer.
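The compact-or-push decision can be sketched as a simple threshold check. This is a minimal model of the idea only; minCompactedBatchSize and the action names are assumptions, not SnappyData's actual configuration or types.

```scala
// Hypothetical outcomes of compacting a column batch.
sealed trait CompactionAction
case class InsertCompactedBatch(liveRows: Int) extends CompactionAction
case class PushToDeltaBuffer(liveRows: Int) extends CompactionAction

// If enough live rows survive compaction, write them as a new column
// batch; otherwise the remainder is too small for columnar storage and
// goes to the row delta buffer instead.
def compact(liveRows: Int, minCompactedBatchSize: Int): CompactionAction =
  if (liveRows >= minCompactedBatchSize) InsertCompactedBatch(liveRows)
  else PushToDeltaBuffer(liveRows)
```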
This class acts as a DataSource provider for column format tables provided by Snappy. It uses GemFireXD as the actual datastore to physically locate the tables. Column tables can be used for storing data in columnar compressed format. An example usage is given below.
    val data = Seq(Data(1, 2, 3), Data(7, 8, 9), Data(9, 2, 3),
      Data(4, 2, 3), Data(5, 6, 7))
    val dataDF = snc.createDataset(data)(Encoders.product)
    snc.createTable(tableName, "column", dataDF.schema, props)
    dataDF.write.insertInto(tableName)
This provider scans the underlying tables in parallel and is aware of the data partitioning. It does not introduce a shuffle if a simple table query is fired. One can insert single or multiple rows into this table, as well as do a bulk insert from a Spark DataFrame. A bulk insert example is shown above.