thrift

Type Members

class DailySuffixParquetThrift[T <: ThriftBase] extends DailySuffixSource with ParquetThrift[T]

When using these sources or creating subclasses of them, you can provide a filter predicate and/or a set of fields (columns) to keep (project).
The filter predicate will be pushed down to the input format, potentially making the filter significantly more efficient than a filter applied to a TypedPipe (parquet push-down filters can skip reading entire chunks of data off disk).
For data with a large schema (many fields / columns), providing the set of columns you intend to use can also make your job significantly more efficient (parquet column projection push-down will skip reading unused columns from disk). The columns are specified in the format described here: https://github.com/apache/parquet-mr/blob/master/parquet_cascading.md#21-projection-pushdown-with-thriftscrooge-records
These settings are defined in the traits com.twitter.scalding.parquet.HasFilterPredicate and com.twitter.scalding.parquet.HasColumnProjection.
Here are two ways you can use these in a parquet source:
```
// MyThriftRecord is a placeholder for your Thrift-generated record class.
class MyParquetSource(dr: DateRange) extends DailySuffixParquetThrift[MyThriftRecord]("/a/path", dr)

val mySourceFilteredAndProjected = new MyParquetSource(dr) {
  override val withFilter: Option[FilterPredicate] = Some(myFp)
  override val withColumnProjections: Set[String] = Set("a.b.c", "x.y")
}
```
The other way is to add these as constructor arguments:
```
class MyParquetSource(
  dr: DateRange,
  override val withFilter: Option[FilterPredicate] = None,
  override val withColumnProjections: Set[String] = Set()
) extends DailySuffixParquetThrift[MyThriftRecord]("/a/path", dr) // MyThriftRecord: placeholder for your Thrift-generated class

val mySourceFilteredAndProjected = new MyParquetSource(dr, Some(myFp), Set("a.b.c", "x.y"))
```
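The examples above leave `myFp` undefined. As a minimal sketch, a filter predicate could be built with Parquet's `FilterApi`. The column names `age` and `address.country` are hypothetical, and the `org.apache.parquet` package prefix is an assumption (older Parquet releases use the `parquet` prefix instead):

```scala
import org.apache.parquet.filter2.predicate.{FilterApi, FilterPredicate}
import org.apache.parquet.io.api.Binary

object MyFilters {
  // Hypothetical columns: an int field `age` and a string field `address.country`.
  private val age = FilterApi.intColumn("age")
  private val country = FilterApi.binaryColumn("address.country")

  // Keep only records where age > 21 AND address.country == "US".
  val myFp: FilterPredicate = FilterApi.and(
    FilterApi.gt(age, Integer.valueOf(21)),
    FilterApi.eq(country, Binary.fromString("US"))
  )
}
```

The resulting value can then be supplied as `Some(MyFilters.myFp)` for `withFilter` in either of the two styles shown above.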
class FixedPathParquetThrift[T <: ThriftBase] extends FixedPathSource with ParquetThrift[T]

class HourlySuffixParquetThrift[T <: ThriftBase] extends HourlySuffixSource with ParquetThrift[T]

type Parquet346TBaseScheme[T <: TBase[_, _]] = cascading.thrift.Parquet346TBaseScheme[T]

trait ParquetThrift[T <: ThriftBase] extends FileSource with ParquetThriftBase[T]

trait ParquetThriftBase[T] extends FileSource with SingleMappable[T] with TypedSink[T] with LocalTapSource with HasFilterPredicate with HasColumnProjection

Value Members

object ParquetThrift extends Serializable
