package com.twitter.scalding.parquet.thrift

Type Members

  1. class DailySuffixParquetThrift[T <: ThriftBase] extends DailySuffixSource with ParquetThrift[T]

    When using these sources, or when creating subclasses of them, you can provide a filter predicate and/or a set of fields (columns) to keep (project).

    The filter predicate is pushed down to the Parquet input format, which can make it significantly more efficient than an equivalent filter applied to a TypedPipe: Parquet push-down filters can skip reading entire chunks of data from disk.

    For data with a large schema (many fields/columns), providing the set of columns you intend to use can also make your job significantly more efficient, because Parquet column projection push-down skips reading unused columns from disk. The columns are specified in the format described here: https://github.com/apache/incubator-parquet-mr/blob/master/parquet_cascading.md#21-projection-pushdown-with-thriftscrooge-records

    These settings are defined in the traits com.twitter.scalding.parquet.HasFilterPredicate and com.twitter.scalding.parquet.HasColumnProjection.

    There are two ways to use these settings in a Parquet source. The first is to override them in an anonymous subclass:

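    // MyThriftClass is a placeholder for your Thrift-generated record type.
    class MyParquetSource(dr: DateRange) extends DailySuffixParquetThrift[MyThriftClass]("/a/path", dr)
    
    // dr (a DateRange) and myFp (a FilterPredicate) are assumed to be in scope.
    val mySourceFilteredAndProjected = new MyParquetSource(dr) {
      override val withFilter: Option[FilterPredicate] = Some(myFp)
      override val withColumns: Set[String] = Set("a/b/c", "x/y")
    }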

    The other way is to add these as constructor arguments:

    // MyThriftClass is a placeholder Thrift record type.
    class MyParquetSource(
      dr: DateRange,
      override val withFilter: Option[FilterPredicate] = None,
      override val withColumns: Set[String] = Set()
    ) extends DailySuffixParquetThrift[MyThriftClass]("/a/path", dr)
    
    val mySourceFilteredAndProjected = new MyParquetSource(dr, Some(myFp), Set("a/b/c", "x/y"))

  2. class FixedPathParquetThrift[T <: ThriftBase] extends FixedPathSource with ParquetThrift[T]

  3. class HourlySuffixParquetThrift[T <: ThriftBase] extends HourlySuffixSource with ParquetThrift[T]

  4. trait ParquetThrift[T <: ThriftBase] extends FileSource with ParquetThriftBase[T]

  5. trait ParquetThriftBase[T] extends FileSource with SingleMappable[T] with TypedSink[T] with LocalTapSource with HasFilterPredicate with HasColumnProjection
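
The push-down behavior described under DailySuffixParquetThrift can be illustrated with a toy, in-memory model. This is not the parquet-mr or Scalding API: PushdownSketch, RowGroup, ColumnChunk, Eq, mightMatch and bytesRead are hypothetical names. The model assumes each chunk of rows carries per-column min/max statistics (Parquet keeps statistics of this kind in its file metadata), skips any chunk whose statistics rule out the predicate, and reads only the projected columns plus the predicate's own column.

```scala
// Toy model of filter and projection push-down. All names are hypothetical;
// this is not the parquet-mr or Scalding API.
object PushdownSketch {
  // Min/max statistics and on-disk size of one column within one chunk of rows.
  final case class ColumnChunk(min: Long, max: Long, bytes: Long)

  // A chunk of rows: column path (e.g. "x/y") -> that column's data for those rows.
  final case class RowGroup(columns: Map[String, ColumnChunk])

  // A simplified predicate: column == value.
  final case class Eq(column: String, value: Long)

  // Statistics-based skipping: if value falls outside [min, max], no row in the
  // group can match, so the group is never read. Missing stats mean "must read".
  def mightMatch(group: RowGroup, p: Eq): Boolean =
    group.columns.get(p.column) match {
      case Some(c) => c.min <= p.value && p.value <= c.max
      case None    => true
    }

  // Bytes read for one scan. In this model an empty projection means "read every
  // column"; otherwise it reads the projected columns plus the predicate's column,
  // which is still needed to check the surviving rows one by one.
  def bytesRead(groups: Seq[RowGroup], filter: Option[Eq], projection: Set[String]): Long = {
    val needed: Set[String] = projection ++ filter.map(_.column).toList
    groups
      .filter(g => filter.forall(p => mightMatch(g, p)))
      .map { g =>
        g.columns.collect {
          case (path, chunk) if projection.isEmpty || needed.contains(path) => chunk.bytes
        }.sum
      }
      .sum
  }

  // Two sample groups that differ only in the range of "id".
  val groups: Seq[RowGroup] = Seq(
    RowGroup(Map(
      "id"    -> ColumnChunk(1, 100, 10),
      "x/y"   -> ColumnChunk(0, 5, 40),
      "a/b/c" -> ColumnChunk(0, 0, 50))),
    RowGroup(Map(
      "id"    -> ColumnChunk(101, 200, 10),
      "x/y"   -> ColumnChunk(0, 5, 40),
      "a/b/c" -> ColumnChunk(0, 0, 50)))
  )
}
```

With the sample data, a full scan reads 200 bytes; the filter Eq("id", 150) skips the first group and reads 100; projecting only x/y reads 80; combining both reads 50. Real Parquet readers support richer predicates and finer-grained layouts, so treat this only as the shape of the savings.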
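
ParquetThriftBase mixes in HasFilterPredicate and HasColumnProjection, so each setting is an overridable member with a default value. The sketch below uses simplified stand-ins for those traits to show that the anonymous-subclass style and the constructor-argument style described under DailySuffixParquetThrift end up with the same settings. OverrideSketch, Pred, FilterSetting, ColumnSetting, SourceBase, SourceA and SourceB are hypothetical; Pred replaces Parquet's FilterPredicate, and the defaults (None and an empty set) are assumed from the constructor example.

```scala
// Simplified stand-ins, not Scalding's real traits or classes.
object OverrideSketch {
  // Hypothetical stand-in for Parquet's FilterPredicate.
  final case class Pred(description: String)

  // Stand-ins for HasFilterPredicate / HasColumnProjection:
  // each setting is an overridable member with a default.
  trait FilterSetting { def withFilter: Option[Pred] = None }
  trait ColumnSetting { def withColumns: Set[String] = Set.empty }

  // Hypothetical stand-in for DailySuffixParquetThrift; it only records its path.
  abstract class SourceBase(val path: String) extends FilterSetting with ColumnSetting

  // Style 1: a plain subclass; settings are overridden later in an anonymous subclass.
  class SourceA extends SourceBase("/a/path")

  // Style 2: settings are constructor parameters that override the trait members.
  class SourceB(
      override val withFilter: Option[Pred] = None,
      override val withColumns: Set[String] = Set.empty
  ) extends SourceBase("/a/path")

  val myFp: Pred = Pred("x/y == 3")

  val viaAnonymousSubclass: SourceA = new SourceA {
    override val withFilter: Option[Pred] = Some(myFp)
    override val withColumns: Set[String] = Set("a/b/c", "x/y")
  }

  val viaConstructor: SourceB = new SourceB(Some(myFp), Set("a/b/c", "x/y"))
}
```

Declaring the settings as constructor parameters lets callers configure a source in a single expression, while the anonymous-subclass style leaves the source class's constructor unchanged; without overrides, both fall back to the trait defaults.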

Value Members

  1. object ParquetThrift extends Serializable
