Package com.twitter.scalding.parquet.thrift


package thrift

Linear Supertypes
AnyRef, Any

Type Members

  1. class DailySuffixParquetThrift[T <: ThriftBase] extends DailySuffixSource with ParquetThrift[T]

    When using these sources or creating subclasses of them, you can provide a filter predicate and/or a set of fields (columns) to keep (project).

    The filter predicate is pushed down to the input format, which can make filtering significantly more efficient than applying the same filter to a TypedPipe: Parquet push-down filters can skip reading entire chunks of data (such as row groups) off disk.

    For data with a large schema (many fields/columns), providing the set of columns you intend to use can also make your job significantly more efficient, because Parquet column projection push-down skips reading unused columns from disk. The columns are specified in the format described here: https://github.com/apache/parquet-mr/blob/master/parquet_cascading.md#21-projection-pushdown-with-thriftscrooge-records

    These settings are defined in the traits com.twitter.scalding.parquet.HasFilterPredicate and com.twitter.scalding.parquet.HasColumnProjection.

    There are two ways to use them in a Parquet source. The first is to override them in an anonymous subclass:

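    import com.twitter.scalding.DateRange
    import com.twitter.scalding.parquet.thrift.DailySuffixParquetThrift
    import org.apache.parquet.filter2.predicate.FilterPredicate // parquet-mr 1.7+ package name

    // MyThriftClass is a placeholder for your Thrift-generated record type
    class MyParquetSource(dr: DateRange) extends DailySuffixParquetThrift[MyThriftClass]("/a/path", dr)

    // dr (a DateRange) and myFp (a FilterPredicate) are assumed to be defined elsewhere
    val mySourceFilteredAndProjected = new MyParquetSource(dr) {
      override val withFilter: Option[FilterPredicate] = Some(myFp)
      override val withColumnProjections: Set[String] = Set("a.b.c", "x.y")
    }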

    The other way is to add these as constructor arguments:

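    import com.twitter.scalding.DateRange
    import com.twitter.scalding.parquet.thrift.DailySuffixParquetThrift
    import org.apache.parquet.filter2.predicate.FilterPredicate // parquet-mr 1.7+ package name

    // MyThriftClass is a placeholder Thrift type; the defaults mean no filter and no column projection
    class MyParquetSource(
      dr: DateRange,
      override val withFilter: Option[FilterPredicate] = None,
      override val withColumnProjections: Set[String] = Set()
    ) extends DailySuffixParquetThrift[MyThriftClass]("/a/path", dr)

    // dr and myFp are assumed to be defined elsewhere
    val mySourceFilteredAndProjected = new MyParquetSource(dr, Some(myFp), Set("a.b.c", "x.y"))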
  2. class FixedPathParquetThrift[T <: ThriftBase] extends FixedPathSource with ParquetThrift[T]

  3. class HourlySuffixParquetThrift[T <: ThriftBase] extends HourlySuffixSource with ParquetThrift[T]

  4. type Parquet346TBaseScheme[T <: TBase[_, _]] = cascading.thrift.Parquet346TBaseScheme[T]

  5. trait ParquetThrift[T <: ThriftBase] extends FileSource with ParquetThriftBase[T]

  6. trait ParquetThriftBase[T] extends FileSource with SingleMappable[T] with TypedSink[T] with LocalTapSource with HasFilterPredicate with HasColumnProjection

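The filter and column-projection push-down described for DailySuffixParquetThrift can be illustrated with a toy, in-memory model. This is a sketch under stated assumptions, not Parquet's or Scalding's API: RowGroup, Between, ReadResult and read are hypothetical names, and real Parquet files carry much richer metadata. It assumes data is split into row groups with per-column min/max statistics (Parquet stores statistics like these for each column chunk), skips any row group whose statistics rule out the predicate, and reads only the projected columns plus the column the filter needs. In this toy, an empty projection means "read every column", mirroring the empty-set default in the constructor example.

```scala
// Toy model, NOT Parquet's or Scalding's API; all names here are hypothetical.
object PushdownSketch {
  // One row group: dotted column name -> values, with min/max stats per column.
  final case class RowGroup(columns: Map[String, Vector[Int]]) {
    val stats: Map[String, (Int, Int)] =
      columns.map { case (name, vs) => name -> ((vs.min, vs.max)) }
  }

  // A range predicate on a single column: lo <= value <= hi.
  final case class Between(column: String, lo: Int, hi: Int) {
    def keeps(v: Int): Boolean = v >= lo && v <= hi

    // False only when the stats prove no row in the group can match.
    def mightMatch(rg: RowGroup): Boolean = {
      val (min, max) = rg.stats(column)
      max >= lo && min <= hi
    }
  }

  final case class ReadResult(rows: Vector[Map[String, Int]], groupsRead: Int, cellsRead: Int)

  def read(groups: Seq[RowGroup], filter: Option[Between], projection: Set[String]): ReadResult = {
    var groupsRead = 0
    var cellsRead = 0
    val rows = Vector.newBuilder[Map[String, Int]]
    // Filter push-down: whole row groups are skipped based on their stats.
    for (rg <- groups if filter.forall(_.mightMatch(rg))) {
      groupsRead += 1
      // Projection push-down: read only projected columns plus the filter column.
      val needed: Set[String] =
        (if (projection.isEmpty) rg.columns.keySet else projection) ++ filter.map(_.column).toSet
      val cols = rg.columns.filter { case (name, _) => needed(name) }
      cellsRead += cols.values.map(_.size).sum
      val n = rg.columns.values.head.size
      for (i <- 0 until n) {
        val row = cols.map { case (name, vs) => name -> vs(i) }
        // Row-level filtering still applies inside groups that were read.
        if (filter.forall(f => f.keeps(row(f.column))))
          rows += row.filter { case (name, _) => projection.isEmpty || projection(name) }
      }
    }
    ReadResult(rows.result(), groupsRead, cellsRead)
  }
}
```

The groupsRead and cellsRead counters make the savings visible: when one row group's statistics rule out the predicate, only the surviving group's needed columns are touched, which is the effect the push-down paragraphs describe.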

Value Members

  1. object ParquetThrift extends Serializable
