org.apache.spark.sql

GroupedData

class GroupedData extends AnyRef

:: Experimental :: A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy.

Annotations
@Experimental()
Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new GroupedData(df: DataFrame, groupingExprs: Seq[Expression])

    Attributes
    protected[org.apache.spark.sql]

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def agg(expr: Column, exprs: Column*): DataFrame

    Compute aggregates by specifying a series of aggregate columns. Unlike other methods in this class, the resulting DataFrame won't automatically include the grouping columns.

    The available aggregate methods are defined in org.apache.spark.sql.functions.

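    // Selects the age of the oldest employee and the aggregate expense for each department.
    // "department" is listed explicitly because this variant does not add the grouping columns.
    
    // Scala:
    import org.apache.spark.sql.functions._
    import sqlContext.implicits._  // enables $"..."; assumes a SQLContext named sqlContext is in scope
    df.groupBy("department").agg($"department", max($"age"), sum($"expense"))
    
    // Java:
    import static org.apache.spark.sql.functions.*;
    df.groupBy("department").agg(col("department"), max(col("age")), sum(col("expense")));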
    Annotations
    @varargs()
  5. def agg(exprs: Map[String, String]): DataFrame

    (Java-specific) Compute aggregates by specifying a map from column name to aggregate method. Here exprs is a java.util.Map. The resulting DataFrame will also contain the grouping columns.

    The available aggregate methods are avg, max, min, sum, count.

    // Selects the age of the oldest employee and the aggregate expense for each department
    import com.google.common.collect.ImmutableMap;
    df.groupBy("department").agg(ImmutableMap.<String, String>builder()
      .put("age", "max")
      .put("expense", "sum")
      .build());
  6. def agg(exprs: Map[String, String]): DataFrame

    (Scala-specific) Compute aggregates by specifying a map from column name to aggregate method. Here exprs is a Scala Map. The resulting DataFrame will also contain the grouping columns.

    The available aggregate methods are avg, max, min, sum, count.

    // Selects the age of the oldest employee and the aggregate expense for each department
    df.groupBy("department").agg(Map(
      "age" -> "max",
      "expense" -> "sum"
    ))
  7. def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame

    (Scala-specific) Compute aggregates by specifying (column name, aggregate method) pairs. The resulting DataFrame will also contain the grouping columns.

    The available aggregate methods are avg, max, min, sum, count.

    // Selects the age of the oldest employee and the aggregate expense for each department
    df.groupBy("department").agg(
      "age" -> "max",
      "expense" -> "sum"
    )
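
The string-based agg variants above select an aggregate by name. As a rough illustration of that idea, the following is a plain-Java model, not Spark code: the class NamedAggregateSketch and its method are hypothetical, it handles a single column of one group, and returning NaN for an empty group is an assumption of the sketch rather than Spark's behavior. How Spark itself reports an unknown method name is not covered here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "aggregate method chosen by name"; not part of Spark.
class NamedAggregateSketch {
    // Applies the aggregate named by `method` to the values of one column within one group.
    static double aggregate(String method, List<Double> values) {
        switch (method) {
            case "avg":
                return values.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
            case "max":
                return values.stream().mapToDouble(Double::doubleValue).max().orElse(Double.NaN);
            case "min":
                return values.stream().mapToDouble(Double::doubleValue).min().orElse(Double.NaN);
            case "sum":
                return values.stream().mapToDouble(Double::doubleValue).sum();
            case "count":
                return values.size();
            default:
                // The sketch rejects names outside the documented list.
                throw new IllegalArgumentException("Unknown aggregate method: " + method);
        }
    }

    public static void main(String[] args) {
        List<Double> ages = Arrays.asList(30.0, 45.0, 28.0);
        // Mirrors the "age" -> "max" pair from the examples above, for a single group.
        System.out.println(aggregate("max", ages));
    }
}
```
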
  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. def avg(colNames: String*): DataFrame

    Compute the mean value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When column names are given, compute the mean only for those columns.

    Annotations
    @varargs()
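
To make the per-group mean concrete, here is a plain-Java model of what a call such as df.groupBy("department").avg("age") is described as computing: one result per department, carrying the grouping key and the mean. This is a sketch using java.util.stream, not Spark code; the Employee type, the class name, and the sample data in main are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical model of a per-group mean; not part of Spark.
class GroupAvgSketch {
    // Illustrative row type with one grouping column and one numeric column.
    static final class Employee {
        final String department;
        final double age;

        Employee(String department, double age) {
            this.department = department;
            this.age = age;
        }
    }

    // Groups rows by department and averages age within each group.
    // The TreeMap orders the result by key so the output is stable.
    static Map<String, Double> avgAgeByDepartment(List<Employee> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Employee e) -> e.department,
                TreeMap::new,
                Collectors.averagingDouble((Employee e) -> e.age)));
    }

    public static void main(String[] args) {
        List<Employee> rows = Arrays.asList(
                new Employee("eng", 30.0),
                new Employee("eng", 40.0),
                new Employee("ops", 50.0));
        System.out.println(avgAgeByDepartment(rows));
    }
}
```

The same shape applies to max, min and sum by swapping the downstream collector.
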
  10. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  11. def count(): DataFrame

    Count the number of rows for each group. The resulting DataFrame will also contain the grouping columns.
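
As an illustration of the result shape, the following plain-Java sketch models a per-group row count: each distinct grouping key appears once in the output, paired with the number of rows that carried it. It is not Spark code; the class and method names are hypothetical and each row is reduced to its grouping key for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical model of a per-group row count; not part of Spark.
class GroupCountSketch {
    // Counts how many rows share each grouping key.
    // The TreeMap orders the result by key so the output is stable.
    static Map<String, Long> countByKey(List<String> keys) {
        return keys.stream().collect(Collectors.groupingBy(
                (String k) -> k,
                TreeMap::new,
                Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> departments = Arrays.asList("eng", "ops", "eng", "eng", "ops");
        System.out.println(countByKey(departments));
    }
}
```
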

  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  14. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  16. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  17. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  18. def max(colNames: String*): DataFrame

    Compute the max value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When column names are given, compute the max only for those columns.

    Annotations
    @varargs()
  19. def mean(colNames: String*): DataFrame

    Compute the average value of each numeric column for each group. This is an alias for avg. The resulting DataFrame will also contain the grouping columns. When column names are given, compute the average only for those columns.

    Annotations
    @varargs()
  20. def min(colNames: String*): DataFrame

    Compute the min value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When column names are given, compute the min only for those columns.

    Annotations
    @varargs()
  21. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  22. final def notify(): Unit

    Definition Classes
    AnyRef
  23. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  24. def sum(colNames: String*): DataFrame

    Compute the sum of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When column names are given, compute the sum only for those columns.

    Annotations
    @varargs()
  25. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  26. def toString(): String

    Definition Classes
    AnyRef → Any
  27. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
