shark.tgf

TGF

object TGF

This object is responsible for handling TGF (Table Generating Function) commands.

-- TGF Commands --
GENERATE tgfname(param1, param2, ... , param_n)
GENERATE tgfname(param1, param2, ... , param_n) AS tablename

Parameters can be either primitive types (e.g. Int) or of type RDD[Product]. TGF.execute() uses reflection to look up an object named "tgfname" and invokes its apply() function with the supplied values. If a parameter of apply() has type RDD[Product], the corresponding argument is assumed to be the name of a table, which is turned into an RDD before apply() is invoked.

For example, "GENERATE MyObj(25, emp)" invokes MyObj.apply(25, sc.sql2rdd("select * from emp")), assuming the TGF object (MyObj) has an apply function that takes an Int and an RDD[Product].
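The reflective dispatch described above can be illustrated without Spark or Shark on the classpath. The following is only a sketch of the general technique, not Shark's implementation: StubTable is a hypothetical stand-in for RDD[Product], and Greeter and ReflectiveCallSketch are made-up names. It finds an apply() method by reflection, converts each raw argument according to the declared parameter type, and invokes the method.

```scala
// Hypothetical stand-in for RDD[Product]; a real TGF would receive an RDD.
final case class StubTable(name: String)

// Hypothetical target object with an apply() taking an Int and a "table".
object Greeter {
  def apply(times: Int, table: StubTable): String = s"${table.name} x $times"
}

object ReflectiveCallSketch {
  // Looks up apply() on the target and converts each raw argument
  // according to the declared parameter type before invoking it.
  def invoke(target: AnyRef, rawArgs: Seq[String]): Any = {
    val method = target.getClass.getMethods
      .find(m => m.getName == "apply" && m.getParameterTypes.length == rawArgs.length)
      .getOrElse(throw new NoSuchMethodException(
        "no apply() with " + rawArgs.length + " parameters"))

    val args: Seq[AnyRef] = method.getParameterTypes.toSeq.zip(rawArgs).map {
      // Primitive Int parameter: parse the literal and box it for invoke().
      case (t, raw) if t == classOf[Int] => Int.box(raw.toInt)
      // Table-typed parameter: treat the argument as a table name.
      case (t, raw) if t == classOf[StubTable] => StubTable(raw)
      // Anything else is passed through as a String in this sketch.
      case (_, raw) => raw
    }
    method.invoke(target, args: _*)
  }
}
```

Calling ReflectiveCallSketch.invoke(Greeter, Seq("3", "emp")) treats "3" as an Int and "emp" as a table name, mirroring how the argument handling is described for TGF.execute().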

The AS form of the command saves the output in a new table named "tablename", whereas the form without AS returns a ResultSet.
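To make the two command forms concrete, here is a minimal, self-contained sketch that splits a command string into the TGF name, its raw parameters, and the optional table name. It is not Shark's parser: TgfCommand and TgfCommandSketch are hypothetical names, and matching the keywords case-insensitively and restricting names to word characters are assumptions made for the example.

```scala
// Hypothetical result type for the example; not part of Shark.
final case class TgfCommand(
    tgfName: String,
    params: Seq[String],
    tableName: Option[String])

object TgfCommandSketch {
  // GENERATE name(p1, p2, ...) [AS tablename]
  // Assumption: keywords are matched case-insensitively.
  private val Command =
    """(?i)\s*GENERATE\s+(\w+)\s*\(([^)]*)\)(?:\s+AS\s+(\w+))?\s*""".r

  def parse(sql: String): Option[TgfCommand] = sql match {
    case Command(name, rawParams, table) =>
      val params = rawParams.split(",").map(_.trim).filter(_.nonEmpty).toSeq
      // The AS group is null when absent, so Option(...) yields None.
      Some(TgfCommand(name, params, Option(table)))
    case _ => None
  }
}
```

With this sketch, "GENERATE MyObj(25, emp)" yields no table name (the form that returns a ResultSet), while "GENERATE MyObj(25, emp) AS result" yields Some("result").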

-- Defining TGF objects --

TGF objects need to have an apply() function, which can take an arbitrary number of parameters of either primitive or RDD[Product] type. The apply() function should return either an RDD[Product] or an RDDSchema. In the former case, the returned table's schema and column names need to be defined through a Java annotation called @Schema. Here is a short example:

object MyTGF1 {
  @Schema(spec = "name string, age int")
  def apply(table1: RDD[(String, String, Int)]): RDD[Product] = {
    // code that manipulates table1 and returns a new RDD of tuples
  }
}
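The spec string used with @Schema is a comma-separated list of "name type" pairs. As an illustration of that format only, the sketch below splits such a string into (column name, type) pairs. SchemaSpecSketch is a hypothetical helper, not part of Shark, and it assumes simple types; a type that itself contains a comma would need a real parser.

```scala
object SchemaSpecSketch {
  // Splits "name string, age int" into Seq(("name", "string"), ("age", "int")).
  // Assumption: no column type contains a comma.
  def columns(spec: String): Seq[(String, String)] =
    spec.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { col =>
      col.split("\\s+", 2) match {
        case Array(name, tpe) => (name, tpe.trim)
        case _ =>
          throw new IllegalArgumentException(
            s"expected '<name> <type>' but got '$col'")
      }
    }
}
```

For instance, SchemaSpecSketch.columns("name string, age int") gives one pair per output column, which is the information a returned table needs for its column names and types.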

Sometimes, the TGF dynamically determines the number or types of columns returned. In this case, the TGF can use the RDDSchema return type instead of Java annotations. RDDSchema simply contains a schema string and an RDD of results. For example:

object MyTGF2 {
  // No @Schema annotation: the schema string travels with the result.
  def apply(table1: RDD[(String, String, Int)]): RDDSchema = {
    // code that manipulates table1 and creates a result rdd
    RDDSchema(rdd.asInstanceOf[RDD[Seq[_]]], "name string, age int")
  }
}

Sometimes the TGF needs to make SQL calls internally. For that, it needs access to a SharkContext object, so apply() can declare a SharkContext parameter:

def apply(sc: SharkContext, table1: RDD[(String, String, Int)]): RDD[Product] = {
  // code that can use sc, for example by calling sc.sql2rdd()
  // code that manipulates table1 and returns a new RDD of tuples
}

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def execute(sql: String, sc: SharkContext): ResultSet

    Executes a TGF command and gives back the ResultSet. Mainly to be used from SharkContext (e.g. runSql())

    sql      TGF command, e.g. "GENERATE name(params) AS tablename"
    sc       SharkContext
    returns  ResultSet containing the results of the command

  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. final def notify(): Unit

    Definition Classes
    AnyRef
  17. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  18. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  19. def tableRdd(sc: SharkContext, tableName: String): RDD[_]

  20. def toString(): String

    Definition Classes
    AnyRef → Any
  21. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
