object TGF
Value Members
- final def !=(arg0: AnyRef): Boolean
- final def !=(arg0: Any): Boolean
- final def ##(): Int
- final def ==(arg0: AnyRef): Boolean
- final def ==(arg0: Any): Boolean
- final def asInstanceOf[T0]: T0
- def clone(): AnyRef
- final def eq(arg0: AnyRef): Boolean
- def equals(arg0: Any): Boolean
- def finalize(): Unit
- final def getClass(): Class[_]
- def hashCode(): Int
- final def isInstanceOf[T0]: Boolean
- final def ne(arg0: AnyRef): Boolean
- final def notify(): Unit
- final def notifyAll(): Unit
- final def synchronized[T0](arg0: ⇒ T0): T0
- def tableRdd(sc: SharkContext, tableName: String): RDD[_]
- def toString(): String
- final def wait(): Unit
- final def wait(arg0: Long, arg1: Int): Unit
- final def wait(arg0: Long): Unit
This object is responsible for handling TGF (Table Generating Function) commands.
Parameters can be either of a primitive type (e.g. Int) or of type RDD[Product]. TGF.execute() uses reflection to look up an object named "tgfname" and invokes its apply() with the primitive values. If a parameter of apply() has type RDD[Product], the corresponding argument is assumed to be the name of a table, which is turned into an RDD before apply() is invoked.
For example, "GENERATE MyObj(25, emp)" will invoke MyObj.apply(25, sc.sql2rdd("select * from emp")), assuming the TGF object (MyObj) has an apply function that takes an Int and an RDD[Product].
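The lookup described above can be illustrated with a minimal sketch. It is not Shark's implementation: Seq[Product] stands in for RDD[Product], the `tables` map stands in for sc.sql2rdd, and `DispatchSketch`, `invoke` and the sample rows are hypothetical names introduced only for illustration. It also assumes the objects are compiled as top-level objects in the default package, so that `MyObj` is reachable as class `MyObj$`.

```scala
import java.lang.reflect.Method

// Hypothetical TGF object: takes an Int and a table.
// Seq[Product] is a stand-in for RDD[Product] so the sketch runs without Spark.
object MyObj {
  def apply(minAge: Int, emp: Seq[Product]): Seq[Product] =
    emp.filter(row => row.productElement(1).asInstanceOf[Int] >= minAge)
}

object DispatchSketch {
  // Hypothetical stand-in for sc.sql2rdd("select * from <table>").
  val tables: Map[String, Seq[Product]] = Map(
    "emp" -> Seq(("ann", 31), ("bob", 19))
  )

  // Look up the singleton object `name` and call its apply() with `args`,
  // converting each raw argument according to the declared parameter type.
  def invoke(name: String, args: Seq[String]): AnyRef = {
    // A top-level Scala object Foo compiles to class Foo$ with a static MODULE$ field.
    val cls = Class.forName(name + "$")
    val instance = cls.getField("MODULE$").get(null)
    val applyMethod: Method = cls.getMethods
      .find(m => m.getName == "apply" && m.getParameterCount == args.length)
      .getOrElse(throw new NoSuchMethodException(s"$name.apply/${args.length}"))

    val converted: Seq[AnyRef] = applyMethod.getParameterTypes.toSeq.zip(args).map {
      // Primitive parameter: parse the literal.
      case (t, raw) if t == java.lang.Integer.TYPE => Int.box(raw.toInt)
      // Table-typed parameter: treat the argument as a table name.
      case (t, raw) if classOf[Seq[_]].isAssignableFrom(t) => tables(raw)
      case (t, _) => throw new IllegalArgumentException(s"unsupported parameter type: $t")
    }
    applyMethod.invoke(instance, converted: _*)
  }

  def main(args: Array[String]): Unit = {
    // Corresponds to: GENERATE MyObj(25, emp)
    println(invoke("MyObj", Seq("25", "emp")))
  }
}
```

The sketch handles only Int and table parameters; a fuller dispatcher would need one conversion case per supported primitive type.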
The "as" version of the command saves the output in a new table named "tablename", whereas the other version returns a ResultSet.
-- Defining TGF objects --
TGF objects need an apply() function that takes an arbitrary number of parameters, each either of a primitive type or of type RDD[Product]. The apply() function should return either an RDD[Product] or an RDDSchema. In the former case, the returned table's schema and column names need to be defined through a Java annotation called @Schema. Here is a short example:
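A minimal sketch of such an object follows. To keep it compilable without Shark or Spark on the classpath it uses stand-ins, all of which are assumptions: `RDD` is aliased to Seq, and `Schema` is declared as a Scala annotation with a single `spec` string, whereas the real @Schema is a Java annotation whose exact parameters are not shown here. `TgfSketch`, `NameAndAge`, the `spec` parameter name and the schema string format are hypothetical.

```scala
import scala.annotation.StaticAnnotation

object TgfSketch {
  // Stand-ins (assumptions): in Shark, RDD is Spark's RDD and
  // Schema is a Java annotation read by the TGF machinery.
  type RDD[T] = Seq[T]
  final class Schema(spec: String) extends StaticAnnotation

  // Hypothetical TGF object: keeps the name and age columns of its input table.
  object NameAndAge {
    // The annotation declares the column names and types of the returned table.
    @Schema(spec = "name string, age int")
    def apply(emp: RDD[(String, String, Int)]): RDD[Product] =
      emp.map { case (name, _, age) => (name, age) }
  }
}
```

With these stand-ins, `TgfSketch.NameAndAge(Seq(("ann", "eng", 31)))` yields a one-row table holding `("ann", 31)`.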
Sometimes, the TGF dynamically determines the number or types of columns returned. In this case, the TGF can use the RDDSchema return type instead of Java annotations. RDDSchema simply contains a schema string and an RDD of results. For example:
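As a rough illustration of the RDDSchema return type, the sketch below builds the schema string at call time. The stand-ins are assumptions: `RDD` is aliased to Seq, and `RDDSchema` is modelled as a case class holding the result rows and a schema string, with the field order chosen arbitrarily. `RddSchemaSketch`, `Repeat` and the generated column names are hypothetical.

```scala
object RddSchemaSketch {
  // Stand-ins (assumptions) so the sketch compiles without Shark or Spark.
  type RDD[T] = Seq[T]
  final case class RDDSchema(rdd: RDD[Seq[Any]], schema: String)

  // Hypothetical TGF object: repeats a name column n times, so the number
  // of output columns is only known when the function is called.
  object Repeat {
    def apply(n: Int, names: RDD[Tuple1[String]]): RDDSchema = {
      // Build one "nameK string" column definition per requested copy.
      val schema = (1 to n).map(i => s"name$i string").mkString(", ")
      val rows: RDD[Seq[Any]] = names.map { case Tuple1(name) => Seq.fill(n)(name) }
      RDDSchema(rows, schema)
    }
  }
}
```

Because the schema travels with the result, no annotation is needed on apply() in this variant.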
Sometimes the TGF needs to internally make SQL calls. For that, it needs access to a SharkContext object. Therefore,