shortcut name for command-line arguments, after extraction of the hadoop and scoobi ones
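That extraction can be sketched with a small self-contained function (illustrative, not Scoobi's actual code), assuming — as in Scoobi's command-line convention — that the scoobi-specific arguments are a dot-separated value following a `scoobi` keyword:

```scala
// Illustrative sketch (not Scoobi's actual code): split the raw arguments
// into application arguments and scoobi-specific ones, assuming the scoobi
// arguments are a dot-separated value following a "scoobi" keyword.
object ArgsSplit {
  def split(rawArgs: Seq[String]): (Seq[String], Seq[String]) = {
    val i = rawArgs.indexOf("scoobi")
    if (i < 0) (rawArgs, Seq())
    else {
      // the value right after the "scoobi" keyword, split on dots
      val scoobiArgs = rawArgs.lift(i + 1).toSeq.flatMap(_.split("\\.").toSeq)
      (rawArgs.take(i) ++ rawArgs.drop(i + 2), scoobiArgs)
    }
  }
}
```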
make a DList runnable, executing the computation and returning the values
make a DObject runnable, executing the computation and returning the values
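The "make runnable" enrichment can be sketched with a collection-backed stand-in (the types and names below are illustrative, not Scoobi's implementation):

```scala
// Illustrative stand-in (not Scoobi's implementation): a deferred,
// DList-like computation made "runnable" through an implicit enrichment.
case class FakeDList[A](compute: () => Seq[A])

object RunSyntax {
  // the enrichment: `list.run` executes the computation and returns the values
  implicit class RunnableDList[A](val list: FakeDList[A]) extends AnyVal {
    def run: Seq[A] = list.compute()
  }
}
```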
the categories to show when logging, as a regular expression
the classes directories to include on a job classpath
store the value of the configuration in a lazy val, so that it can be updated and still be referenced
set command-line arguments on the configuration object
a configuration with cluster setup
a configuration with memory setup
a configuration with local setup
a configuration where the appropriate properties are set up for uploaded jars: distributed files + classpath
delete the remote jars currently on the cluster
true if the libjars must be deleted before the Scoobi job runs
a function to display execution times. The default uses log messages
execute some code locally
execute some code on the cluster, setting the filesystem / jobtracker addresses and setting up the classpath
execute some code locally
the filesystem address
execute some code in memory, using a collection backend, possibly showing execution times
true if you want to include the library jars in the jar that is sent to the cluster for each job
true if the cluster argument is specified and the local argument is not
alias for locally
the list of library jars to upload
the jobtracker address
false if temporary files and the working directory must be cleaned up after job execution
the log level to use when logging
the path of the directory to use when loading jars to the filesystem.
the execution is local if the file system is local, as determined by the configuration files loaded by the hadoop script, or if "local" is passed on the command line
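That rule can be sketched as a pure function (names hypothetical; in Scoobi the file-system address comes from the loaded Hadoop configuration):

```scala
// Hypothetical sketch of the locality rule: the execution is local if the
// configured default file system is local (a "file:" URI) or if "local"
// was passed among the scoobi command-line arguments.
object Locality {
  def isLocal(fsDefault: String, scoobiArgs: Seq[String]): Boolean =
    fsDefault.startsWith("file:") || scoobiArgs.contains("local")
}
```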
if locally returns false, we might attempt to upload the dependent jars to the cluster and to add them to the classpath
parse the command-line arguments and:
true if the main jar contains all the dependencies for this application. By default this is delegated to the Classes trait, which looks for the presence of a scoobi_* jar or for com/nicta/scoobi jar entries in the main jar
false if libjars are used
execute some code on the cluster, possibly showing the execution time
execute some code, either locally or on the cluster, depending on the local argument being passed on the command line
execute some code locally, possibly showing execution times
Persisting
allows calling list.persist
allows calling object.persist
true to suppress log messages
run a list.
This is equivalent to:
val obj = list.materialise
run(obj)
the result of the in-memory run
the cluster evaluation of t
the result of the local run
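The equivalence between running a list and running its materialised object can be sketched with collection-backed stand-ins (illustrative types, not Scoobi's):

```scala
// Illustrative stand-ins (not Scoobi's types): running a list first
// materialises it into an object holding the list's values.
case class MiniDObject[A](compute: () => A)
case class MiniDList[A](compute: () => Seq[A]) {
  def materialise: MiniDObject[Seq[A]] = MiniDObject(() => compute())
}

object Runner {
  // run an object: execute the computation and return its value
  def run[A](obj: MiniDObject[A]): A = obj.compute()
  // run a list: equivalent to run(list.materialise)
  def run[A](list: MiniDList[A]): Seq[A] = run(list.materialise)
}
```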
this provides the arguments which are parsed to change the behavior of the Scoobi app: logging, local/cluster, ...
ScoobiUserArgs
set the default configuration, depending on the arguments
Static setup to use a testing log factory
true if the debug logs must show the computation graph
measure the time taken by some executed code and display the time with a specific display function
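A minimal sketch of such a timing helper (illustrative, not Scoobi's implementation): it evaluates a by-name block, measures the elapsed time, and reports it through a caller-supplied display function:

```scala
// Illustrative sketch (not Scoobi's implementation): evaluate a block,
// measure the elapsed time and report it through a display function.
object Timing {
  def withTimer[A](code: => A)(display: String => Unit = (s: String) => println(s)): A = {
    val start  = System.nanoTime
    val result = code                                  // force the by-name block
    val millis = (System.nanoTime - start) / 1000000   // elapsed milliseconds
    display(s"executed in ${millis}ms")
    result
  }
}
```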
true to display execution times for each job
upload the jars unless 'nolibjars' has been set on the command line
upload the jars which don't exist yet in the library directory on the cluster
the remote jars currently on the cluster
true if cluster configuration must be loaded from Hadoop's configuration directory
the time for the execution of a piece of code
This trait can be extended to create an application running Scoobi code.
Command-line arguments are available in the args attribute (minus the hadoop specific ones) and a default implicit ScoobiConfiguration is also accessible to create DLists.
A ScoobiApp will be used in 2 different contexts:
1. from the command line, with the hadoop script
In that case you will use Hadoop's default configuration files or you will need to tell this script where to find the configuration files.
2. within sbt
In that case the cluster location can be either defined by:
Then, if it can be determined that the execution will not be local but on the cluster (@see locally), the ScoobiApp trait will attempt to upload the dependent jars to the libjars directory on the cluster (if they are not already there, @see LibJars for the details). This behavior can be switched off by overriding the upload method:

  override def upload = false

or by passing the 'nolibjars' argument on the command line.
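For reference, a complete application using this trait typically looks like the classic word count below. Treat it as a sketch: the paths come from args, and the exact DList combinators (for instance the argument expected by combine) vary between Scoobi versions.

```scala
import com.nicta.scoobi.Scoobi._

// A minimal ScoobiApp: `args` and an implicit ScoobiConfiguration are
// provided by the trait, and the code in `run` is executed locally or on
// the cluster depending on the command-line arguments.
object WordCount extends ScoobiApp {
  def run() {
    val lines  = fromTextFile(args(0))              // input path
    val counts = lines.flatMap(_.split(" "))
                      .map(word => (word, 1))
                      .groupByKey
                      .combine(_ + _)
    persist(toTextFile(counts, args(1)))            // output path
  }
}
```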