Destroys the RDD with the given name. Has no effect if no RDD with this name exists.
the unique name of the RDD. The uniqueness is scoped to the current SparkContext.
Gets the RDD with the given name if it already exists and is cached by the RddManager. If the RDD does not exist, None is returned.
Note that a previously-known RDD could 'disappear' if it hasn't been used for a while, because the SparkContext garbage-collects old cached RDDs.
the generic type of the RDD.
the unique name of the RDD. The uniqueness is scoped to the current SparkContext.
if the RddManager doesn't respond within this timeout, a TimeoutException is thrown.
an Option containing the RDD with the given name, or None if no such RDD exists.
java.util.concurrent.TimeoutException
if the request to the RddManager times out.
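Because a cached RDD can disappear between calls, callers usually want a fallback for the None case. The following is a minimal sketch of that caller-side pattern using a toy in-memory registry, not the job server's implementation: GetSketch, put, evict, and sumOrRecompute are hypothetical names, and plain Seq[Int] values stand in for RDDs.

```scala
import scala.collection.concurrent.TrieMap

// Toy stand-in for the RddManager's cache: names map to plain Seqs, not RDDs.
object GetSketch {
  private val cache = TrieMap[String, Seq[Int]]()

  def put(name: String, data: Seq[Int]): Unit = cache.put(name, data)

  // Simulates the cache dropping an entry that has not been used for a while.
  def evict(name: String): Unit = cache.remove(name)

  // Mirrors the documented contract: Some(data) if cached, None otherwise.
  def get(name: String): Option[Seq[Int]] = cache.get(name)

  // Caller-side pattern: never assume a previously seen name is still cached.
  def sumOrRecompute(name: String, recompute: => Seq[Int]): Int =
    get(name) match {
      case Some(data) => data.sum
      case None =>
        val fresh = recompute // evaluated only when the entry is missing
        put(name, fresh)
        fresh.sum
    }
}
```

The by-name recompute argument is evaluated only on a cache miss, which mirrors the generator-style parameters described for the other methods.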
Returns the names of all named RDDs that are managed by the RddManager.
Note: this returns a snapshot of RDD names at one point in time. The caller should always expect that the returned names may be stale: RDDs may have been created or destroyed since the snapshot was taken.
a collection of RDD names representing RDDs managed by the RddManager.
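One way to cope with a stale snapshot is to treat each name only as a hint and re-check it at the point of use. The sketch below illustrates this with a toy in-memory registry; NamesSnapshotSketch, put, and resolve are hypothetical names, and Seq[Int] values stand in for RDDs.

```scala
import scala.collection.concurrent.TrieMap

// Toy registry used only to illustrate snapshot semantics.
object NamesSnapshotSketch {
  private val cache = TrieMap[String, Seq[Int]]()

  def put(name: String, data: Seq[Int]): Unit = cache.put(name, data)
  def destroy(name: String): Unit = cache.remove(name)
  def get(name: String): Option[Seq[Int]] = cache.get(name)

  // Copies the keys, so later changes to the cache do not alter the result.
  def getNames(): Set[String] = cache.keySet.toSet

  // Resolves a possibly stale snapshot, silently skipping names that are gone.
  def resolve(names: Iterable[String]): Map[String, Seq[Int]] =
    names.flatMap(n => get(n).map(n -> _)).toMap
}
```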
Gets an RDD with the given name, or creates it if one doesn't already exist.
If the given RDD has already been computed by another job and cached in memory, this method will return a reference to the cached RDD. If the RDD has never been computed, then the generator will be called to compute it, in the caller's thread, and the result will be cached and returned to the caller.
If an RDD is requested by thread B while thread A is generating the RDD, thread B will block for up to the duration specified by the timeout parameter. If thread A finishes generating the RDD within that time, then thread B will get a reference to the newly-created RDD. If thread A does not finish generating the RDD within that time, then thread B will throw a TimeoutException.
the generic type of the RDD.
the unique name of the RDD. The uniqueness is scoped to the current SparkContext.
a 0-ary function which will generate the RDD if it doesn't already exist.
if true, forces the RDD to be computed by calling count().
the storage level to persist the RDD with. Default: StorageLevel.MEMORY_ONLY.
if the RddManager doesn't respond within this timeout, a TimeoutException is thrown.
the RDD with the given name.
java.lang.RuntimeException
wrapping any error that occurs within the generator function.
java.util.concurrent.TimeoutException
if the request to the RddManager times out.
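The blocking behaviour described above can be modelled without Spark. The following is a rough sketch under stated assumptions, not the RddManager's actual mechanism: GetOrElseCreateSketch is a hypothetical name, Seq[Int] stands in for an RDD, and a Promise per name is one possible way to let a second caller wait on the first caller's generator.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration.FiniteDuration
import scala.util.control.NonFatal

object GetOrElseCreateSketch {
  // One promise per name; it completes when the winning caller's generator finishes.
  private val entries = new ConcurrentHashMap[String, Promise[Seq[Int]]]()

  def getOrElseCreate(name: String, gen: => Seq[Int], timeout: FiniteDuration): Seq[Int] = {
    val mine = Promise[Seq[Int]]()
    val existing = entries.putIfAbsent(name, mine)
    if (existing == null) {
      // This caller won the race: run the generator in the caller's thread.
      try {
        val value = gen
        mine.success(value)
        value
      } catch {
        case NonFatal(e) =>
          // Drop the failed entry so that a later call can retry.
          entries.remove(name, mine)
          val wrapped = new RuntimeException(s"generator for '$name' failed", e)
          mine.failure(wrapped)
          throw wrapped
      }
    } else {
      // Another caller is generating (or has generated) the value: wait for it.
      // Await.result throws java.util.concurrent.TimeoutException on timeout.
      Await.result(existing.future, timeout)
    }
  }
}
```

In this model the generator runs at most once per name while an entry exists, and waiters either receive the winner's value, see the winner's wrapped failure, or time out.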
Replaces an existing RDD with a given name with a new RDD. If an old RDD for the given name existed, it is un-persisted (non-blocking) and destroyed. It is safe to call this method when there is no existing RDD with the given name. If multiple threads call this around the same time, the end result is undefined: one of the generated RDDs will win and will be returned from future calls to get(), but which one is not specified.
The RDD generator function will be called from the caller's thread. Note that if this is called at the same time as getOrElseCreate() for the same name, and completes before the getOrElseCreate() call, then threads waiting for the result of getOrElseCreate() will unblock with the result of this update() call. When that getOrElseCreate() call later succeeds, its result will replace the result of this update() call.
the generic type of the RDD.
the unique name of the RDD. The uniqueness is scoped to the current SparkContext.
a 0-ary function which will be called to generate the RDD in the caller's thread.
if true, forces the RDD to be computed by calling count().
the storage level to persist the RDD with. Default: StorageLevel.MEMORY_ONLY.
the RDD with the given name.
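The replace-and-un-persist behaviour can be pictured with a small model. This is a sketch only, assuming a toy Dataset class in place of an RDD; UpdateSketch, Dataset, and the persisted flag are hypothetical and do not reflect how the RddManager tracks RDDs.

```scala
import scala.collection.concurrent.TrieMap

object UpdateSketch {
  // Toy stand-in for an RDD that remembers whether it is still persisted.
  final class Dataset(val data: Seq[Int]) {
    @volatile var persisted: Boolean = true
    def unpersist(): Unit = persisted = false
  }

  private val cache = TrieMap[String, Dataset]()

  def get(name: String): Option[Dataset] = cache.get(name)

  def update(name: String, gen: => Seq[Int]): Dataset = {
    val fresh = new Dataset(gen) // the generator runs in the caller's thread
    // put returns the previous value, if any; un-persist it after the swap.
    cache.put(name, fresh).foreach(_.unpersist())
    fresh
  }
}
```

Calling update for a name with no existing entry simply stores the new value, which matches the statement that the call is safe when no RDD with that name exists.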
NamedRdds - a trait that gives you safe, concurrent creation of and access to named RDDs (the native SparkContext interface only provides access to RDDs by number). It facilitates easy sharing of RDDs amongst jobs sharing the same SparkContext. If two jobs simultaneously try to create an RDD with the same name, only one will win and the other will retrieve the RDD created by the winner.
Note that to take advantage of NamedRddSupport, a job must mix this in and use the APIs here instead of the native RDD cache(), otherwise we will not know about the names.