Trait

spark.jobserver.NamedRddSupport

_NamedRdds

Related Doc: package NamedRddSupport

trait _NamedRdds extends AnyRef

NamedRdds: a trait that provides safe, concurrent creation of and access to named RDDs (the native SparkContext interface only identifies RDDs by numeric ID). It makes it easy to share RDDs among jobs that run in the same SparkContext. If two jobs simultaneously try to create an RDD with the same name, only one will win, and the other will retrieve the same RDD.

Note that to take advantage of NamedRddSupport, a job must mix in this trait and use the APIs here instead of the native RDD cache(); otherwise the names are not tracked.
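
The "only one will win" guarantee can be illustrated with a minimal sketch. It assumes a hypothetical NamedRegistry class that is not part of spark-jobserver: it keeps a ConcurrentHashMap of promises inside the JVM instead of going through the RddManager, uses a generic V in place of RDD[T], and takes an explicit FiniteDuration in place of the implicit Timeout.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._
import scala.util.control.NonFatal

// Hypothetical model of the "only one creator wins" contract, NOT the
// spark-jobserver implementation (which goes through the RddManager).
// V stands in for RDD[T]; `timeout` stands in for the implicit Timeout.
final class NamedRegistry[V] {
  // One promise per name; it is completed once the value has been generated.
  private val slots = new ConcurrentHashMap[String, Promise[V]]()

  def getOrElseCreate(name: String, gen: => V, timeout: FiniteDuration): V = {
    val mine = Promise[V]()
    // Atomically claim the name; returns the existing promise if another caller got there first.
    val existing = slots.putIfAbsent(name, mine)
    if (existing == null) {
      // We won the race: run the generator in the caller's own thread.
      try {
        val value = gen
        mine.success(value) // wakes up any callers blocked on this name
        value
      } catch {
        case NonFatal(e) =>
          slots.remove(name, mine) // free the name so a later caller can retry
          mine.failure(e)          // blocked callers rethrow the generator's error
          throw new RuntimeException(s"Generator for '$name' failed", e)
      }
    } else {
      // Someone else owns the name: wait up to `timeout` for their result.
      // Await.result throws java.util.concurrent.TimeoutException if it is not ready in time.
      Await.result(existing.future, timeout)
    }
  }
}
```

In this model the first caller runs the generator and every later caller receives the identical value, or a TimeoutException if the generator is still running when the timeout expires. Clearing the slot after a failed generator is a choice made for the sketch; the behavior of the real RddManager in that case is not described here.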

Linear Supertypes
AnyRef, Any

Abstract Value Members

  1. abstract def defaultTimeout: Timeout
  2. abstract def destroy(name: String): Unit

    Destroys an RDD with the given name, if one existed. Has no effect if no RDD with this name exists.

    name

    the unique name of the RDD. The uniqueness is scoped to the current SparkContext.

  3. abstract def get[T](name: String)(implicit timeout: Timeout = defaultTimeout): Option[RDD[T]]

    Gets an RDD with the given name if it already exists and is cached by the RddManager. If the RDD does not exist, None is returned.

    Note that a previously-known RDD could 'disappear' if it hasn't been used for a while, because the SparkContext garbage-collects old cached RDDs.

    T

    the generic type of the RDD.

    name

    the unique name of the RDD. The uniqueness is scoped to the current SparkContext.

    timeout

    if the RddManager doesn't respond within this timeout, an error will be thrown.

    returns

    Some(rdd) if an RDD with the given name exists, otherwise None.

    Exceptions thrown

    java.util.concurrent.TimeoutException if the request to the RddManager times out.

  4. abstract def getNames(): Iterable[String]

    Returns the names of all named RDDs that are managed by the RddManager.

    Note: this returns a snapshot of RDD names at a single point in time. Callers should expect the returned names to be stale or incorrect by the time they are used.

    returns

    a collection of RDD names representing RDDs managed by the RddManager.

  5. abstract def getOrElseCreate[T](name: String, rddGen: ⇒ RDD[T], forceComputation: Boolean = true, storageLevel: StorageLevel = defaultStorageLevel)(implicit timeout: Timeout = defaultTimeout): RDD[T]

    Gets an RDD with the given name, or creates it if one doesn't already exist.

    If the given RDD has already been computed by another job and cached in memory, this method will return a reference to the cached RDD. If the RDD has never been computed, then the generator will be called to compute it, in the caller's thread, and the result will be cached and returned to the caller.

    If an RDD is requested by thread B while thread A is generating it, thread B will block for up to the duration specified by timeout. If thread A finishes generating the RDD within that time, thread B will get a reference to the newly created RDD. Otherwise, thread B will throw a timeout exception.

    T

    the generic type of the RDD.

    name

    the unique name of the RDD. The uniqueness is scoped to the current SparkContext.

    rddGen

    a 0-ary function which will generate the RDD if it doesn't already exist.

    forceComputation

    if true, forces the RDD to be computed by calling count().

    storageLevel

    the storage level to persist the RDD with. Default: StorageLevel.MEMORY_ONLY.

    timeout

    if the RddManager doesn't respond within this timeout, an error will be thrown.

    returns

    the RDD with the given name.

    Exceptions thrown

    java.lang.RuntimeException wrapping any error that occurs within the generator function.

    java.util.concurrent.TimeoutException if the request to the RddManager times out.

  6. abstract def update[T](name: String, rddGen: ⇒ RDD[T], forceComputation: Boolean = true, storageLevel: StorageLevel = defaultStorageLevel)(implicit timeout: Timeout = defaultTimeout): RDD[T]

    Replaces an existing RDD with the given name with a new RDD. If an old RDD with that name existed, it is unpersisted (non-blocking) and destroyed. It is safe to call this method when no RDD with the given name exists. If multiple threads call this method at around the same time, the end result is undefined: one of the generated RDDs will win and will be returned from subsequent calls to get().

    The RDD generator function is called from the caller's thread. Note that if this method is called at the same time as getOrElseCreate() for the same name and completes before the getOrElseCreate() call, then threads waiting on the result of getOrElseCreate() will unblock with the result of this update() call. When the getOrElseCreate() call later succeeds, its result will replace the result of this update() call.

    T

    the generic type of the RDD.

    name

    the unique name of the RDD. The uniqueness is scoped to the current SparkContext.

    rddGen

    a 0-ary function which will be called to generate the RDD in the caller's thread.

    forceComputation

    if true, forces the RDD to be computed by calling count().

    storageLevel

    the storage level to persist the RDD with. Default: StorageLevel.MEMORY_ONLY.

    returns

    the RDD with the given name.
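
The lookup and replacement contracts of get, update, destroy and getNames can be summarized with a small, Spark-free sketch. It assumes a hypothetical NamedCache class (not part of spark-jobserver) backed by scala.collection.concurrent.TrieMap, with V standing in for RDD[T] and a caller-supplied onEvict callback standing in for the non-blocking unpersist; timeouts, storage levels and forceComputation are omitted.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for the get / update / destroy / getNames contract.
// V stands in for RDD[T]; `onEvict` stands in for a non-blocking unpersist.
final class NamedCache[V](onEvict: V => Unit) {
  private val entries = TrieMap.empty[String, V]

  // Some(value) if `name` is registered, otherwise None.
  def get(name: String): Option[V] = entries.get(name)

  // Generates the new value in the caller's thread, then atomically swaps it in;
  // the previous value, if any, is evicted. Concurrent calls: last put wins.
  def update(name: String, gen: => V): V = {
    val fresh = gen
    entries.put(name, fresh).foreach(onEvict)
    fresh
  }

  // Removes `name` and evicts its value; a no-op if the name is unknown.
  def destroy(name: String): Unit = entries.remove(name).foreach(onEvict)

  // Names from a point-in-time snapshot; later changes are not reflected.
  def getNames(): Iterable[String] = entries.readOnlySnapshot().keys
}
```

Resolving concurrent update() calls as last-writer-wins is just one outcome allowed by the "undefined" race described for update(); the real RddManager may resolve it differently.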

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. val defaultStorageLevel: StorageLevel
  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  11. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  12. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  13. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. final def notify(): Unit

    Definition Classes
    AnyRef
  15. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  16. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  17. def toString(): String

    Definition Classes
    AnyRef → Any
  18. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
