Package org.platanios.tensorflow.api.ops.training.distribute

package distribute

Linear Supertypes: API, AnyRef, Any

Type Members

  1. trait API extends AnyRef

  2. abstract class ColocatedVariableGetter extends VariableGetter

  3. trait Destination[T] extends AnyRef

  4. trait Distributable[T] extends AnyRef

  5. trait Reduction extends AnyRef

    Represents a reduction method.

  6. abstract class ReductionVariableGetter extends VariableGetter

Value Members

  1. object Destination

  2. object Distributable

  3. object MeanReduction extends Reduction with Product with Serializable

    Reduces the variable updates by averaging them.

  4. object Reduction

  5. object SumReduction extends Reduction with Product with Serializable

    Reduces the variable updates by summing them.
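
    For example, the semantics of the two reductions can be sketched in plain Scala over hypothetical per-tower update values (illustrative only; this is not the library API):

    // Hypothetical per-tower updates for a single tower-local variable.
    val towerUpdates = Seq(1.0f, 2.0f, 3.0f)

    // `SumReduction` semantics: the exported value is the sum of the per-tower values.
    val summed = towerUpdates.sum                        // 6.0f

    // `MeanReduction` semantics: the exported value is the mean of the per-tower values.
    val averaged = towerUpdates.sum / towerUpdates.size  // 2.0f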

  6. def broadcast[O <: OutputLike](value: O, devices: Seq[DeviceSpecification] = Seq.empty)(implicit context: CrossTowerContext): MirroredValue[O]

    Mirrors value to all worker devices.
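
    For example (a hedged sketch; the strategy value and the constant are illustrative, and a cross-tower context is assumed, as provided inside distributionStrategy.scope):

    distributionStrategy.scope {
      // Mirror a single learning-rate tensor to every worker device.
      val learningRate = broadcast(tf.constant(0.001f))
      // `learningRate` is a `MirroredValue` with one component per worker device.
    }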

    value: Value to broadcast.
    devices: Destination devices.
    returns: Mirrored value.

    Definition Classes: API
  7. def colocateVariablesWith[R](colocationOps: Set[Op])(block: ⇒ R)(implicit context: DistributionContext): R

    Executes block within a scope that controls which devices variables will be created on.

    No operations should be added to the graph inside this scope; it should only be used when creating variables (some implementations work by changing variable creation and others work by using a colocateWith scope). This may only be used inside DistributionStrategy.scope.

    For example:

    distributionStrategy.scope {
      val variable1 = tf.variable(...)
      distributionStrategy.colocateVariablesWith(Set(variable1.op)) {
        // `variable2` and `variable3` will be created on the same device(s) as `variable1`.
        val variable2 = tf.variable(...)
        val variable3 = tf.variable(...)
      }
    
      def fn(v1: Variable, v2: Variable, v3: Variable): Unit = {
        // Operates on `v1` from `variable1`, `v2` from `variable2`, and `v3` from `variable3`.
      }
    
      // `fn` runs on every device `v1` is on, and `v2` and `v3` will be there too.
      distributionStrategy.update(variable1, fn, variable2, variable3)
    }

    colocationOps: Variables created in block will be on the same set of devices as these ops.
    block: Code block to execute in this scope.
    returns: Value returned by block.

    Definition Classes: API
  8. def currentDevice: String

    Returns the current device.

    Definition Classes: API
  9. def currentStrategy(implicit context: DistributionContext): DistributionStrategy

    Returns the current distribution strategy.

    Definition Classes: API
  10. def currentUpdateDevice: Option[String]

    Returns the current device if called within a distributionStrategy.update() call, and None otherwise.
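
    For example (a hedged sketch; variable1 and updateFn are illustrative):

    // Outside of an `update` call there is no current update device.
    val outside = currentUpdateDevice  // `None`

    def updateFn(v: Variable): Unit = {
      // Inside `distributionStrategy.update`, this is the device on which the variable
      // component currently being updated lives, e.g., Some("/device:GPU:0").
      val device = currentUpdateDevice
    }

    distributionStrategy.update(variable1, updateFn)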

    Definition Classes: API
  11. def forEachTower[T, R](fn: (Seq[T]) ⇒ R, values: Seq[DistributedValue[T]])(implicit arg0: Distributable[T], context: CrossTowerContext): R

    Runs fn once per tower.

    fn may call tf.currentTowerContext to access fields and methods such as towerID and mergeCall(). mergeCall() is used to communicate between the towers and to re-enter the cross-tower context. All towers pause their execution when they encounter a mergeCall() call; the merge function is then executed, its results are unwrapped and handed back to each tower's call, and execution resumes until fn completes or another mergeCall() is encountered.

    For example:

    // Called once in "cross-tower" context.
    def mergeFn(distributionStrategy: DistributionStrategy, threePlusTowerID: Int): tf.Output = {
      // Sum the values across towers.
      tf.addN(distributionStrategy.unwrap(threePlusTowerID))
    }
    
    // Called once per tower in `distributionStrategy`, in a "tower" context.
    def fn(three: Int): Output = {
      val towerContext = tf.currentTowerContext
      val v = three + towerContext.towerID
      // Computes the sum of the `v` values across all towers.
      val s = towerContext.mergeCall(mergeFn(_, v))
      s + v
    }
    
    distributionStrategy.scope {
      // In "cross-tower" context
      ...
      val mergedResults = distributionStrategy.forEachTower(() => fn(3))
      // `mergedResults` has the values from every tower execution of `fn`.
      val resultsList = distributionStrategy.unwrap(mergedResults)
    }

    fn: Function that will be run once per tower.
    values: Wrapped values that will be unwrapped when invoking fn on each tower.
    returns: Merged return value of fn across all towers.

    Definition Classes: API
  12. val logger: Logger

    Attributes: protected
  13. package ops

  14. package packers

  15. package strategies

  16. def towerLocalVariableScope[R](reduction: Reduction)(block: ⇒ R)(implicit context: DistributionContext): R

    Executes block within a scope where new variables will not be mirrored.

    There will still be one component variable per tower, but there is no requirement that they stay in sync. Instead, when saving them or calling fetch(), the value obtained by calling reduce() on all the towers' variables is used. Note that tower-local implies not trainable: each tower is expected to update its local variable instance directly (e.g., using assignAdd()), and only the aggregated value (accessible using fetch()) is exported from the model. When it is acceptable to aggregate only on export, tower-local variables greatly reduce communication overhead.

    Note that all component variables will be initialized to the same value, using the initialization expression from the first tower. The values will match even if the initialization expression uses random numbers.
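
    For example (a hedged sketch in the style of the examples above; the variable and its updates are illustrative):

    distributionStrategy.scope {
      towerLocalVariableScope(SumReduction) {
        // One non-mirrored component of this variable is created per tower, and each
        // tower updates only its own component (e.g., using `assignAdd()`).
        val numExamples = tf.variable(...)
        numExamples.assignAdd(...)
      }
      // When the variable is saved or fetched, the per-tower components are combined
      // using `SumReduction`.
    }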

    reduction: Reduction method used to get the value to save when creating checkpoints.
    block: Code block to execute in this scope.
    returns: Value returned by block.

    Definition Classes: API
  17. package values

