public class NDNN extends Object
Constructor and Description |
---|
NDNN() |
Modifier and Type | Method and Description |
---|---|
INDArray | batchNorm(INDArray input, INDArray mean, INDArray variance, INDArray gamma, INDArray beta, double epsilon, int... axis) Batch normalization operation |
INDArray | biasAdd(INDArray input, INDArray bias, boolean nchw) Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector |
INDArray | dotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray mask, boolean scaled) This operation performs dot product attention on the given timeseries input with the given queries: out = sum(similarity(k_i, q) * v_i), where similarity(k, q) = softmax(k * q) and k * q is the dot product of k and q. Optionally with a normalization step: similarity(k, q) = softmax(k * q / sqrt(size(q))). See also "Attention is all you need" (https://arxiv.org/abs/1706.03762) |
INDArray | dropout(INDArray input, double inputRetainProbability) Dropout operation |
INDArray | elu(INDArray x) Element-wise exponential linear unit (ELU) function: out = x if x > 0; out = a * (exp(x) - 1) if x <= 0, with constant a = 1.0 |
INDArray | gelu(INDArray x) GELU activation function - Gaussian Error Linear Units. For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415. This method uses the sigmoid approximation |
INDArray | hardSigmoid(INDArray x) Element-wise hard sigmoid function: out[i] = 0 if in[i] <= -2.5; out[i] = 0.2*in[i] + 0.5 if -2.5 < in[i] < 2.5; out[i] = 1 if in[i] >= 2.5 |
INDArray | hardTanh(INDArray x) Element-wise hard tanh function: out[i] = -1 if in[i] <= -1; out[i] = in[i] if -1 < in[i] < 1; out[i] = 1 if in[i] >= 1 |
INDArray | hardTanhDerivative(INDArray x) Derivative (dOut/dIn) of the element-wise hard tanh function - hardTanh(INDArray) |
INDArray | layerNorm(INDArray input, INDArray gain, boolean channelsFirst, int... dimensions) Apply layer normalization: y = gain * standardize(x) |
INDArray | layerNorm(INDArray input, INDArray gain, INDArray bias, boolean channelsFirst, int... dimensions) Apply layer normalization: y = gain * standardize(x) + bias |
INDArray | leakyRelu(INDArray x, INDArray alpha) Element-wise leaky ReLU function: out = x if x >= 0.0; out = alpha * x if x < 0.0. Alpha is most commonly set to 0.01 |
INDArray | leakyReluDerivative(INDArray x, INDArray alpha) Leaky ReLU derivative: dOut/dIn given input |
INDArray | linear(INDArray input, INDArray weights, INDArray bias) Linear layer operation: out = mmul(in, w) + bias. Note that the bias array is optional |
INDArray | logSigmoid(INDArray x) Element-wise log-sigmoid function: out[i] = log(sigmoid(in[i])) |
INDArray | logSoftmax(INDArray x) Log softmax activation |
INDArray | logSoftmax(INDArray x, int dimension) Log softmax activation, along the specified dimension |
INDArray | multiHeadDotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray Wq, INDArray Wk, INDArray Wv, INDArray Wo, INDArray mask, boolean scaled) This performs multi-headed dot product attention on the given timeseries input: out = concat(head_1, head_2, ..., head_n) * Wo, where head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v). Optionally with normalization when calculating the attention for each head. See also "Attention is all you need" (https://arxiv.org/abs/1706.03762) |
INDArray | prelu(INDArray input, INDArray alpha, int... sharedAxes) PReLU (Parameterized Rectified Linear Unit) operation |
INDArray | relu(INDArray x, double cutoff) Element-wise rectified linear function with specified cutoff: out[i] = in[i] if in[i] >= cutoff; out[i] = 0 otherwise |
INDArray | relu6(INDArray x, double cutoff) Element-wise "rectified linear 6" function with specified cutoff: out[i] = min(max(in, cutoff), 6) |
INDArray | reluLayer(INDArray input, INDArray weights, INDArray bias) ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in, w) + bias). Note that the bias array is optional |
INDArray | selu(INDArray x) Element-wise SELU function - Scaled Exponential Linear Unit; see Self-Normalizing Neural Networks: out[i] = scale * in[i] if in[i] > 0; out[i] = scale * alpha * (exp(in[i]) - 1) if in[i] <= 0. Uses default scale and alpha values |
INDArray | sigmoid(INDArray x) Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i])) |
INDArray | sigmoidDerivative(INDArray x, INDArray wrt) Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut |
INDArray | softmax(INDArray x, int dimension) Softmax activation, along the specified dimension |
INDArray | softmaxDerivative(INDArray x, INDArray wrt, int dimension) Softmax derivative function |
INDArray | softplus(INDArray x) Element-wise softplus function: out = log(exp(x) + 1) |
INDArray | softsign(INDArray x) Element-wise softsign function: out = x / (abs(x) + 1) |
INDArray | softsignDerivative(INDArray x) Element-wise derivative (dOut/dIn) of the softsign function softsign(INDArray) |
INDArray | swish(INDArray x) Element-wise "swish" function: out = x * sigmoid(b*x) with b = 1.0. See: https://arxiv.org/abs/1710.05941 |
public INDArray batchNorm(INDArray input, INDArray mean, INDArray variance, INDArray gamma, INDArray beta, double epsilon, int... axis)
input
- Input variable. (NUMERIC type)mean
- Mean value. For 1d axis, this should match input.size(axis) (NUMERIC type)variance
- Variance value. For 1d axis, this should match input.size(axis) (NUMERIC type)gamma
- Gamma value. For 1d axis, this should match input.size(axis) (NUMERIC type)beta
- Beta value. For 1d axis, this should match input.size(axis) (NUMERIC type)epsilon
- Epsilon constant for numerical stability (to avoid division by 0)axis
- For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations.
For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC
For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))public INDArray biasAdd(INDArray input, INDArray bias, boolean nchw)
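
A minimal usage sketch, normalizing 2d CNN activations along the channel axis (shapes are illustrative; the NDNN import path is assumed from recent ND4J releases):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package for NDNN

public class BatchNormExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        // NCHW activations: [minibatch=2, channels=3, height=4, width=4]
        INDArray input = Nd4j.rand(2, 3, 4, 4);
        // Per-channel statistics and learned scale/shift; length matches input.size(1) = 3
        INDArray mean = Nd4j.zeros(3);
        INDArray variance = Nd4j.ones(3);
        INDArray gamma = Nd4j.ones(3);
        INDArray beta = Nd4j.zeros(3);
        // axis = 1 selects the channel dimension for NCHW activations
        INDArray out = nn.batchNorm(input, mean, variance, gamma, beta, 1e-5, 1);
    }
}
```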
public INDArray biasAdd(INDArray input, INDArray bias, boolean nchw)

input - 4d input variable (NUMERIC type)
bias - 1d bias (NUMERIC type)
nchw - The format - nchw=true means [minibatch, channels, height, width] format; nchw=false - [minibatch, height, width, channels]. Unused for 2d inputs
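
A brief sketch of adding a per-channel bias to NCHW activations (shapes illustrative, imports assumed as above):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class BiasAddExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        // NCHW activations: [minibatch=2, channels=3, height=4, width=4]
        INDArray input = Nd4j.rand(2, 3, 4, 4);
        // 1d bias, one value per channel
        INDArray bias = Nd4j.create(new double[]{0.1, 0.2, 0.3});
        // nchw = true because the channel dimension is dimension 1
        INDArray out = nn.biasAdd(input, bias, true);
    }
}
```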
public INDArray dotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray mask, boolean scaled)

queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] or 4D array of shape [batchSize, numHeads, featureKeys, queryCount] (NUMERIC type)
keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] or 4D array of shape [batchSize, numHeads, featureKeys, timesteps] (NUMERIC type)
values - input 3D array "values" of shape [batchSize, featureValues, timesteps] or 4D array of shape [batchSize, numHeads, featureValues, timesteps] (NUMERIC type)
mask - OPTIONAL; array that defines which values should be skipped, of shape [batchSize, timesteps] (NUMERIC type)
scaled - normalization, false -> do not apply normalization, true -> apply normalization
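
A hedged sketch with 3D inputs; passing null for the OPTIONAL mask is an assumption about how "no mask" is expressed:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class AttentionExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        int batchSize = 2, featureKeys = 8, timesteps = 10, queryCount = 5;
        INDArray queries = Nd4j.rand(batchSize, featureKeys, queryCount);
        INDArray keys = Nd4j.rand(batchSize, featureKeys, timesteps);
        INDArray values = Nd4j.rand(batchSize, featureKeys, timesteps);
        // mask is documented as OPTIONAL; null is assumed here to mean "no masking".
        // scaled = true applies the 1/sqrt(size(q)) normalization from the paper.
        INDArray out = nn.dotProductAttention(queries, keys, values, null, true);
    }
}
```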
public INDArray dropout(INDArray input, double inputRetainProbability)

input - Input array (NUMERIC type)
inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
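
A small sketch (imports assumed as above):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class DropoutExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        INDArray input = Nd4j.rand(4, 10);
        // Keep each activation with probability 0.8; zero it with probability 0.2
        INDArray out = nn.dropout(input, 0.8);
    }
}
```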
public INDArray elu(INDArray x)

x - Input variable (NUMERIC type)

public INDArray gelu(INDArray x)

x - Input variable (NUMERIC type)

public INDArray hardSigmoid(INDArray x)

x - Input variable (NUMERIC type)

public INDArray hardTanh(INDArray x)

x - Input variable (NUMERIC type)

public INDArray hardTanhDerivative(INDArray x)

x - Input variable (NUMERIC type)
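
All of the single-argument activations above share the same call pattern; a brief sketch (values illustrative, imports assumed):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class ActivationExamples {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        INDArray x = Nd4j.create(new double[]{-3.0, -1.0, 0.0, 1.0, 3.0});
        System.out.println(nn.elu(x));         // negative values squashed toward -1
        System.out.println(nn.gelu(x));        // sigmoid-approximated GELU
        System.out.println(nn.hardSigmoid(x)); // clamped to [0, 1]
        System.out.println(nn.hardTanh(x));    // clamped to [-1, 1]
    }
}
```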
public INDArray layerNorm(INDArray input, INDArray gain, INDArray bias, boolean channelsFirst, int... dimensions)

input - Input variable (NUMERIC type)
gain - Gain (NUMERIC type)
bias - Bias (NUMERIC type)
channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))

public INDArray layerNorm(INDArray input, INDArray gain, boolean channelsFirst, int... dimensions)

input - Input variable (NUMERIC type)
gain - Gain (NUMERIC type)
channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
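
A sketch for 2d/MLP activations (shapes illustrative, imports assumed):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class LayerNormExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        // 2d/MLP activations: [minibatch=4, nIn=10]
        INDArray input = Nd4j.rand(4, 10);
        INDArray gain = Nd4j.ones(10);
        INDArray bias = Nd4j.zeros(10);
        // channelsFirst is unused for 2D input; dimension=1 normalizes each example
        INDArray out = nn.layerNorm(input, gain, bias, true, 1);
    }
}
```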
public INDArray leakyRelu(INDArray x, INDArray alpha)

x - Input variable (NUMERIC type)
alpha - Slope for negative inputs - commonly 0.01 (NUMERIC type)

public INDArray leakyReluDerivative(INDArray x, INDArray alpha)

x - Input variable (NUMERIC type)
alpha - Slope for negative inputs - commonly 0.01 (NUMERIC type)
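
A sketch; note that alpha is an INDArray in this signature, and passing a scalar array for it is an assumption:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class LeakyReluExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        INDArray x = Nd4j.create(new double[]{-2.0, -0.5, 0.0, 0.5, 2.0});
        // Scalar-array alpha is an assumption; 0.01 is the common choice
        INDArray alpha = Nd4j.scalar(0.01);
        INDArray out = nn.leakyRelu(x, alpha);            // [-0.02, -0.005, 0.0, 0.5, 2.0]
        INDArray grad = nn.leakyReluDerivative(x, alpha); // 0.01 where x < 0, 1 elsewhere
    }
}
```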
public INDArray linear(INDArray input, INDArray weights, INDArray bias)

input - Input data (NUMERIC type)
weights - Weights variable, shape [nIn, nOut] (NUMERIC type)
bias - Optional bias variable (may be null) (NUMERIC type)
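
A sketch (shapes illustrative, imports assumed):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class LinearExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        INDArray input = Nd4j.rand(3, 4);   // [minibatch=3, nIn=4]
        INDArray weights = Nd4j.rand(4, 5); // [nIn=4, nOut=5]
        INDArray bias = Nd4j.zeros(5);      // one bias per output unit; may be null
        INDArray out = nn.linear(input, weights, bias); // shape [3, 5]
    }
}
```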
public INDArray logSigmoid(INDArray x)

x - Input variable (NUMERIC type)

public INDArray logSoftmax(INDArray x)

x - Input variable (NUMERIC type)

public INDArray logSoftmax(INDArray x, int dimension)

x - Input (NUMERIC type)
dimension - Dimension along which to apply log softmax
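
A sketch of the dimension-specific overload (shapes illustrative, imports assumed):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class LogSoftmaxExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        INDArray logits = Nd4j.rand(3, 5); // [minibatch=3, numClasses=5]
        // Normalize over the class dimension (dimension 1);
        // exp(logProbs) sums to 1 along that dimension
        INDArray logProbs = nn.logSoftmax(logits, 1);
    }
}
```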
public INDArray multiHeadDotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray Wq, INDArray Wk, INDArray Wv, INDArray Wo, INDArray mask, boolean scaled)

queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] (NUMERIC type)
keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] (NUMERIC type)
values - input 3D array "values" of shape [batchSize, featureValues, timesteps] (NUMERIC type)
Wq - input query projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
Wk - input key projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
Wv - input value projection weights of shape [numHeads, projectedValues, featureValues] (NUMERIC type)
Wo - output projection weights of shape [numHeads * projectedValues, outSize] (NUMERIC type)
mask - OPTIONAL; array that defines which values should be skipped, of shape [batchSize, timesteps] (NUMERIC type)
scaled - normalization, false -> do not apply normalization, true -> apply normalization
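
A hedged sketch wiring up the projection weights with the documented shapes; as above, passing null for the OPTIONAL mask is an assumption:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class MultiHeadAttentionExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        int batchSize = 2, featureKeys = 8, featureValues = 8;
        int timesteps = 10, queryCount = 5;
        int numHeads = 4, projectedKeys = 4, projectedValues = 4, outSize = 16;
        INDArray queries = Nd4j.rand(batchSize, featureKeys, queryCount);
        INDArray keys = Nd4j.rand(batchSize, featureKeys, timesteps);
        INDArray values = Nd4j.rand(batchSize, featureValues, timesteps);
        INDArray Wq = Nd4j.rand(numHeads, projectedKeys, featureKeys);
        INDArray Wk = Nd4j.rand(numHeads, projectedKeys, featureKeys);
        INDArray Wv = Nd4j.rand(numHeads, projectedValues, featureValues);
        INDArray Wo = Nd4j.rand(numHeads * projectedValues, outSize);
        // mask is OPTIONAL; null is assumed to mean "no masking"
        INDArray out = nn.multiHeadDotProductAttention(
                queries, keys, values, Wq, Wk, Wv, Wo, null, true);
    }
}
```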
public INDArray prelu(INDArray input, INDArray alpha, int... sharedAxes)

input - Input data (NUMERIC type)
alpha - The alpha (negative-region slope) variable. Note that the batch dimension (the 0th, whether it is batch or not) should not be part of alpha. (NUMERIC type)
sharedAxes - Which axes to share alpha parameters along. (Size: AtLeast(min=1))
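
A sketch with one learned alpha per channel; the [3, 1, 1] alpha shape is an assumption based on the shared-axes description above (alpha excludes the batch dimension, shared axes collapse to size 1):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class PreluExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        // NCHW-style activations: [minibatch=2, channels=3, height=4, width=4]
        INDArray input = Nd4j.rand(2, 3, 4, 4).subi(0.5);
        // Assumed shape [3, 1, 1]: per-channel alpha, shared along axes 2 and 3
        INDArray alpha = Nd4j.ones(3, 1, 1).muli(0.01);
        INDArray out = nn.prelu(input, alpha, 2, 3);
    }
}
```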
public INDArray relu(INDArray x, double cutoff)

x - Input (NUMERIC type)
cutoff - Cutoff value for ReLU operation - x > cutoff ? x : 0. Usually 0

public INDArray relu6(INDArray x, double cutoff)

x - Input (NUMERIC type)
cutoff - Cutoff value for ReLU operation. Usually 0
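
A brief sketch of both variants (imports assumed):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class ReluExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        INDArray x = Nd4j.create(new double[]{-2.0, 0.5, 3.0, 8.0});
        INDArray r = nn.relu(x, 0.0);   // [0.0, 0.5, 3.0, 8.0]
        INDArray r6 = nn.relu6(x, 0.0); // [0.0, 0.5, 3.0, 6.0] - capped at 6
    }
}
```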
public INDArray reluLayer(INDArray input, INDArray weights, INDArray bias)

input - Input data (NUMERIC type)
weights - Weights variable (NUMERIC type)
bias - Optional bias variable (may be null) (NUMERIC type)
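
A sketch (shapes illustrative, imports assumed):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class ReluLayerExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        INDArray input = Nd4j.rand(3, 4);   // [minibatch=3, nIn=4]
        INDArray weights = Nd4j.rand(4, 5); // [nIn=4, nOut=5]
        INDArray bias = Nd4j.zeros(5);
        // Equivalent to relu(mmul(input, weights) + bias)
        INDArray out = nn.reluLayer(input, weights, bias);
    }
}
```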
public INDArray selu(INDArray x)

x - Input variable (NUMERIC type)

public INDArray sigmoid(INDArray x)

x - Input variable (NUMERIC type)

public INDArray sigmoidDerivative(INDArray x, INDArray wrt)

x - Input variable (NUMERIC type)
wrt - Gradient at the output - dL/dOut. Must have same shape as the input (NUMERIC type)
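
A sketch of the forward pass and the backprop helper (imports assumed):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class SigmoidExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        INDArray x = Nd4j.create(new double[]{-2.0, 0.0, 2.0});
        INDArray out = nn.sigmoid(x); // [~0.12, 0.5, ~0.88]
        // Backprop through sigmoid: wrt is dL/dOut, same shape as x
        INDArray wrt = Nd4j.onesLike(x);
        INDArray dLdIn = nn.sigmoidDerivative(x, wrt);
    }
}
```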
public INDArray softmax(INDArray x, int dimension)

x - Input (NUMERIC type)
dimension - Dimension along which to apply softmax

public INDArray softmaxDerivative(INDArray x, INDArray wrt, int dimension)

x - Softmax input (NUMERIC type)
wrt - Gradient at output, dL/dOut (NUMERIC type)
dimension - Softmax dimension
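
A sketch of both the activation and its derivative (shapes illustrative, imports assumed):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class SoftmaxExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        INDArray logits = Nd4j.rand(3, 5); // [minibatch=3, numClasses=5]
        INDArray probs = nn.softmax(logits, 1); // each row sums to 1
        // Backprop: wrt is the gradient at the output, dL/dOut
        INDArray wrt = Nd4j.onesLike(logits);
        INDArray dLdIn = nn.softmaxDerivative(logits, wrt, 1);
    }
}
```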
public INDArray softplus(INDArray x)

x - Input variable (NUMERIC type)

public INDArray softsign(INDArray x)

x - Input variable (NUMERIC type)
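
A brief sketch of these two saturating activations (imports assumed):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.ops.NDNN; // assumed package

public class SoftplusSoftsignExample {
    public static void main(String[] args) {
        NDNN nn = new NDNN();
        INDArray x = Nd4j.create(new double[]{-2.0, 0.0, 2.0});
        INDArray sp = nn.softplus(x); // smooth ReLU approximation; softplus(0) = log(2)
        INDArray ss = nn.softsign(x); // bounded in (-1, 1)
    }
}
```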
public INDArray softsignDerivative(INDArray x)

x - Input variable (NUMERIC type)

public INDArray swish(INDArray x)

x - Input variable (NUMERIC type)

Copyright © 2019. All rights reserved.