public class SDNN extends SDOps

SameDiff neural network operations, accessible via SameDiff.nn(). See also SDMath (accessible via SameDiff.math()) for general math ops, SDCNN (accessible via SameDiff.cnn()) for convolutional neural network ops, and SDRNN (accessible via SameDiff.rnn()) for recurrent neural network ops.

Modifier and Type | Method and Description
---|---
SDVariable | batchNorm(SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis) - Batch norm operation.
SDVariable | batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, boolean applyGamma, boolean applyBeta, double epsilon, int... axis) - Batch normalization with optional application of gamma/beta args.
SDVariable | batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis) - Neural network batch normalization operation. For details, see http://arxiv.org/abs/1502.03167
SDVariable | biasAdd(SDVariable input, SDVariable bias)
SDVariable | biasAdd(String name, SDVariable input, SDVariable bias) - Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector.
SDVariable | dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled) - This operation performs dot product attention on the given timeseries input with the given queries.
List<SDVariable> | dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled, boolean withWeights) - This operation performs dot product attention on the given timeseries input with the given queries.
SDVariable | dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled) - This operation performs dot product attention on the given timeseries input with the given queries.
List<SDVariable> | dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled, boolean withWeights) - This operation performs dot product attention on the given timeseries input with the given queries: out = sum(similarity(k_i, q) * v_i), where similarity(k, q) = softmax(k * q) and k * q is the dot product of k and q. Optionally with normalization step: similarity(k, q) = softmax(k * q / sqrt(size(q))). See also "Attention is all you need" (https://arxiv.org/abs/1706.03762).
SDVariable | dropout(SDVariable input, double inputRetainProbability)
SDVariable | dropout(String name, SDVariable input, double inputRetainProbability)
SDVariable | elu(SDVariable x) - Element-wise exponential linear unit (ELU) function: out = x if x > 0; out = a * (exp(x) - 1) if x <= 0, with constant a = 1.0
SDVariable | elu(String name, SDVariable x) - Element-wise exponential linear unit (ELU) function: out = x if x > 0; out = a * (exp(x) - 1) if x <= 0, with constant a = 1.0
SDVariable | eluDerivative(SDVariable x) - Element-wise derivative of the exponential linear unit (ELU) function, dOut/dIn given input.
SDVariable | eluDerivative(String name, SDVariable x) - Element-wise derivative of the exponential linear unit (ELU) function, dOut/dIn given input.
SDVariable | gelu(SDVariable x) - GELU activation function (Gaussian Error Linear Units). For more details, see https://arxiv.org/abs/1606.08415. This method uses the sigmoid approximation.
SDVariable | gelu(String name, SDVariable x) - GELU activation function (Gaussian Error Linear Units). For more details, see https://arxiv.org/abs/1606.08415. This method uses the sigmoid approximation.
SDVariable | hardSigmoid(SDVariable in) - Element-wise hard sigmoid function: out[i] = 0 if in[i] <= -2.5; out[i] = 0.2*in[i] + 0.5 if -2.5 < in[i] < 2.5; out[i] = 1 if in[i] >= 2.5
SDVariable | hardSigmoid(String name, SDVariable in) - Element-wise hard sigmoid function: out[i] = 0 if in[i] <= -2.5; out[i] = 0.2*in[i] + 0.5 if -2.5 < in[i] < 2.5; out[i] = 1 if in[i] >= 2.5
SDVariable | hardTanh(SDVariable in) - Element-wise hard tanh function: out[i] = -1 if in[i] <= -1; out[i] = in[i] if -1 < in[i] < 1; out[i] = 1 if in[i] >= 1
SDVariable | hardTanh(String name, SDVariable in) - Element-wise hard tanh function: out[i] = -1 if in[i] <= -1; out[i] = in[i] if -1 < in[i] < 1; out[i] = 1 if in[i] >= 1
SDVariable | hardTanhDerivative(SDVariable x) - Derivative (dOut/dIn) of the element-wise hard tanh function. See hardTanh(SDVariable)
SDVariable | hardTanhDerivative(String name, SDVariable x) - Derivative (dOut/dIn) of the element-wise hard tanh function. See hardTanh(SDVariable)
SDVariable | layerNorm(SDVariable input, SDVariable gain, int... dimensions) - Apply Layer Normalization without bias: y = gain * standardize(x)
SDVariable | layerNorm(SDVariable input, SDVariable gain, SDVariable bias, int... dimensions) - Apply Layer Normalization: y = gain * standardize(x) + bias
SDVariable | layerNorm(String name, SDVariable input, SDVariable gain, int... dimensions) - Apply Layer Normalization without bias: y = gain * standardize(x)
SDVariable | layerNorm(String name, SDVariable input, SDVariable gain, SDVariable bias, int... dimensions) - Apply Layer Normalization: y = gain * standardize(x) + bias
SDVariable | leakyRelu(SDVariable x, double alpha) - Element-wise leaky ReLU function: out = x if x >= 0.0; out = alpha * x if x < 0.0. Alpha value is most commonly set to 0.01.
SDVariable | leakyRelu(String name, SDVariable x, double alpha) - Element-wise leaky ReLU function: out = x if x >= 0.0; out = alpha * x if x < 0.0. Alpha value is most commonly set to 0.01.
SDVariable | leakyReluDerivative(String name, SDVariable x, double alpha) - Leaky ReLU derivative: dOut/dIn given input. See leakyRelu(String, SDVariable, double)
SDVariable | linear(SDVariable input, SDVariable weights, SDVariable bias)
SDVariable | linear(String name, SDVariable input, SDVariable weights, SDVariable bias) - Linear layer operation: out = mmul(in, w) + bias. Note that bias array is optional.
SDVariable | logSigmoid(SDVariable x) - Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
SDVariable | logSigmoid(String name, SDVariable x) - Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
SDVariable | logSoftmax(SDVariable x) - Log softmax activation
SDVariable | logSoftmax(String name, SDVariable x) - Log softmax activation
SDVariable | multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled) - This performs multi-headed dot product attention on the given timeseries input.
List<SDVariable> | multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled, boolean withWeights) - This performs multi-headed dot product attention on the given timeseries input.
SDVariable | multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled) - This performs multi-headed dot product attention on the given timeseries input.
List<SDVariable> | multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled, boolean withWeights) - This performs multi-headed dot product attention on the given timeseries input: out = concat(head_1, head_2, ..., head_n) * Wo, where head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v). Optionally with normalization when calculating the attention for each head.
SDVariable | pad(SDVariable input, int[][] padding, double constant)
SDVariable | pad(SDVariable input, SDVariable padding, double constant) - Perform padding on the given array, where padded values are the specified constant. Example: input [[1, 2], [3, 4]], padding [[2, 0], [1, 1]], constant = 0 gives result [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0]]
SDVariable | pad(String outputName, SDVariable input, SDVariable padding, Pad.Mode mode, double constant) - As per pad(SDVariable, SDVariable, double) but also supports multiple Pad.Mode modes. Example: input [[1, 2], [3, 4], [5, 6]], padding [[2, 0], [1, 1]], constant = 0. CONSTANT mode result: [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 5, 6, 0]]. SYMMETRIC mode result: [[3, 3, 4, 4], [1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [5, 5, 6, 6]]. REFLECT mode result: [[6, 5, 6, 5], [4, 3, 4, 3], [2, 1, 2, 1], [4, 3, 4, 3], [6, 5, 6, 5]]
SDVariable | relu(SDVariable x, double cutoff) - Element-wise rectified linear function with specified cutoff: out[i] = in[i] if in[i] >= cutoff; out[i] = 0 otherwise
SDVariable | relu(String name, SDVariable x, double cutoff) - Element-wise rectified linear function with specified cutoff: out[i] = in[i] if in[i] >= cutoff; out[i] = 0 otherwise
SDVariable | relu6(SDVariable x, double cutoff) - Element-wise "rectified linear 6" function with specified cutoff: out[i] = min(max(in, cutoff), 6)
SDVariable | relu6(String name, SDVariable x, double cutoff) - Element-wise "rectified linear 6" function with specified cutoff: out[i] = min(max(in, cutoff), 6)
SDVariable | reluLayer(SDVariable input, SDVariable weights, SDVariable bias)
SDVariable | reluLayer(String name, SDVariable input, SDVariable weights, SDVariable bias) - ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in, w) + bias). Note that bias array is optional.
SDVariable | selu(SDVariable x) - Element-wise SELU function, the scaled exponential linear unit: see Self-Normalizing Neural Networks. out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0. Uses default scale and alpha values.
SDVariable | selu(String name, SDVariable x) - Element-wise SELU function, the scaled exponential linear unit: see Self-Normalizing Neural Networks. out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0. Uses default scale and alpha values.
SDVariable | sigmoid(SDVariable x) - Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))
SDVariable | sigmoid(String name, SDVariable x) - Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))
SDVariable | sigmoidDerivative(SDVariable x, SDVariable wrt) - Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
SDVariable | sigmoidDerivative(String name, SDVariable x, SDVariable wrt) - Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
SDVariable | softmax(SDVariable x) - Softmax activation
SDVariable | softmax(String name, SDVariable x) - Softmax activation
SDVariable | softmaxDerivative(String name, SDVariable x, SDVariable wrt)
SDVariable | softmaxDerivative(String name, SDVariable x, SDVariable wrt, Integer dimension)
SDVariable | softplus(SDVariable x) - Element-wise softplus function: out = log(exp(x) + 1)
SDVariable | softplus(String name, SDVariable x) - Element-wise softplus function: out = log(exp(x) + 1)
SDVariable | softsign(SDVariable x) - Element-wise softsign function: out = x / (abs(x) + 1)
SDVariable | softsign(String name, SDVariable x) - Element-wise softsign function: out = x / (abs(x) + 1)
SDVariable | softsignDerivative(SDVariable x) - Element-wise derivative (dOut/dIn) of the softsign function. See softsign(SDVariable)
SDVariable | softsignDerivative(String name, SDVariable x) - Element-wise derivative (dOut/dIn) of the softsign function. See softsign(SDVariable)
SDVariable | swish(SDVariable x) - Element-wise "swish" function: out = x * sigmoid(b*x) with b = 1.0. See https://arxiv.org/abs/1710.05941
SDVariable | swish(String name, SDVariable x) - Element-wise "swish" function: out = x * sigmoid(b*x) with b = 1.0. See https://arxiv.org/abs/1710.05941
SDVariable | tanh(SDVariable x)
SDVariable | tanh(String name, SDVariable x)
Methods inherited from class SDOps: f, updateVariableNameAndReference
public SDNN(SameDiff sameDiff)
public SDVariable batchNorm(SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis)

public SDVariable batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, boolean applyGamma, boolean applyBeta, double epsilon, int... axis)
See batchNorm(String, SDVariable, SDVariable, SDVariable, SDVariable, SDVariable, double, int...)

public SDVariable batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis)
name - Name of the output variable
input - Input variable
mean - Mean value. For 1d axis, this should match input.size(axis)
variance - Variance value. For 1d axis, this should match input.size(axis)
gamma - Gamma value. For 1d axis, this should match input.size(axis)
beta - Beta value. For 1d axis, this should match input.size(axis)
epsilon - Epsilon constant for numerical stability (to avoid division by 0)
axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations
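For example, a minimal sketch (hypothetical names and shapes, assuming the usual SameDiff workflow of creating array variables with sd.var(...) and executing with SDVariable.eval()): batch-normalize NCHW activations along the channel axis (axis = 1), with length-3 statistics matching input.size(1).

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class BatchNormExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        // NCHW activations: [minibatch, channels, height, width]
        SDVariable input = sd.var("input", Nd4j.rand(new int[]{2, 3, 4, 4}));
        // Per-channel statistics and scale/shift; length matches input.size(1) = 3
        SDVariable mean     = sd.var("mean",     Nd4j.zeros(3));
        SDVariable variance = sd.var("variance", Nd4j.ones(3));
        SDVariable gamma    = sd.var("gamma",    Nd4j.ones(3));
        SDVariable beta     = sd.var("beta",     Nd4j.zeros(3));

        SDVariable out = sd.nn().batchNorm("bnOut", input, mean, variance, gamma, beta, 1e-5, 1);
        System.out.println(out.eval());
    }
}
```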
public SDVariable biasAdd(SDVariable input, SDVariable bias)
See biasAdd(String, SDVariable, SDVariable)

public SDVariable biasAdd(String name, SDVariable input, SDVariable bias)
name - Name of the output variable
input - 4d input variable
bias - 1d bias
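A short sketch (hypothetical shapes; which dimension the bias broadcasts over depends on the layout convention of the deployed release, so treat the channels-last layout here as an assumption to verify):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class BiasAddExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        // Hypothetical NHWC activations: [minibatch, height, width, channels]
        SDVariable act  = sd.var("act",  Nd4j.rand(new int[]{2, 4, 4, 3}));
        SDVariable bias = sd.var("bias", Nd4j.create(new double[]{0.1, 0.2, 0.3}));
        SDVariable out = sd.nn().biasAdd("withBias", act, bias);
        System.out.println(out.eval());
    }
}
```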
public SDVariable dropout(SDVariable input, double inputRetainProbability)
input - Input
inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)

public SDVariable dropout(String name, SDVariable input, double inputRetainProbability)
input - Input
inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
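For instance (a sketch, reusing the same sd.var(...)/eval() pattern): retain each value with probability 0.8, zeroing it with probability 0.2.

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class DropoutExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.var("in", Nd4j.rand(4, 10));
        // inputRetainProbability = 0.8: each value kept with p = 0.8, zeroed with p = 0.2
        SDVariable dropped = sd.nn().dropout("dropped", in, 0.8);
        System.out.println(dropped.eval());
    }
}
```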
public SDVariable elu(SDVariable x)
x - Input variable

public SDVariable elu(String name, SDVariable x)
name - Output variable name
x - Input variable

public SDVariable eluDerivative(SDVariable x)
See elu(SDVariable)
x - Input variable

public SDVariable eluDerivative(String name, SDVariable x)
See elu(SDVariable)
name - Output variable name
x - Input variable

public SDVariable gelu(SDVariable x)
x - Input

public SDVariable gelu(String name, SDVariable x)
name - Name of the output variable. May be null.
x - Input
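A small sketch comparing the two activations on the same (arbitrarily chosen) inputs:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class EluGeluExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.var("x", Nd4j.create(new double[]{-2.0, -0.5, 0.0, 0.5, 2.0}));
        SDVariable elu  = sd.nn().elu("elu", x);   // x for x > 0; exp(x) - 1 for x <= 0
        SDVariable gelu = sd.nn().gelu("gelu", x); // sigmoid-approximated GELU
        System.out.println(elu.eval());
        System.out.println(gelu.eval());
    }
}
```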
public SDVariable hardSigmoid(SDVariable in)
in - Input variable

public SDVariable hardSigmoid(String name, SDVariable in)
name - Name of the output variable
in - Input variable

public SDVariable hardTanh(SDVariable in)
in - Input variable

public SDVariable hardTanh(String name, SDVariable in)
name - Output variable name
in - Input variable

public SDVariable hardTanhDerivative(SDVariable x)
See hardTanh(SDVariable)
x - Input

public SDVariable hardTanhDerivative(String name, SDVariable x)
See hardTanh(SDVariable)
name - Output variable name
x - Input
public SDVariable leakyRelu(SDVariable x, double alpha)
x - Input variable
alpha - Scaling factor for negative inputs; most commonly 0.01 (the cutoff itself is 0.0)

public SDVariable leakyRelu(String name, SDVariable x, double alpha)
x - Input variable
alpha - Scaling factor for negative inputs; most commonly 0.01 (the cutoff itself is 0.0)

public SDVariable leakyReluDerivative(String name, SDVariable x, double alpha)
See leakyRelu(String, SDVariable, double)
x - Input variable
alpha - Alpha value
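For instance (a sketch; alpha = 0.01 as the summary above suggests):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class LeakyReluExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.var("x", Nd4j.create(new double[]{-3.0, -1.0, 0.0, 2.0}));
        // Negative inputs are scaled by alpha; non-negative inputs pass through
        SDVariable out = sd.nn().leakyRelu("lrelu", x, 0.01);
        System.out.println(out.eval()); // approx [-0.03, -0.01, 0.0, 2.0]
    }
}
```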
public SDVariable linear(SDVariable input, SDVariable weights, SDVariable bias)

public SDVariable linear(String name, SDVariable input, SDVariable weights, SDVariable bias)
name - Name of the output variable
input - Input data
weights - Weights variable
bias - Optional bias variable (may be null)
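A sketch of a fully connected layer with hypothetical sizes nIn = 4, nOut = 2:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class LinearExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.var("in", Nd4j.rand(3, 4));   // [minibatch, nIn]
        SDVariable w  = sd.var("w",  Nd4j.rand(4, 2));   // [nIn, nOut]
        SDVariable b  = sd.var("b",  Nd4j.rand(1, 2));   // bias is optional (may be null)
        SDVariable out = sd.nn().linear("fc", in, w, b); // mmul(in, w) + b -> shape [3, 2]
        System.out.println(out.eval());
    }
}
```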
public SDVariable logSigmoid(SDVariable x)
x - Input variable

public SDVariable logSigmoid(String name, SDVariable x)
name - Name of the output variable
x - Input variable

public SDVariable logSoftmax(SDVariable x)
x - Input variable

public SDVariable logSoftmax(String name, SDVariable x)
name - Variable name
x - Input variable

public SDVariable relu(SDVariable x, double cutoff)
x - Input variable
cutoff - Cutoff value. Usually 0

public SDVariable relu(String name, SDVariable x, double cutoff)
name - Output variable name
x - Input variable
cutoff - Cutoff value. Usually 0

public SDVariable relu6(SDVariable x, double cutoff)
x - Input variable
cutoff - Cutoff value. Usually 0

public SDVariable relu6(String name, SDVariable x, double cutoff)
name - Output variable name
x - Input variable
cutoff - Cutoff value. Usually 0
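For example (a sketch; cutoff = 0, as is usual):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class ReluExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.var("x", Nd4j.create(new double[]{-2.0, 3.0, 8.0}));
        SDVariable r  = sd.nn().relu("relu", x, 0.0);   // -> [0, 3, 8]
        SDVariable r6 = sd.nn().relu6("relu6", x, 0.0); // -> [0, 3, 6], capped at 6
        System.out.println(r.eval());
        System.out.println(r6.eval());
    }
}
```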
public SDVariable reluLayer(SDVariable input, SDVariable weights, SDVariable bias)

public SDVariable reluLayer(String name, SDVariable input, SDVariable weights, SDVariable bias)
name - Name of the output variable
input - Input data
weights - Weights variable
bias - Optional bias variable (may be null)
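reluLayer fuses the linear transform with the ReLU activation; a sketch with the same hypothetical sizes as the linear example above:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class ReluLayerExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.var("in", Nd4j.rand(3, 4));
        SDVariable w  = sd.var("w",  Nd4j.rand(4, 2));
        SDVariable b  = sd.var("b",  Nd4j.rand(1, 2));
        SDVariable out = sd.nn().reluLayer("layer", in, w, b); // relu(mmul(in, w) + b)
        System.out.println(out.eval());
    }
}
```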
public SDVariable selu(SDVariable x)
x - Input variable

public SDVariable selu(String name, SDVariable x)
name - Name of the output variable
x - Input variable

public SDVariable sigmoid(SDVariable x)
x - Input variable

public SDVariable sigmoid(String name, SDVariable x)
name - Output variable name
x - Input variable

public SDVariable sigmoidDerivative(SDVariable x, SDVariable wrt)
x - Input variable
wrt - Gradient at the output - dL/dOut. Must have same shape as the input

public SDVariable sigmoidDerivative(String name, SDVariable x, SDVariable wrt)
name - Output variable name
x - Input variable
wrt - Gradient at the output - dL/dOut. Must have same shape as the input
public SDVariable softmax(SDVariable x)
x - Input variable

public SDVariable softmax(String name, SDVariable x)
x - Input variable
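For example (a sketch; it assumes softmax is applied along dimension 1 for a 2d input, so each row sums to 1.0):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class SoftmaxExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable logits = sd.var("logits",
                Nd4j.create(new double[][]{{1.0, 2.0, 3.0}, {0.0, 0.0, 0.0}}));
        SDVariable probs = sd.nn().softmax("probs", logits);
        System.out.println(probs.eval()); // second row -> [1/3, 1/3, 1/3]
    }
}
```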
public SDVariable softmaxDerivative(String name, SDVariable x, SDVariable wrt)
x - Input variable

public SDVariable softmaxDerivative(String name, SDVariable x, SDVariable wrt, Integer dimension)
public SDVariable softplus(SDVariable x)
x - Input variable

public SDVariable softplus(String name, SDVariable x)
name - Output variable name
x - Input variable

public SDVariable softsign(SDVariable x)
x - Input variable

public SDVariable softsign(String name, SDVariable x)
name - Output variable name
x - Input variable

public SDVariable softsignDerivative(SDVariable x)
See softsign(SDVariable)
x - Input variable

public SDVariable softsignDerivative(String name, SDVariable x)
See softsign(SDVariable)
name - Output variable name
x - Input variable

public SDVariable swish(SDVariable x)
x - Input variable

public SDVariable swish(String name, SDVariable x)
name - Name of the output variable
x - Input variable

public SDVariable tanh(String name, SDVariable x)

public SDVariable tanh(SDVariable x)
public SDVariable layerNorm(SDVariable input, SDVariable gain, SDVariable bias, int... dimensions)

public SDVariable layerNorm(String name, SDVariable input, SDVariable gain, SDVariable bias, int... dimensions)
name - Name of the output variable
input - Input variable
gain - Gain
bias - Bias

public SDVariable layerNorm(SDVariable input, SDVariable gain, int... dimensions)

public SDVariable layerNorm(String name, SDVariable input, SDVariable gain, int... dimensions)
name - Name of the output variable
input - Input variable
gain - Gain
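A sketch normalizing each example over its feature dimension (using dimension 1 is an assumption appropriate for [minibatch, features] input):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class LayerNormExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in   = sd.var("in",   Nd4j.rand(3, 8)); // [minibatch, features]
        SDVariable gain = sd.var("gain", Nd4j.ones(8));
        SDVariable bias = sd.var("bias", Nd4j.zeros(8));
        // y = gain * standardize(x) + bias, standardized over the feature dimension
        SDVariable out = sd.nn().layerNorm("ln", in, gain, bias, 1);
        System.out.println(out.eval());
    }
}
```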
public SDVariable pad(SDVariable input, int[][] padding, double constant)

public SDVariable pad(SDVariable input, SDVariable padding, double constant)
input - Input array to pad
padding - Padding array
constant - Constant to use for padded values

public SDVariable pad(String outputName, SDVariable input, SDVariable padding, Pad.Mode mode, double constant)
As per pad(SDVariable, SDVariable, double) but also supports multiple Pad.Mode modes.
outputName - Name of the output variable
input - Input array to pad
padding - Padding array
mode - Padding mode
constant - Constant to use for padded values
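A sketch reproducing the constant-pad example from the summary above (note: depending on the version, the padding array may need an integer data type, hence the cast):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class PadExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.var("in", Nd4j.create(new double[][]{{1, 2}, {3, 4}}));
        // [2, 0]: pad 2 rows before / 0 after; [1, 1]: pad 1 column on each side
        INDArray padArr = Nd4j.create(new double[][]{{2, 0}, {1, 1}}).castTo(DataType.INT);
        SDVariable padding = sd.var("padding", padArr);
        SDVariable out = sd.nn().pad(in, padding, 0.0);
        System.out.println(out.eval());
        // Expected (per the docs above): [[0,0,0,0],[0,0,0,0],[0,1,2,0],[0,3,4,0]]
    }
}
```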
public SDVariable dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)

public SDVariable dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)

public List<SDVariable> dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled, boolean withWeights)

public List<SDVariable> dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled, boolean withWeights)
queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] or 4D array of shape [batchSize, numHeads, featureKeys, queryCount]
keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] or 4D array of shape [batchSize, numHeads, featureKeys, timesteps]
values - input 3D array "values" of shape [batchSize, featureValues, timesteps] or 4D array of shape [batchSize, numHeads, featureValues, timesteps]
mask - OPTIONAL; array that defines which values should be skipped, of shape [batchSize, timesteps]
scaled - normalization, false -> do not apply normalization, true -> apply normalization
withWeights - return attention weights as well, false -> only one output, true -> two outputs
Output Arrays:
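A sketch wiring up the 3D single-head case with the shapes documented above (sizes are arbitrary; the all-ones mask skips nothing):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class DotProductAttentionExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        int batchSize = 2, featureKeys = 4, featureValues = 3, timesteps = 5, queryCount = 1;
        SDVariable q = sd.var("q", Nd4j.rand(new int[]{batchSize, featureKeys, queryCount}));
        SDVariable k = sd.var("k", Nd4j.rand(new int[]{batchSize, featureKeys, timesteps}));
        SDVariable v = sd.var("v", Nd4j.rand(new int[]{batchSize, featureValues, timesteps}));
        SDVariable mask = sd.var("mask", Nd4j.ones(batchSize, timesteps)); // attend to all steps
        // scaled = true: similarity(k, q) = softmax(k * q / sqrt(size(q)))
        SDVariable att = sd.nn().dotProductAttention(q, k, v, mask, true);
        System.out.println(att.eval());
    }
}
```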
public SDVariable multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)

public SDVariable multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)

public List<SDVariable> multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled, boolean withWeights)

public List<SDVariable> multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled, boolean withWeights)
queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount]
keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps]
values - input 3D array "values" of shape [batchSize, featureValues, timesteps]
Wq - input query projection weights of shape [numHeads, projectedKeys, featureKeys]
Wk - input key projection weights of shape [numHeads, projectedKeys, featureKeys]
Wv - input value projection weights of shape [numHeads, projectedValues, featureValues]
Wo - output projection weights of shape [numHeads * projectedValues, outSize]
mask - OPTIONAL; array that defines which values should be skipped, of shape [batchSize, timesteps]
scaled - normalization, false -> do not apply normalization, true -> apply normalization
withWeights - return attention weights as well, false -> only one output, true -> two outputs
Output Arrays:
See also dotProductAttention(String, SDVariable, SDVariable, SDVariable, SDVariable, boolean, boolean)
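A sketch for the multi-headed case, using the weight shapes documented above (all sizes hypothetical):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class MultiHeadAttentionExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        int batchSize = 2, featureKeys = 4, featureValues = 4, timesteps = 5, queryCount = 3;
        int numHeads = 2, projectedKeys = 4, projectedValues = 4, outSize = 8;
        SDVariable q = sd.var("q", Nd4j.rand(new int[]{batchSize, featureKeys, queryCount}));
        SDVariable k = sd.var("k", Nd4j.rand(new int[]{batchSize, featureKeys, timesteps}));
        SDVariable v = sd.var("v", Nd4j.rand(new int[]{batchSize, featureValues, timesteps}));
        // Per-head projections, then a shared output projection Wo
        SDVariable Wq = sd.var("Wq", Nd4j.rand(new int[]{numHeads, projectedKeys, featureKeys}));
        SDVariable Wk = sd.var("Wk", Nd4j.rand(new int[]{numHeads, projectedKeys, featureKeys}));
        SDVariable Wv = sd.var("Wv", Nd4j.rand(new int[]{numHeads, projectedValues, featureValues}));
        SDVariable Wo = sd.var("Wo", Nd4j.rand(numHeads * projectedValues, outSize));
        SDVariable mask = sd.var("mask", Nd4j.ones(batchSize, timesteps));
        SDVariable out = sd.nn().multiHeadDotProductAttention("mha", q, k, v, Wq, Wk, Wv, Wo, mask, true);
        System.out.println(out.eval());
    }
}
```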