public class SDNN extends SDOps

SameDiff neural network operations, accessible via SameDiff.nn(). See also SDMath (accessible via SameDiff.math()) for general math ops, SDCNN (accessible via SameDiff.cnn()) for convolutional neural network ops, and SDRNN (accessible via SameDiff.rnn()) for recurrent neural network ops.

Modifier and Type | Method and Description
---|---
SDVariable | batchNorm(SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis) - Batch norm operation.
SDVariable | batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, boolean applyGamma, boolean applyBeta, double epsilon, int... axis) - Batch normalization with optional application of gamma/beta args.
SDVariable | batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis) - Neural network batch normalization operation. For details, see http://arxiv.org/abs/1502.03167
SDVariable | biasAdd(SDVariable input, SDVariable bias)
SDVariable | biasAdd(String name, SDVariable input, SDVariable bias) - Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector.
SDVariable | dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled) - This operation performs dot product attention on the given timeseries input with the given queries.
List<SDVariable> | dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled, boolean withWeights) - This operation performs dot product attention on the given timeseries input with the given queries.
SDVariable | dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled) - This operation performs dot product attention on the given timeseries input with the given queries.
List<SDVariable> | dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled, boolean withWeights) - This operation performs dot product attention on the given timeseries input with the given queries: out = sum(similarity(k_i, q) * v_i), where similarity(k, q) = softmax(k * q) and k * q is the dot product of k and q. Optionally with normalization step: similarity(k, q) = softmax(k * q / sqrt(size(q))). See also "Attention is all you need" (https://arxiv.org/abs/1706.03762).
SDVariable | dropout(SDVariable input, double inputRetainProbability)
SDVariable | dropout(String name, SDVariable input, double inputRetainProbability)
SDVariable | elu(SDVariable x) - Element-wise exponential linear unit (ELU) function: out = x if x > 0; out = a * (exp(x) - 1) if x <= 0, with constant a = 1.0
SDVariable | elu(String name, SDVariable x) - Element-wise exponential linear unit (ELU) function: out = x if x > 0; out = a * (exp(x) - 1) if x <= 0, with constant a = 1.0
SDVariable | eluDerivative(SDVariable x) - Element-wise derivative of the exponential linear unit (ELU) function, dOut/dIn given input.
SDVariable | eluDerivative(String name, SDVariable x) - Element-wise derivative of the exponential linear unit (ELU) function, dOut/dIn given input.
SDVariable | gelu(SDVariable x) - GELU activation function (Gaussian Error Linear Units). For more details, see https://arxiv.org/abs/1606.08415. This method uses the sigmoid approximation.
SDVariable | gelu(String name, SDVariable x) - GELU activation function (Gaussian Error Linear Units). For more details, see https://arxiv.org/abs/1606.08415. This method uses the sigmoid approximation.
SDVariable | hardSigmoid(SDVariable in) - Element-wise hard sigmoid function: out[i] = 0 if in[i] <= -2.5; out[i] = 0.2*in[i] + 0.5 if -2.5 < in[i] < 2.5; out[i] = 1 if in[i] >= 2.5
SDVariable | hardSigmoid(String name, SDVariable in) - Element-wise hard sigmoid function: out[i] = 0 if in[i] <= -2.5; out[i] = 0.2*in[i] + 0.5 if -2.5 < in[i] < 2.5; out[i] = 1 if in[i] >= 2.5
SDVariable | hardTanh(SDVariable in) - Element-wise hard tanh function: out[i] = -1 if in[i] <= -1; out[i] = in[i] if -1 < in[i] < 1; out[i] = 1 if in[i] >= 1
SDVariable | hardTanh(String name, SDVariable in) - Element-wise hard tanh function: out[i] = -1 if in[i] <= -1; out[i] = in[i] if -1 < in[i] < 1; out[i] = 1 if in[i] >= 1
SDVariable | hardTanhDerivative(SDVariable x) - Derivative (dOut/dIn) of the element-wise hard tanh function. See hardTanh(SDVariable)
SDVariable | hardTanhDerivative(String name, SDVariable x) - Derivative (dOut/dIn) of the element-wise hard tanh function. See hardTanh(SDVariable)
SDVariable | layerNorm(SDVariable input, SDVariable gain, int... dimensions) - Apply Layer Normalization without bias: y = gain * standardize(x)
SDVariable | layerNorm(SDVariable input, SDVariable gain, SDVariable bias, int... dimensions) - Apply Layer Normalization: y = gain * standardize(x) + bias
SDVariable | layerNorm(String name, SDVariable input, SDVariable gain, int... dimensions) - Apply Layer Normalization without bias: y = gain * standardize(x)
SDVariable | layerNorm(String name, SDVariable input, SDVariable gain, SDVariable bias, int... dimensions) - Apply Layer Normalization: y = gain * standardize(x) + bias
SDVariable | leakyRelu(SDVariable x, double alpha) - Element-wise leaky ReLU function: out = x if x >= 0.0; out = alpha * x if x < 0.0. Alpha value is most commonly set to 0.01.
SDVariable | leakyRelu(String name, SDVariable x, double alpha) - Element-wise leaky ReLU function: out = x if x >= 0.0; out = alpha * x if x < 0.0. Alpha value is most commonly set to 0.01.
SDVariable | leakyReluDerivative(String name, SDVariable x, double alpha) - Leaky ReLU derivative: dOut/dIn given input. See leakyRelu(String, SDVariable, double)
SDVariable | linear(SDVariable input, SDVariable weights, SDVariable bias)
SDVariable | linear(String name, SDVariable input, SDVariable weights, SDVariable bias) - Linear layer operation: out = mmul(in, w) + bias. Note that bias array is optional.
SDVariable | logSigmoid(SDVariable x) - Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
SDVariable | logSigmoid(String name, SDVariable x) - Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
SDVariable | logSoftmax(SDVariable x) - Log softmax activation
SDVariable | logSoftmax(String name, SDVariable x) - Log softmax activation
SDVariable | multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled) - This performs multi-headed dot product attention on the given timeseries input.
List<SDVariable> | multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled, boolean withWeights) - This performs multi-headed dot product attention on the given timeseries input.
SDVariable | multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled) - This performs multi-headed dot product attention on the given timeseries input.
List<SDVariable> | multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled, boolean withWeights) - This performs multi-headed dot product attention on the given timeseries input: out = concat(head_1, head_2, ..., head_n) * Wo, where head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v). Optionally with normalization when calculating the attention for each head.
SDVariable | pad(SDVariable input, int[][] padding, double constant)
SDVariable | pad(SDVariable input, SDVariable padding, double constant) - Perform padding on the given array, where padded values are the specified constant. Example: input [[1, 2], [3, 4]], padding [[2, 0], [1, 1]], constant = 0 gives result [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0]]
SDVariable | pad(String outputName, SDVariable input, SDVariable padding, Pad.Mode mode, double constant) - As per pad(SDVariable, SDVariable, double) but also supports multiple Pad.Mode modes. Example: input [[1, 2], [3, 4], [5, 6]], padding [[2, 0], [1, 1]], constant = 0. CONSTANT mode result: [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 5, 6, 0]]. SYMMETRIC mode result: [[3, 3, 4, 4], [1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [5, 5, 6, 6]]. REFLECT mode result: [[6, 5, 6, 5], [4, 3, 4, 3], [2, 1, 2, 1], [4, 3, 4, 3], [6, 5, 6, 5]]
SDVariable | relu(SDVariable x, double cutoff) - Element-wise rectified linear function with specified cutoff: out[i] = in[i] if in[i] >= cutoff; out[i] = 0 otherwise
SDVariable | relu(String name, SDVariable x, double cutoff) - Element-wise rectified linear function with specified cutoff: out[i] = in[i] if in[i] >= cutoff; out[i] = 0 otherwise
SDVariable | relu6(SDVariable x, double cutoff) - Element-wise "rectified linear 6" function with specified cutoff: out[i] = min(max(in, cutoff), 6)
SDVariable | relu6(String name, SDVariable x, double cutoff) - Element-wise "rectified linear 6" function with specified cutoff: out[i] = min(max(in, cutoff), 6)
SDVariable | reluLayer(SDVariable input, SDVariable weights, SDVariable bias)
SDVariable | reluLayer(String name, SDVariable input, SDVariable weights, SDVariable bias) - ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in, w) + bias). Note that bias array is optional.
SDVariable | selu(SDVariable x) - Element-wise SELU function, the scaled exponential linear unit: see Self-Normalizing Neural Networks. out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0. Uses default scale and alpha values.
SDVariable | selu(String name, SDVariable x) - Element-wise SELU function, the scaled exponential linear unit: see Self-Normalizing Neural Networks. out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0. Uses default scale and alpha values.
SDVariable | sigmoid(SDVariable x) - Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))
SDVariable | sigmoid(String name, SDVariable x) - Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))
SDVariable | sigmoidDerivative(SDVariable x, SDVariable wrt) - Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
SDVariable | sigmoidDerivative(String name, SDVariable x, SDVariable wrt) - Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
SDVariable | softmax(SDVariable x) - Softmax activation
SDVariable | softmax(String name, SDVariable x) - Softmax activation
SDVariable | softmaxDerivative(String name, SDVariable x, SDVariable wrt)
SDVariable | softmaxDerivative(String name, SDVariable x, SDVariable wrt, Integer dimension)
SDVariable | softplus(SDVariable x) - Element-wise softplus function: out = log(exp(x) + 1)
SDVariable | softplus(String name, SDVariable x) - Element-wise softplus function: out = log(exp(x) + 1)
SDVariable | softsign(SDVariable x) - Element-wise softsign function: out = x / (abs(x) + 1)
SDVariable | softsign(String name, SDVariable x) - Element-wise softsign function: out = x / (abs(x) + 1)
SDVariable | softsignDerivative(SDVariable x) - Element-wise derivative (dOut/dIn) of the softsign function. See softsign(SDVariable)
SDVariable | softsignDerivative(String name, SDVariable x) - Element-wise derivative (dOut/dIn) of the softsign function. See softsign(SDVariable)
SDVariable | swish(SDVariable x) - Element-wise "swish" function: out = x * sigmoid(b*x) with b = 1.0. See https://arxiv.org/abs/1710.05941
SDVariable | swish(String name, SDVariable x) - Element-wise "swish" function: out = x * sigmoid(b*x) with b = 1.0. See https://arxiv.org/abs/1710.05941
SDVariable | tanh(SDVariable x)
SDVariable | tanh(String name, SDVariable x)
Methods inherited from class SDOps: f, updateVariableNameAndReference
public SDNN(SameDiff sameDiff)
public SDVariable batchNorm(SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis)

public SDVariable batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, boolean applyGamma, boolean applyBeta, double epsilon, int... axis)
See batchNorm(String, SDVariable, SDVariable, SDVariable, SDVariable, SDVariable, double, int...)

public SDVariable batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis)
name - Name of the output variable
input - Input variable
mean - Mean value. For 1d axis, this should match input.size(axis)
variance - Variance value. For 1d axis, this should match input.size(axis)
gamma - Gamma value. For 1d axis, this should match input.size(axis)
beta - Beta value. For 1d axis, this should match input.size(axis)
epsilon - Epsilon constant for numerical stability (to avoid division by 0)
axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations
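For example, a minimal sketch (hypothetical names and shapes, assuming the usual SameDiff workflow of creating array variables with sd.var(...) and executing with SDVariable.eval()): batch-normalize NCHW activations along the channel axis (axis = 1), with length-3 statistics matching input.size(1).

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class BatchNormExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        // NCHW activations: [minibatch, channels, height, width]
        SDVariable input = sd.var("input", Nd4j.rand(new int[]{2, 3, 4, 4}));
        // Per-channel statistics and scale/shift; length matches input.size(1) = 3
        SDVariable mean     = sd.var("mean",     Nd4j.zeros(3));
        SDVariable variance = sd.var("variance", Nd4j.ones(3));
        SDVariable gamma    = sd.var("gamma",    Nd4j.ones(3));
        SDVariable beta     = sd.var("beta",     Nd4j.zeros(3));

        SDVariable out = sd.nn().batchNorm("bnOut", input, mean, variance, gamma, beta, 1e-5, 1);
        System.out.println(out.eval());
    }
}
```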
public SDVariable biasAdd(SDVariable input, SDVariable bias)
See biasAdd(String, SDVariable, SDVariable)

public SDVariable biasAdd(String name, SDVariable input, SDVariable bias)
name - Name of the output variable
input - 4d input variable
bias - 1d bias
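A short sketch (hypothetical shapes; which dimension the bias broadcasts over depends on the layout convention of the deployed release, so treat the channels-last layout here as an assumption to verify):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class BiasAddExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        // Hypothetical NHWC activations: [minibatch, height, width, channels]
        SDVariable act  = sd.var("act",  Nd4j.rand(new int[]{2, 4, 4, 3}));
        SDVariable bias = sd.var("bias", Nd4j.create(new double[]{0.1, 0.2, 0.3}));
        SDVariable out = sd.nn().biasAdd("withBias", act, bias);
        System.out.println(out.eval());
    }
}
```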
public SDVariable dropout(SDVariable input, double inputRetainProbability)
input - Input
inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)

public SDVariable dropout(String name, SDVariable input, double inputRetainProbability)
input - Input
inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
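For instance (a sketch, reusing the same sd.var(...)/eval() pattern): retain each value with probability 0.8, zeroing it with probability 0.2.

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class DropoutExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.var("in", Nd4j.rand(4, 10));
        // inputRetainProbability = 0.8: each value kept with p = 0.8, zeroed with p = 0.2
        SDVariable dropped = sd.nn().dropout("dropped", in, 0.8);
        System.out.println(dropped.eval());
    }
}
```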
public SDVariable elu(SDVariable x)
x - Input variable

public SDVariable elu(String name, SDVariable x)
name - Output variable name
x - Input variable

public SDVariable eluDerivative(SDVariable x)
See elu(SDVariable)
x - Input variable

public SDVariable eluDerivative(String name, SDVariable x)
See elu(SDVariable)
name - Output variable name
x - Input variable

public SDVariable gelu(SDVariable x)
x - Input

public SDVariable gelu(String name, SDVariable x)
name - Name of the output variable. May be null.
x - Input
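A small sketch comparing the two activations on the same (arbitrarily chosen) inputs:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class EluGeluExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.var("x", Nd4j.create(new double[]{-2.0, -0.5, 0.0, 0.5, 2.0}));
        SDVariable elu  = sd.nn().elu("elu", x);   // x for x > 0; exp(x) - 1 for x <= 0
        SDVariable gelu = sd.nn().gelu("gelu", x); // sigmoid-approximated GELU
        System.out.println(elu.eval());
        System.out.println(gelu.eval());
    }
}
```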
public SDVariable hardSigmoid(SDVariable in)
in - Input variable

public SDVariable hardSigmoid(String name, SDVariable in)
name - Name of the output variable
in - Input variable

public SDVariable hardTanh(SDVariable in)
in - Input variable

public SDVariable hardTanh(String name, SDVariable in)
name - Output variable name
in - Input variable

public SDVariable hardTanhDerivative(SDVariable x)
See hardTanh(SDVariable)
x - Input

public SDVariable hardTanhDerivative(String name, SDVariable x)
See hardTanh(SDVariable)
name - Output variable name
x - Input
public SDVariable leakyRelu(SDVariable x, double alpha)
x - Input variable
alpha - Scaling factor for negative inputs; most commonly 0.01 (the cutoff itself is 0.0)

public SDVariable leakyRelu(String name, SDVariable x, double alpha)
x - Input variable
alpha - Scaling factor for negative inputs; most commonly 0.01 (the cutoff itself is 0.0)

public SDVariable leakyReluDerivative(String name, SDVariable x, double alpha)
See leakyRelu(String, SDVariable, double)
x - Input variable
alpha - Alpha value
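For instance (a sketch; alpha = 0.01 as the summary above suggests):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class LeakyReluExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.var("x", Nd4j.create(new double[]{-3.0, -1.0, 0.0, 2.0}));
        // Negative inputs are scaled by alpha; non-negative inputs pass through
        SDVariable out = sd.nn().leakyRelu("lrelu", x, 0.01);
        System.out.println(out.eval()); // approx [-0.03, -0.01, 0.0, 2.0]
    }
}
```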
public SDVariable linear(SDVariable input, SDVariable weights, SDVariable bias)

public SDVariable linear(String name, SDVariable input, SDVariable weights, SDVariable bias)
name - Name of the output variable
input - Input data
weights - Weights variable
bias - Optional bias variable (may be null)
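A sketch of a fully connected layer with hypothetical sizes nIn = 4, nOut = 2:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class LinearExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.var("in", Nd4j.rand(3, 4));   // [minibatch, nIn]
        SDVariable w  = sd.var("w",  Nd4j.rand(4, 2));   // [nIn, nOut]
        SDVariable b  = sd.var("b",  Nd4j.rand(1, 2));   // bias is optional (may be null)
        SDVariable out = sd.nn().linear("fc", in, w, b); // mmul(in, w) + b -> shape [3, 2]
        System.out.println(out.eval());
    }
}
```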
public SDVariable logSigmoid(SDVariable x)
x - Input variable

public SDVariable logSigmoid(String name, SDVariable x)
name - Name of the output variable
x - Input variable

public SDVariable logSoftmax(SDVariable x)
x - Input variable

public SDVariable logSoftmax(String name, SDVariable x)
name - Variable name
x - Input variable

public SDVariable relu(SDVariable x, double cutoff)
x - Input variable
cutoff - Cutoff value. Usually 0

public SDVariable relu(String name, SDVariable x, double cutoff)
name - Output variable name
x - Input variable
cutoff - Cutoff value. Usually 0

public SDVariable relu6(SDVariable x, double cutoff)
x - Input variable
cutoff - Cutoff value. Usually 0

public SDVariable relu6(String name, SDVariable x, double cutoff)
name - Output variable name
x - Input variable
cutoff - Cutoff value. Usually 0
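For example (a sketch; cutoff = 0, as is usual):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class ReluExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.var("x", Nd4j.create(new double[]{-2.0, 3.0, 8.0}));
        SDVariable r  = sd.nn().relu("relu", x, 0.0);   // -> [0, 3, 8]
        SDVariable r6 = sd.nn().relu6("relu6", x, 0.0); // -> [0, 3, 6], capped at 6
        System.out.println(r.eval());
        System.out.println(r6.eval());
    }
}
```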
public SDVariable reluLayer(SDVariable input, SDVariable weights, SDVariable bias)

public SDVariable reluLayer(String name, SDVariable input, SDVariable weights, SDVariable bias)
name - Name of the output variable
input - Input data
weights - Weights variable
bias - Optional bias variable (may be null)
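reluLayer fuses the linear transform with the ReLU activation; a sketch with the same hypothetical sizes as the linear example above:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class ReluLayerExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.var("in", Nd4j.rand(3, 4));
        SDVariable w  = sd.var("w",  Nd4j.rand(4, 2));
        SDVariable b  = sd.var("b",  Nd4j.rand(1, 2));
        SDVariable out = sd.nn().reluLayer("layer", in, w, b); // relu(mmul(in, w) + b)
        System.out.println(out.eval());
    }
}
```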
public SDVariable selu(SDVariable x)
x - Input variable

public SDVariable selu(String name, SDVariable x)
name - Name of the output variable
x - Input variable

public SDVariable sigmoid(SDVariable x)
x - Input variable

public SDVariable sigmoid(String name, SDVariable x)
name - Output variable name
x - Input variable

public SDVariable sigmoidDerivative(SDVariable x, SDVariable wrt)
x - Input variable
wrt - Gradient at the output - dL/dOut. Must have same shape as the input

public SDVariable sigmoidDerivative(String name, SDVariable x, SDVariable wrt)
name - Output variable name
x - Input variable
wrt - Gradient at the output - dL/dOut. Must have same shape as the input
public SDVariable softmax(SDVariable x)
x - Input variable

public SDVariable softmax(String name, SDVariable x)
x - Input variable
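For example (a sketch; it assumes softmax is applied along dimension 1 for a 2d input, so each row sums to 1.0):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class SoftmaxExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable logits = sd.var("logits",
                Nd4j.create(new double[][]{{1.0, 2.0, 3.0}, {0.0, 0.0, 0.0}}));
        SDVariable probs = sd.nn().softmax("probs", logits);
        System.out.println(probs.eval()); // second row -> [1/3, 1/3, 1/3]
    }
}
```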
public SDVariable softmaxDerivative(String name, SDVariable x, SDVariable wrt)
x - Input variable

public SDVariable softmaxDerivative(String name, SDVariable x, SDVariable wrt, Integer dimension)
public SDVariable softplus(SDVariable x)
x - Input variable

public SDVariable softplus(String name, SDVariable x)
name - Output variable name
x - Input variable

public SDVariable softsign(SDVariable x)
x - Input variable

public SDVariable softsign(String name, SDVariable x)
name - Output variable name
x - Input variable

public SDVariable softsignDerivative(SDVariable x)
See softsign(SDVariable)
x - Input variable

public SDVariable softsignDerivative(String name, SDVariable x)
See softsign(SDVariable)
name - Output variable name
x - Input variable

public SDVariable swish(SDVariable x)
x - Input variable

public SDVariable swish(String name, SDVariable x)
name - Name of the output variable
x - Input variable

public SDVariable tanh(String name, SDVariable x)

public SDVariable tanh(SDVariable x)
public SDVariable layerNorm(SDVariable input, SDVariable gain, SDVariable bias, int... dimensions)

public SDVariable layerNorm(String name, SDVariable input, SDVariable gain, SDVariable bias, int... dimensions)
name - Name of the output variable
input - Input variable
gain - Gain
bias - Bias

public SDVariable layerNorm(SDVariable input, SDVariable gain, int... dimensions)

public SDVariable layerNorm(String name, SDVariable input, SDVariable gain, int... dimensions)
name - Name of the output variable
input - Input variable
gain - Gain
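A sketch normalizing each example over its feature dimension (using dimension 1 is an assumption appropriate for [minibatch, features] input):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class LayerNormExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in   = sd.var("in",   Nd4j.rand(3, 8)); // [minibatch, features]
        SDVariable gain = sd.var("gain", Nd4j.ones(8));
        SDVariable bias = sd.var("bias", Nd4j.zeros(8));
        // y = gain * standardize(x) + bias, standardized over the feature dimension
        SDVariable out = sd.nn().layerNorm("ln", in, gain, bias, 1);
        System.out.println(out.eval());
    }
}
```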
public SDVariable pad(SDVariable input, int[][] padding, double constant)

public SDVariable pad(SDVariable input, SDVariable padding, double constant)
input - Input array to pad
padding - Padding array
constant - Constant to use for padded values

public SDVariable pad(String outputName, SDVariable input, SDVariable padding, Pad.Mode mode, double constant)
As per pad(SDVariable, SDVariable, double) but also supports multiple Pad.Mode modes.
outputName - Name of the output variable
input - Input array to pad
padding - Padding array
mode - Padding mode
constant - Constant to use for padded values
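A sketch reproducing the constant-pad example from the summary above (note: depending on the version, the padding array may need an integer data type, hence the cast):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class PadExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.var("in", Nd4j.create(new double[][]{{1, 2}, {3, 4}}));
        // [2, 0]: pad 2 rows before / 0 after; [1, 1]: pad 1 column on each side
        INDArray padArr = Nd4j.create(new double[][]{{2, 0}, {1, 1}}).castTo(DataType.INT);
        SDVariable padding = sd.var("padding", padArr);
        SDVariable out = sd.nn().pad(in, padding, 0.0);
        System.out.println(out.eval());
        // Expected (per the docs above): [[0,0,0,0],[0,0,0,0],[0,1,2,0],[0,3,4,0]]
    }
}
```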
public SDVariable dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)

public SDVariable dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)

public List<SDVariable> dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled, boolean withWeights)

public List<SDVariable> dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled, boolean withWeights)
queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] or 4D array of shape [batchSize, numHeads, featureKeys, queryCount]
keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] or 4D array of shape [batchSize, numHeads, featureKeys, timesteps]
values - input 3D array "values" of shape [batchSize, featureValues, timesteps] or 4D array of shape [batchSize, numHeads, featureValues, timesteps]
mask - OPTIONAL; array that defines which values should be skipped, of shape [batchSize, timesteps]
scaled - normalization, false -> do not apply normalization, true -> apply normalization
withWeights - return attention weights as well, false -> only one output, true -> two outputs
Output Arrays:
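A sketch wiring up the 3D single-head case with the shapes documented above (sizes are arbitrary; the all-ones mask skips nothing):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class DotProductAttentionExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        int batchSize = 2, featureKeys = 4, featureValues = 3, timesteps = 5, queryCount = 1;
        SDVariable q = sd.var("q", Nd4j.rand(new int[]{batchSize, featureKeys, queryCount}));
        SDVariable k = sd.var("k", Nd4j.rand(new int[]{batchSize, featureKeys, timesteps}));
        SDVariable v = sd.var("v", Nd4j.rand(new int[]{batchSize, featureValues, timesteps}));
        SDVariable mask = sd.var("mask", Nd4j.ones(batchSize, timesteps)); // attend to all steps
        // scaled = true: similarity(k, q) = softmax(k * q / sqrt(size(q)))
        SDVariable att = sd.nn().dotProductAttention(q, k, v, mask, true);
        System.out.println(att.eval());
    }
}
```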
public SDVariable multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)

public SDVariable multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)

public List<SDVariable> multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled, boolean withWeights)

public List<SDVariable> multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled, boolean withWeights)
queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount]
keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps]
values - input 3D array "values" of shape [batchSize, featureValues, timesteps]
Wq - input query projection weights of shape [numHeads, projectedKeys, featureKeys]
Wk - input key projection weights of shape [numHeads, projectedKeys, featureKeys]
Wv - input value projection weights of shape [numHeads, projectedValues, featureValues]
Wo - output projection weights of shape [numHeads * projectedValues, outSize]
mask - OPTIONAL; array that defines which values should be skipped, of shape [batchSize, timesteps]
scaled - normalization, false -> do not apply normalization, true -> apply normalization
withWeights - return attention weights as well, false -> only one output, true -> two outputs
Output Arrays:
See also dotProductAttention(String, SDVariable, SDVariable, SDVariable, SDVariable, boolean, boolean)
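A sketch for the multi-headed case, using the weight shapes documented above (all sizes hypothetical):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class MultiHeadAttentionExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        int batchSize = 2, featureKeys = 4, featureValues = 4, timesteps = 5, queryCount = 3;
        int numHeads = 2, projectedKeys = 4, projectedValues = 4, outSize = 8;
        SDVariable q = sd.var("q", Nd4j.rand(new int[]{batchSize, featureKeys, queryCount}));
        SDVariable k = sd.var("k", Nd4j.rand(new int[]{batchSize, featureKeys, timesteps}));
        SDVariable v = sd.var("v", Nd4j.rand(new int[]{batchSize, featureValues, timesteps}));
        // Per-head projections, then a shared output projection Wo
        SDVariable Wq = sd.var("Wq", Nd4j.rand(new int[]{numHeads, projectedKeys, featureKeys}));
        SDVariable Wk = sd.var("Wk", Nd4j.rand(new int[]{numHeads, projectedKeys, featureKeys}));
        SDVariable Wv = sd.var("Wv", Nd4j.rand(new int[]{numHeads, projectedValues, featureValues}));
        SDVariable Wo = sd.var("Wo", Nd4j.rand(numHeads * projectedValues, outSize));
        SDVariable mask = sd.var("mask", Nd4j.ones(batchSize, timesteps));
        SDVariable out = sd.nn().multiHeadDotProductAttention("mha", q, k, v, Wq, Wk, Wv, Wo, mask, true);
        System.out.println(out.eval());
    }
}
```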