public class SDNN extends SDOps
Modifier and Type | Method and Description
---|---
SDVariable | batchNorm(SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis)
SDVariable | batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis)
SDVariable | biasAdd(SDVariable input, SDVariable bias, boolean nchw) - Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector
SDVariable | biasAdd(String name, SDVariable input, SDVariable bias, boolean nchw) - Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector
SDVariable | cReLU(SDVariable x) - Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation
SDVariable | cReLU(String name, SDVariable x) - Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation
SDVariable | dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled) - Performs dot product attention on the given timeseries input with the given queries: out = sum(similarity(k_i, q) * v_i), where similarity(k, q) = softmax(k * q) and k * q is the dot product of k and q; optionally with a normalization step, similarity(k, q) = softmax(k * q / sqrt(size(q))). See also "Attention Is All You Need" (https://arxiv.org/abs/1706.03762)
SDVariable | dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled) - Performs dot product attention on the given timeseries input with the given queries: out = sum(similarity(k_i, q) * v_i), where similarity(k, q) = softmax(k * q) and k * q is the dot product of k and q; optionally with a normalization step, similarity(k, q) = softmax(k * q / sqrt(size(q))). See also "Attention Is All You Need" (https://arxiv.org/abs/1706.03762)
SDVariable | dropout(SDVariable input, double inputRetainProbability) - Dropout operation
SDVariable | dropout(String name, SDVariable input, double inputRetainProbability) - Dropout operation
SDVariable | elu(SDVariable x) - Element-wise exponential linear unit (ELU) function: out = x if x > 0; out = a * (exp(x) - 1) if x <= 0, with constant a = 1.0
SDVariable | elu(String name, SDVariable x) - Element-wise exponential linear unit (ELU) function: out = x if x > 0; out = a * (exp(x) - 1) if x <= 0, with constant a = 1.0
SDVariable | gelu(SDVariable x) - GELU activation function (Gaussian Error Linear Units); this method uses the sigmoid approximation. For more details, see "Gaussian Error Linear Units (GELUs)": https://arxiv.org/abs/1606.08415
SDVariable | gelu(String name, SDVariable x) - GELU activation function (Gaussian Error Linear Units); this method uses the sigmoid approximation. For more details, see "Gaussian Error Linear Units (GELUs)": https://arxiv.org/abs/1606.08415
SDVariable | hardSigmoid(SDVariable x) - Element-wise hard sigmoid function: out[i] = 0 if in[i] <= -2.5; out[i] = 0.2*in[i] + 0.5 if -2.5 < in[i] < 2.5; out[i] = 1 if in[i] >= 2.5
SDVariable | hardSigmoid(String name, SDVariable x) - Element-wise hard sigmoid function: out[i] = 0 if in[i] <= -2.5; out[i] = 0.2*in[i] + 0.5 if -2.5 < in[i] < 2.5; out[i] = 1 if in[i] >= 2.5
SDVariable | hardTanh(SDVariable x) - Element-wise hard tanh function: out[i] = -1 if in[i] <= -1; out[i] = in[i] if -1 < in[i] < 1; out[i] = 1 if in[i] >= 1
SDVariable | hardTanh(String name, SDVariable x) - Element-wise hard tanh function: out[i] = -1 if in[i] <= -1; out[i] = in[i] if -1 < in[i] < 1; out[i] = 1 if in[i] >= 1
SDVariable | hardTanhDerivative(SDVariable x) - Derivative (dOut/dIn) of the element-wise hard tanh function
SDVariable | hardTanhDerivative(String name, SDVariable x) - Derivative (dOut/dIn) of the element-wise hard tanh function
SDVariable | layerNorm(SDVariable input, SDVariable gain, boolean channelsFirst, int... dimensions) - Apply layer normalization: y = gain * standardize(x) + bias
SDVariable | layerNorm(SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int... dimensions) - Apply layer normalization: y = gain * standardize(x) + bias
SDVariable | layerNorm(String name, SDVariable input, SDVariable gain, boolean channelsFirst, int... dimensions) - Apply layer normalization: y = gain * standardize(x) + bias
SDVariable | layerNorm(String name, SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int... dimensions) - Apply layer normalization: y = gain * standardize(x) + bias
SDVariable | leakyRelu(SDVariable x, double alpha) - Element-wise leaky ReLU function: out = x if x >= 0.0; out = alpha * x if x < 0.0. Alpha is most commonly set to 0.01
SDVariable | leakyRelu(String name, SDVariable x, double alpha) - Element-wise leaky ReLU function: out = x if x >= 0.0; out = alpha * x if x < 0.0. Alpha is most commonly set to 0.01
SDVariable | leakyReluDerivative(SDVariable x, double alpha) - Leaky ReLU derivative: dOut/dIn given input
SDVariable | leakyReluDerivative(String name, SDVariable x, double alpha) - Leaky ReLU derivative: dOut/dIn given input
SDVariable | linear(SDVariable input, SDVariable weights, SDVariable bias) - Linear layer operation: out = mmul(in, w) + bias. Note that the bias array is optional
SDVariable | linear(String name, SDVariable input, SDVariable weights, SDVariable bias) - Linear layer operation: out = mmul(in, w) + bias. Note that the bias array is optional
SDVariable | logSigmoid(SDVariable x) - Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
SDVariable | logSigmoid(String name, SDVariable x) - Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
SDVariable | logSoftmax(SDVariable x) - Log softmax activation
SDVariable | logSoftmax(SDVariable x, int dimension) - Log softmax activation
SDVariable | logSoftmax(String name, SDVariable x) - Log softmax activation
SDVariable | logSoftmax(String name, SDVariable x, int dimension) - Log softmax activation
SDVariable | multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled) - Performs multi-headed dot product attention on the given timeseries input: out = concat(head_1, head_2, ..., head_n) * Wo, where head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v); optionally with normalization when calculating the attention for each head. See also "Attention Is All You Need" (https://arxiv.org/abs/1706.03762)
SDVariable | multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled) - Performs multi-headed dot product attention on the given timeseries input: out = concat(head_1, head_2, ..., head_n) * Wo, where head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v); optionally with normalization when calculating the attention for each head. See also "Attention Is All You Need" (https://arxiv.org/abs/1706.03762)
SDVariable | pad(SDVariable input, SDVariable padding, double constant) - Padding operation
SDVariable | pad(SDVariable input, SDVariable padding, PadMode PadMode, double constant) - Padding operation
SDVariable | pad(String name, SDVariable input, SDVariable padding, double constant) - Padding operation
SDVariable | pad(String name, SDVariable input, SDVariable padding, PadMode PadMode, double constant) - Padding operation
SDVariable | preciseGelu(SDVariable x) - GELU activation function (Gaussian Error Linear Units); this method uses the precise method. For more details, see "Gaussian Error Linear Units (GELUs)": https://arxiv.org/abs/1606.08415
SDVariable | preciseGelu(String name, SDVariable x) - GELU activation function (Gaussian Error Linear Units); this method uses the precise method. For more details, see "Gaussian Error Linear Units (GELUs)": https://arxiv.org/abs/1606.08415
SDVariable | prelu(SDVariable input, SDVariable alpha, int... sharedAxes) - PReLU (Parameterized Rectified Linear Unit) operation
SDVariable | prelu(String name, SDVariable input, SDVariable alpha, int... sharedAxes) - PReLU (Parameterized Rectified Linear Unit) operation
SDVariable | relu(SDVariable x, double cutoff) - Element-wise rectified linear function with specified cutoff: out[i] = in[i] if in[i] >= cutoff; out[i] = 0 otherwise
SDVariable | relu(String name, SDVariable x, double cutoff) - Element-wise rectified linear function with specified cutoff: out[i] = in[i] if in[i] >= cutoff; out[i] = 0 otherwise
SDVariable | relu6(SDVariable x, double cutoff) - Element-wise "rectified linear 6" function with specified cutoff: out[i] = min(max(in, cutoff), 6)
SDVariable | relu6(String name, SDVariable x, double cutoff) - Element-wise "rectified linear 6" function with specified cutoff: out[i] = min(max(in, cutoff), 6)
SDVariable | reluLayer(SDVariable input, SDVariable weights, SDVariable bias) - ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in, w) + bias). Note that the bias array is optional
SDVariable | reluLayer(String name, SDVariable input, SDVariable weights, SDVariable bias) - ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in, w) + bias). Note that the bias array is optional
SDVariable | selu(SDVariable x) - Element-wise SELU function (Scaled Exponential Linear Unit), see "Self-Normalizing Neural Networks": out[i] = scale * in[i] if in[i] > 0; out[i] = scale * alpha * (exp(in[i]) - 1) if in[i] <= 0. Uses default scale and alpha values
SDVariable | selu(String name, SDVariable x) - Element-wise SELU function (Scaled Exponential Linear Unit), see "Self-Normalizing Neural Networks": out[i] = scale * in[i] if in[i] > 0; out[i] = scale * alpha * (exp(in[i]) - 1) if in[i] <= 0. Uses default scale and alpha values
SDVariable | sigmoid(SDVariable x) - Element-wise sigmoid function: out[i] = 1.0 / (1 + exp(-in[i]))
SDVariable | sigmoid(String name, SDVariable x) - Element-wise sigmoid function: out[i] = 1.0 / (1 + exp(-in[i]))
SDVariable | sigmoidDerivative(SDVariable x, SDVariable wrt) - Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
SDVariable | sigmoidDerivative(String name, SDVariable x, SDVariable wrt) - Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
SDVariable | softmax(SDVariable x) - Softmax activation, along the specified dimension
SDVariable | softmax(SDVariable x, int dimension) - Softmax activation, along the specified dimension
SDVariable | softmax(String name, SDVariable x) - Softmax activation, along the specified dimension
SDVariable | softmax(String name, SDVariable x, int dimension) - Softmax activation, along the specified dimension
SDVariable | softmaxDerivative(SDVariable x, SDVariable wrt, int dimension) - Softmax derivative function
SDVariable | softmaxDerivative(String name, SDVariable x, SDVariable wrt, int dimension) - Softmax derivative function
SDVariable | softplus(SDVariable x) - Element-wise softplus function: out = log(exp(x) + 1)
SDVariable | softplus(String name, SDVariable x) - Element-wise softplus function: out = log(exp(x) + 1)
SDVariable | softsign(SDVariable x) - Element-wise softsign function: out = x / (abs(x) + 1)
SDVariable | softsign(String name, SDVariable x) - Element-wise softsign function: out = x / (abs(x) + 1)
SDVariable | softsignDerivative(SDVariable x) - Element-wise derivative (dOut/dIn) of the softsign function
SDVariable | softsignDerivative(String name, SDVariable x) - Element-wise derivative (dOut/dIn) of the softsign function
SDVariable | swish(SDVariable x) - Element-wise "swish" function: out = x * sigmoid(b*x), with b = 1.0. See https://arxiv.org/abs/1710.05941
SDVariable | swish(String name, SDVariable x) - Element-wise "swish" function: out = x * sigmoid(b*x), with b = 1.0. See https://arxiv.org/abs/1710.05941
SDVariable | tanh(SDVariable x) - Element-wise tanh (hyperbolic tangent) operation: out = tanh(x)
SDVariable | tanh(String name, SDVariable x) - Element-wise tanh (hyperbolic tangent) operation: out = tanh(x)
public SDNN(SameDiff sameDiff)
public SDVariable cReLU(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable cReLU(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)
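All of these ops follow the same calling convention: each method has an unnamed overload and an overload whose leading String names the output variable. A minimal, self-contained sketch of that convention using cReLU (the class name, variable names, and shapes are illustrative, not part of the API; the later sketches in this section assume the same imports and main-method scaffolding):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class CReluExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.var("x", Nd4j.rand(2, 4));
        // Named overload: "crelu" becomes the name of the output variable
        SDVariable out = sd.nn().cReLU("crelu", x);
        // cReLU concatenates the positive-part ReLU with the negative-part ReLU,
        // so the activation size doubles along the concatenation dimension
        System.out.println(out.eval());
    }
}
```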
public SDVariable batchNorm(SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis)
Parameters:
input - Input variable. (NUMERIC type)
mean - Mean value. For 1d axis, this should match input.size(axis) (NUMERIC type)
variance - Variance value. For 1d axis, this should match input.size(axis) (NUMERIC type)
gamma - Gamma value. For 1d axis, this should match input.size(axis) (NUMERIC type)
beta - Beta value. For 1d axis, this should match input.size(axis) (NUMERIC type)
epsilon - Epsilon constant for numerical stability (to avoid division by 0)
axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations. For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC. For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))

public SDVariable batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis)
Parameters:
name - Name for the output variable. May be null.
input - Input variable. (NUMERIC type)
mean - Mean value. For 1d axis, this should match input.size(axis) (NUMERIC type)
variance - Variance value. For 1d axis, this should match input.size(axis) (NUMERIC type)
gamma - Gamma value. For 1d axis, this should match input.size(axis) (NUMERIC type)
beta - Beta value. For 1d axis, this should match input.size(axis) (NUMERIC type)
epsilon - Epsilon constant for numerical stability (to avoid division by 0)
axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations. For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC. For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))
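A sketch of batch-normalizing NCHW activations, assuming the same scaffolding as the cReLU sketch above plus an import of org.nd4j.linalg.api.buffer.DataType; the shapes and the 1e-5 epsilon are illustrative choices:

```java
SameDiff sd = SameDiff.create();
int channels = 3;
// NCHW activations: [minibatch, channels, height, width]
SDVariable in = sd.var("in", Nd4j.rand(new int[]{2, channels, 4, 4}));
// Per-channel statistics and learned scale/shift, each of length input.size(1)
SDVariable mean     = sd.var("mean",     Nd4j.zeros(DataType.FLOAT, channels));
SDVariable variance = sd.var("variance", Nd4j.ones(DataType.FLOAT, channels));
SDVariable gamma    = sd.var("gamma",    Nd4j.ones(DataType.FLOAT, channels));
SDVariable beta     = sd.var("beta",     Nd4j.zeros(DataType.FLOAT, channels));
// axis = 1 selects the channel dimension for NCHW format
SDVariable out = sd.nn().batchNorm("bn", in, mean, variance, gamma, beta, 1e-5, 1);
```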
public SDVariable biasAdd(SDVariable input, SDVariable bias, boolean nchw)
Parameters:
input - 4d input variable (NUMERIC type)
bias - 1d bias (NUMERIC type)
nchw - The format: nchw=true means [minibatch, channels, height, width]; nchw=false means [minibatch, height, width, channels]. Unused for 2d inputs

public SDVariable biasAdd(String name, SDVariable input, SDVariable bias, boolean nchw)
Parameters:
name - Name for the output variable. May be null.
input - 4d input variable (NUMERIC type)
bias - 1d bias (NUMERIC type)
nchw - The format: nchw=true means [minibatch, channels, height, width]; nchw=false means [minibatch, height, width, channels]. Unused for 2d inputs
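A sketch of adding a per-channel bias to NCHW activations (same scaffolding as the earlier sketches; shapes are illustrative):

```java
SameDiff sd = SameDiff.create();
SDVariable act  = sd.var("act",  Nd4j.rand(new int[]{2, 3, 4, 4})); // NCHW activations
SDVariable bias = sd.var("bias", Nd4j.rand(new int[]{3}));          // one bias value per channel
SDVariable out = sd.nn().biasAdd("biased", act, bias, true);        // nchw=true: channel axis is 1
```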
public SDVariable dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)
Parameters:
queries - Input 3D array "queries" of shape [batchSize, featureKeys, queryCount], or 4D array of shape [batchSize, numHeads, featureKeys, queryCount] (NUMERIC type)
keys - Input 3D array "keys" of shape [batchSize, featureKeys, timesteps], or 4D array of shape [batchSize, numHeads, featureKeys, timesteps] (NUMERIC type)
values - Input 3D array "values" of shape [batchSize, featureValues, timesteps], or 4D array of shape [batchSize, numHeads, featureValues, timesteps] (NUMERIC type)
mask - OPTIONAL: array of shape [batchSize, timesteps] that defines which values should be skipped (NUMERIC type)
scaled - Normalization: false to not apply normalization, true to apply normalization

public SDVariable dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)
Parameters:
name - Name for the output variable. May be null.
queries - Input 3D array "queries" of shape [batchSize, featureKeys, queryCount], or 4D array of shape [batchSize, numHeads, featureKeys, queryCount] (NUMERIC type)
keys - Input 3D array "keys" of shape [batchSize, featureKeys, timesteps], or 4D array of shape [batchSize, numHeads, featureKeys, timesteps] (NUMERIC type)
values - Input 3D array "values" of shape [batchSize, featureValues, timesteps], or 4D array of shape [batchSize, numHeads, featureValues, timesteps] (NUMERIC type)
mask - OPTIONAL: array of shape [batchSize, timesteps] that defines which values should be skipped (NUMERIC type)
scaled - Normalization: false to not apply normalization, true to apply normalization
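A sketch of single-head attention over a small batch (same scaffolding as the earlier sketches); the shapes are illustrative, and passing null for the optional mask is this sketch's reading of "OPTIONAL" in the parameter docs above:

```java
SameDiff sd = SameDiff.create();
int batch = 2, featureKeys = 4, featureValues = 5, timesteps = 3, queryCount = 1;
SDVariable q = sd.var("q", Nd4j.rand(new int[]{batch, featureKeys, queryCount}));
SDVariable k = sd.var("k", Nd4j.rand(new int[]{batch, featureKeys, timesteps}));
SDVariable v = sd.var("v", Nd4j.rand(new int[]{batch, featureValues, timesteps}));
// scaled=true applies the softmax(k * q / sqrt(size(q))) normalization
SDVariable out = sd.nn().dotProductAttention("attn", q, k, v, null, true);
```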
public SDVariable dropout(SDVariable input, double inputRetainProbability)
Parameters:
input - Input array (NUMERIC type)
inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)

public SDVariable dropout(String name, SDVariable input, double inputRetainProbability)
Parameters:
name - Name for the output variable. May be null.
input - Input array (NUMERIC type)
inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
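A sketch of dropout with a retain probability of 0.8 (same scaffolding as the earlier sketches):

```java
SameDiff sd = SameDiff.create();
SDVariable in = sd.var("in", Nd4j.ones(2, 4));
// Retain each activation with probability 0.8, i.e. zero it with probability 0.2
SDVariable dropped = sd.nn().dropout("drop", in, 0.8);
```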
public SDVariable elu(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable elu(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable gelu(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable gelu(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable hardSigmoid(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable hardSigmoid(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable hardTanh(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable hardTanh(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable hardTanhDerivative(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable hardTanhDerivative(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)
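These element-wise activations all share the x / (name, x) signature, so one sketch covers the pattern (same scaffolding as the earlier sketches; the sample values are illustrative):

```java
SameDiff sd = SameDiff.create();
SDVariable x = sd.var("x", Nd4j.createFromArray(-2.0f, -0.5f, 0.0f, 0.5f, 2.0f));
System.out.println(sd.nn().elu(x).eval());      // x > 0 passes through; x <= 0 becomes exp(x) - 1
System.out.println(sd.nn().hardTanh(x).eval()); // values clipped to [-1, 1]
System.out.println(sd.nn().gelu(x).eval());     // sigmoid-approximated GELU
```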
public SDVariable layerNorm(SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int... dimensions)
Parameters:
input - Input variable (NUMERIC type)
gain - Gain (NUMERIC type)
bias - Bias (NUMERIC type)
channelsFirst - Unused for 2D input. True for NCHW (minibatch, channels, height, width) data, false for NHWC data
dimensions - Dimensions to perform layer norm over: dimension=1 for 2d/MLP data, dimensions=1,2,3 for CNNs (Size: AtLeast(min=1))

public SDVariable layerNorm(String name, SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int... dimensions)
Parameters:
name - Name for the output variable. May be null.
input - Input variable (NUMERIC type)
gain - Gain (NUMERIC type)
bias - Bias (NUMERIC type)
channelsFirst - Unused for 2D input. True for NCHW (minibatch, channels, height, width) data, false for NHWC data
dimensions - Dimensions to perform layer norm over: dimension=1 for 2d/MLP data, dimensions=1,2,3 for CNNs (Size: AtLeast(min=1))

public SDVariable layerNorm(SDVariable input, SDVariable gain, boolean channelsFirst, int... dimensions)
Parameters:
input - Input variable (NUMERIC type)
gain - Gain (NUMERIC type)
channelsFirst - Unused for 2D input. True for NCHW (minibatch, channels, height, width) data, false for NHWC data
dimensions - Dimensions to perform layer norm over: dimension=1 for 2d/MLP data, dimensions=1,2,3 for CNNs (Size: AtLeast(min=1))

public SDVariable layerNorm(String name, SDVariable input, SDVariable gain, boolean channelsFirst, int... dimensions)
Parameters:
name - Name for the output variable. May be null.
input - Input variable (NUMERIC type)
gain - Gain (NUMERIC type)
channelsFirst - Unused for 2D input. True for NCHW (minibatch, channels, height, width) data, false for NHWC data
dimensions - Dimensions to perform layer norm over: dimension=1 for 2d/MLP data, dimensions=1,2,3 for CNNs (Size: AtLeast(min=1))
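A sketch of layer norm over 2d/MLP activations (same scaffolding as the earlier sketches; the shape [2, 6] is illustrative):

```java
SameDiff sd = SameDiff.create();
SDVariable in   = sd.var("in",   Nd4j.rand(2, 6));               // [minibatch, size]
SDVariable gain = sd.var("gain", Nd4j.ones(DataType.FLOAT, 6));
SDVariable bias = sd.var("bias", Nd4j.zeros(DataType.FLOAT, 6));
// dimensions=1 for 2d data; channelsFirst is unused for 2d input
SDVariable out = sd.nn().layerNorm("ln", in, gain, bias, false, 1);
```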
public SDVariable leakyRelu(SDVariable x, double alpha)
Parameters:
x - Input variable (NUMERIC type)
alpha - Slope for negative inputs; commonly 0.01

public SDVariable leakyRelu(String name, SDVariable x, double alpha)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)
alpha - Slope for negative inputs; commonly 0.01

public SDVariable leakyReluDerivative(SDVariable x, double alpha)
Parameters:
x - Input variable (NUMERIC type)
alpha - Slope for negative inputs; commonly 0.01

public SDVariable leakyReluDerivative(String name, SDVariable x, double alpha)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)
alpha - Slope for negative inputs; commonly 0.01
public SDVariable linear(SDVariable input, SDVariable weights, SDVariable bias)
Parameters:
input - Input data (NUMERIC type)
weights - Weights variable, shape [nIn, nOut] (NUMERIC type)
bias - Optional bias variable (may be null) (NUMERIC type)

public SDVariable linear(String name, SDVariable input, SDVariable weights, SDVariable bias)
Parameters:
name - Name for the output variable. May be null.
input - Input data (NUMERIC type)
weights - Weights variable, shape [nIn, nOut] (NUMERIC type)
bias - Optional bias variable (may be null) (NUMERIC type)
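A sketch of a fully-connected layer via linear (same scaffolding as the earlier sketches; shapes are illustrative):

```java
SameDiff sd = SameDiff.create();
SDVariable in = sd.var("in", Nd4j.rand(3, 4));               // [minibatch, nIn]
SDVariable w  = sd.var("w",  Nd4j.rand(4, 2));               // [nIn, nOut]
SDVariable b  = sd.var("b",  Nd4j.zeros(DataType.FLOAT, 2)); // optional; may be null
SDVariable out = sd.nn().linear("fc", in, w, b);             // out = mmul(in, w) + b
```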
public SDVariable logSigmoid(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable logSigmoid(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable logSoftmax(SDVariable x)
Parameters:
x - Input (NUMERIC type)

public SDVariable logSoftmax(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input (NUMERIC type)

public SDVariable logSoftmax(SDVariable x, int dimension)
Parameters:
x - Input (NUMERIC type)
dimension - Dimension along which to apply log softmax

public SDVariable logSoftmax(String name, SDVariable x, int dimension)
Parameters:
name - Name for the output variable. May be null.
x - Input (NUMERIC type)
dimension - Dimension along which to apply log softmax
public SDVariable multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)
Parameters:
queries - Input 3D array "queries" of shape [batchSize, featureKeys, queryCount] (NUMERIC type)
keys - Input 3D array "keys" of shape [batchSize, featureKeys, timesteps] (NUMERIC type)
values - Input 3D array "values" of shape [batchSize, featureValues, timesteps] (NUMERIC type)
Wq - Input query projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
Wk - Input key projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
Wv - Input value projection weights of shape [numHeads, projectedValues, featureValues] (NUMERIC type)
Wo - Output projection weights of shape [numHeads * projectedValues, outSize] (NUMERIC type)
mask - OPTIONAL: array of shape [batchSize, timesteps] that defines which values should be skipped (NUMERIC type)
scaled - Normalization: false to not apply normalization, true to apply normalization

public SDVariable multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)
Parameters:
name - Name for the output variable. May be null.
queries - Input 3D array "queries" of shape [batchSize, featureKeys, queryCount] (NUMERIC type)
keys - Input 3D array "keys" of shape [batchSize, featureKeys, timesteps] (NUMERIC type)
values - Input 3D array "values" of shape [batchSize, featureValues, timesteps] (NUMERIC type)
Wq - Input query projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
Wk - Input key projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
Wv - Input value projection weights of shape [numHeads, projectedValues, featureValues] (NUMERIC type)
Wo - Output projection weights of shape [numHeads * projectedValues, outSize] (NUMERIC type)
mask - OPTIONAL: array of shape [batchSize, timesteps] that defines which values should be skipped (NUMERIC type)
scaled - Normalization: false to not apply normalization, true to apply normalization
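A sketch of multi-head self-attention with two heads (same scaffolding as the earlier sketches); all sizes are illustrative, chosen only to satisfy the shape requirements listed above, and the null mask again relies on the parameter being optional:

```java
SameDiff sd = SameDiff.create();
int batch = 2, features = 4, timesteps = 5;
int numHeads = 2, projected = 3, outSize = 4;
// Self-attention: queries, keys, and values share the same shape
SDVariable q = sd.var("q", Nd4j.rand(new int[]{batch, features, timesteps}));
SDVariable k = sd.var("k", Nd4j.rand(new int[]{batch, features, timesteps}));
SDVariable v = sd.var("v", Nd4j.rand(new int[]{batch, features, timesteps}));
SDVariable Wq = sd.var("Wq", Nd4j.rand(new int[]{numHeads, projected, features}));
SDVariable Wk = sd.var("Wk", Nd4j.rand(new int[]{numHeads, projected, features}));
SDVariable Wv = sd.var("Wv", Nd4j.rand(new int[]{numHeads, projected, features}));
SDVariable Wo = sd.var("Wo", Nd4j.rand(numHeads * projected, outSize));
SDVariable out = sd.nn().multiHeadDotProductAttention("mha", q, k, v, Wq, Wk, Wv, Wo, null, true);
```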
public SDVariable pad(SDVariable input, SDVariable padding, PadMode PadMode, double constant)
Parameters:
input - Input tensor (NUMERIC type)
padding - Padding value (NUMERIC type)
PadMode - Padding format
constant - Padding constant

public SDVariable pad(String name, SDVariable input, SDVariable padding, PadMode PadMode, double constant)
Parameters:
name - Name for the output variable. May be null.
input - Input tensor (NUMERIC type)
padding - Padding value (NUMERIC type)
PadMode - Padding format
constant - Padding constant

public SDVariable pad(SDVariable input, SDVariable padding, double constant)
Parameters:
input - Input tensor (NUMERIC type)
padding - Padding value (NUMERIC type)
constant - Padding constant

public SDVariable pad(String name, SDVariable input, SDVariable padding, double constant)
Parameters:
name - Name for the output variable. May be null.
input - Input tensor (NUMERIC type)
padding - Padding value (NUMERIC type)
constant - Padding constant
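A sketch of constant padding using the overload without a PadMode (same scaffolding as the earlier sketches); the padding amounts, and the assumption that padding is one [before, after] pair per input dimension, are illustrative:

```java
SameDiff sd = SameDiff.create();
SDVariable in = sd.var("in", Nd4j.rand(2, 2));
// One [padBefore, padAfter] pair per input dimension
SDVariable padding = sd.constant(Nd4j.createFromArray(new int[][]{{1, 1}, {2, 2}}));
SDVariable out = sd.nn().pad(in, padding, 0.0); // constant padding with value 0.0
```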
public SDVariable preciseGelu(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable preciseGelu(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable prelu(SDVariable input, SDVariable alpha, int... sharedAxes)
Parameters:
input - Input data (NUMERIC type)
alpha - The slope variable. Note that the batch dimension (the 0th, whether it is batch or not) should not be part of alpha. (NUMERIC type)
sharedAxes - Which axes to share the slope parameters along (Size: AtLeast(min=1))

public SDVariable prelu(String name, SDVariable input, SDVariable alpha, int... sharedAxes)
Parameters:
name - Name for the output variable. May be null.
input - Input data (NUMERIC type)
alpha - The slope variable. Note that the batch dimension (the 0th, whether it is batch or not) should not be part of alpha. (NUMERIC type)
sharedAxes - Which axes to share the slope parameters along (Size: AtLeast(min=1))
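A sketch of PReLU on NCHW activations, sharing the learned slope across the spatial axes (same scaffolding as the earlier sketches); the alpha shape [3, 1, 1] is this sketch's assumption about how shared axes collapse, not a documented guarantee:

```java
SameDiff sd = SameDiff.create();
SDVariable in = sd.var("in", Nd4j.rand(new int[]{2, 3, 4, 4})); // NCHW activations
// One learnable slope per channel; sharing across spatial axes 2 and 3 leaves
// alpha with shape [3, 1, 1]. Note it excludes the batch (0th) dimension.
SDVariable alpha = sd.var("alpha", Nd4j.valueArrayOf(new long[]{3, 1, 1}, 0.25));
SDVariable out = sd.nn().prelu("prelu", in, alpha, 2, 3);
```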
public SDVariable relu(SDVariable x, double cutoff)
Parameters:
x - Input (NUMERIC type)
cutoff - Cutoff value for ReLU operation: x > cutoff ? x : 0. Usually 0

public SDVariable relu(String name, SDVariable x, double cutoff)
Parameters:
name - Name for the output variable. May be null.
x - Input (NUMERIC type)
cutoff - Cutoff value for ReLU operation: x > cutoff ? x : 0. Usually 0

public SDVariable relu6(SDVariable x, double cutoff)
Parameters:
x - Input (NUMERIC type)
cutoff - Cutoff value for ReLU operation. Usually 0

public SDVariable relu6(String name, SDVariable x, double cutoff)
Parameters:
name - Name for the output variable. May be null.
x - Input (NUMERIC type)
cutoff - Cutoff value for ReLU operation. Usually 0

public SDVariable reluLayer(SDVariable input, SDVariable weights, SDVariable bias)
Parameters:
input - Input data (NUMERIC type)
weights - Weights variable (NUMERIC type)
bias - Optional bias variable (may be null) (NUMERIC type)

public SDVariable reluLayer(String name, SDVariable input, SDVariable weights, SDVariable bias)
Parameters:
name - Name for the output variable. May be null.
input - Input data (NUMERIC type)
weights - Weights variable (NUMERIC type)
bias - Optional bias variable (may be null) (NUMERIC type)
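A sketch of a fused ReLU layer (same scaffolding as the earlier sketches; shapes are illustrative):

```java
SameDiff sd = SameDiff.create();
SDVariable in = sd.var("in", Nd4j.rand(3, 4));
SDVariable w  = sd.var("w",  Nd4j.rand(4, 2));
SDVariable b  = sd.var("b",  Nd4j.zeros(DataType.FLOAT, 2));
// Equivalent to relu(mmul(in, w) + b) with the default cutoff of 0
SDVariable hidden = sd.nn().reluLayer("hidden", in, w, b);
```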
public SDVariable selu(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable selu(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable sigmoid(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable sigmoid(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable sigmoidDerivative(SDVariable x, SDVariable wrt)
Parameters:
x - Input variable (NUMERIC type)
wrt - Gradient at the output, dL/dOut. Must have the same shape as the input (NUMERIC type)

public SDVariable sigmoidDerivative(String name, SDVariable x, SDVariable wrt)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)
wrt - Gradient at the output, dL/dOut. Must have the same shape as the input (NUMERIC type)

public SDVariable softmax(SDVariable x, int dimension)
Parameters:
x - Input (NUMERIC type)
dimension - Dimension along which to apply softmax

public SDVariable softmax(String name, SDVariable x, int dimension)
Parameters:
name - Name for the output variable. May be null.
x - Input (NUMERIC type)
dimension - Dimension along which to apply softmax

public SDVariable softmax(SDVariable x)
Parameters:
x - Input (NUMERIC type)

public SDVariable softmax(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input (NUMERIC type)

public SDVariable softmaxDerivative(SDVariable x, SDVariable wrt, int dimension)
Parameters:
x - Softmax input (NUMERIC type)
wrt - Gradient at output, dL/dx (NUMERIC type)
dimension - Softmax dimension

public SDVariable softmaxDerivative(String name, SDVariable x, SDVariable wrt, int dimension)
Parameters:
name - Name for the output variable. May be null.
x - Softmax input (NUMERIC type)
wrt - Gradient at output, dL/dx (NUMERIC type)
dimension - Softmax dimension
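A sketch contrasting softmax and logSoftmax along the class dimension (same scaffolding as the earlier sketches):

```java
SameDiff sd = SameDiff.create();
SDVariable logits = sd.var("logits", Nd4j.rand(2, 5));
// Softmax over the class dimension (1): each row of the result sums to 1.0
SDVariable probs = sd.nn().softmax("probs", logits, 1);
// logSoftmax over the same dimension is the numerically safer choice
// when feeding a negative-log-likelihood-style loss
SDVariable logProbs = sd.nn().logSoftmax("logProbs", logits, 1);
```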
public SDVariable softplus(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable softplus(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable softsign(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable softsign(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable softsignDerivative(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable softsignDerivative(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable swish(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable swish(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)

public SDVariable tanh(SDVariable x)
Parameters:
x - Input variable (NUMERIC type)

public SDVariable tanh(String name, SDVariable x)
Parameters:
name - Name for the output variable. May be null.
x - Input variable (NUMERIC type)