org.platanios.tensorflow.api.ops.rnn.attention
Memory to query; usually the output of an RNN encoder. Each tensor in the memory should be shaped [batchSize, maxTime, ...].
Weights tensor with which the memory is multiplied to produce the attention keys.
Sequence lengths for the batch entries in the memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
Scalar tensor with which the scores are multiplied before they are used to compute attention probabilities.
Optional function that converts computed scores to probabilities. Defaults to the softmax function. A potentially useful alternative is the hardmax function.
Mask value to use for the score before passing it to probabilityFn. Defaults to negative infinity. Note that this value is only used if memorySequenceLengths is not null.
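As a rough illustration of how these parameters interact (a NumPy sketch of the computation, not the Scala API; the function and argument names are ours): positions past each batch entry's sequence length are replaced with the mask value before probabilityFn is applied, so the default softmax assigns them zero probability.

```python
import numpy as np

def masked_probabilities(scores, sequence_lengths, score_mask_value=-np.inf):
    """Mask scores past each sequence length, then apply softmax."""
    batch_size, max_time = scores.shape
    # Boolean mask: True for valid positions within each sequence.
    mask = np.arange(max_time)[None, :] < sequence_lengths[:, None]
    masked = np.where(mask, scores, score_mask_value)
    # Softmax over time (the default probabilityFn).
    exp = np.exp(masked - masked.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0],
                   [0.5, 0.5, 0.5]])
probs = masked_probabilities(scores, np.array([2, 3]))
# Masked positions receive zero probability; each row still sums to 1.
```

With the default mask value of negative infinity, exp of a masked score is exactly zero, which is why the masked positions contribute nothing to the normalization.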
Name prefix to use for all created ops.
Computes an alignment tensor given the provided query and previous alignment tensor.
The previous alignment tensor is important for attention mechanisms that use the previous alignment to calculate the attention at the next time step, such as monotonic attention mechanisms.
TODO: Figure out how to generalize the "next state" functionality.
Query tensor.
Previous alignment tensor.
Tuple containing the alignment tensor and the next attention state.
Initial alignment value.
This is important for attention mechanisms that use the previous alignment to calculate the alignment at the next time step (e.g., monotonic attention).
The default behavior is to return a tensor of all zeros.
Initial state value.
This is important for attention mechanisms that use the previous alignment to calculate the alignment at the next time step (e.g., monotonic attention).
The default behavior is to return the same output as initialAlignment.
Memory to query; usually the output of an RNN encoder. Each tensor in the memory should be shaped [batchSize, maxTime, ...].
Sequence lengths for the batch entries in the memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
Weights tensor with which the memory is multiplied to produce the attention keys.
Name prefix to use for all created ops.
Computes alignment probabilities for score.
Alignment score tensor.
Alignment probabilities tensor.
Optional function that converts computed scores to probabilities. Defaults to the softmax function. A potentially useful alternative is the hardmax function.
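To make the softmax/hardmax distinction concrete, here is a small NumPy sketch (an illustration of the two functions, not the library's implementation): hardmax returns a one-hot vector at the position of the maximum score, whereas softmax spreads probability mass smoothly across all positions.

```python
import numpy as np

def softmax(scores):
    """Smooth probabilities: every position gets non-zero mass."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def hardmax(scores):
    """One-hot vector at the arg-max position."""
    one_hot = np.zeros_like(scores)
    one_hot[np.argmax(scores)] = 1.0
    return one_hot

scores = np.array([1.0, 3.0, 2.0])
soft = softmax(scores)   # all entries > 0, peaked at index 1
hard = hardmax(scores)   # exactly [0., 1., 0.]
```

Hardmax can be useful when a hard, discrete attention choice is desired, at the cost of a non-differentiable arg-max.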
Scalar tensor with which the scores are multiplied before they are used to compute attention probabilities.
Computes an alignment score for query.
Query tensor.
Score tensor.
Mask value to use for the score before passing it to probabilityFn. Defaults to negative infinity. Note that this value is only used if memorySequenceLengths is not null.
Luong-style (multiplicative) attention scoring.
This attention has two forms. The first is standard Luong attention, as described in: ["Effective Approaches to Attention-based Neural Machine Translation.", EMNLP 2015](https://arxiv.org/abs/1508.04025).
The second is the scaled form inspired partly by the normalized form of Bahdanau attention. To enable the second form, construct the object with weightsScale set to the value of a scalar scaling variable.
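The multiplicative score can be sketched as follows (a NumPy illustration of the math from the paper, not the Scala API; the names are ours): the memory is multiplied by the weights tensor to produce the attention keys, and the score for each time step is the dot product of the query with the corresponding key, optionally multiplied by a scalar scale in the second form.

```python
import numpy as np

def luong_score(query, memory, weights, scale=1.0):
    """Multiplicative (Luong) attention scores.

    query:   [batchSize, numUnits]
    memory:  [batchSize, maxTime, memoryUnits]
    weights: [memoryUnits, numUnits]
    Returns: [batchSize, maxTime]
    """
    keys = memory @ weights                       # [batchSize, maxTime, numUnits]
    scores = np.einsum('bu,btu->bt', query, keys)  # dot product per time step
    return scale * scores                          # scaled form when scale != 1

query = np.ones((1, 4))
memory = np.ones((1, 3, 4))
weights = np.eye(4)
scores = luong_score(query, memory, weights)
# Each score is the dot product of two all-ones vectors of length 4.
```

In the scaled form, the scale argument plays the role of the scalar scaling variable passed as weightsScale.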