Performs latency and consistency predictions as described in
"Probabilistically Bounded Staleness for Practical Partial Quorums"
by Bailis et al. in VLDB 2012. The predictions are of the form:
With ReplicationFactor N, read consistency level R, and write
consistency level W, after t seconds, p% of reads will return a
version within k versions of the most recently written version; these
reads should complete with a latency of L ms.
These predictions should be used as a rough guideline for system
operators. This interface is exposed through nodetool.
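For illustration only, a prediction can be requested from the command
line via the predictconsistency subcommand that exposes this predictor;
the argument names below are descriptive placeholders rather than exact
usage for any particular Cassandra release:

    nodetool predictconsistency <replication_factor> <time_after_write> <versions_stale>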
The class accomplishes this by measuring latencies for reads and
writes, then using Monte Carlo simulation to predict behavior under
a given N, R, and W based on those latencies.
We capture four distributions:
- W: time from when the coordinator sends a mutation to the time that
  a replica begins to serve the new value(s)
- A: time from when a replica accepting a mutation sends an
  acknowledgment to the time the coordinator hears of it
- R: time from when the coordinator sends a read request to the time
  that the replica performs the read
- S: time from when the replica sends a read response to the time
  when the coordinator receives it
A and S are mostly network-bound, while W and R depend on both the
network and local processing time.
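To make the simulation concrete, the following is a minimal sketch of one
way such a Monte Carlo estimate could be computed. It is not the shipped
implementation: the names (PbsSimulationSketch,
estimateConsistentReadProbability, sampleLatency) are illustrative,
latencies are drawn from placeholder exponential distributions with a
2 ms mean instead of the collected samples, t is taken in milliseconds,
and only single-version staleness (k = 1) is estimated. Each trial
simulates one write acknowledged by W replicas, then a read served by
the first R replicas to respond; the read counts as consistent if any of
those R replicas had already received the write.

    import java.util.Arrays;
    import java.util.Random;

    public class PbsSimulationSketch
    {
        private static final Random rng = new Random();

        // Placeholder for drawing from a collected latency sample (ms).
        private static double sampleLatency(double meanMillis)
        {
            return -meanMillis * Math.log(1.0 - rng.nextDouble());
        }

        // Estimate p: the probability that a read started tMillis after a
        // write completes returns the value written, for the given N, R, W.
        public static double estimateConsistentReadProbability(int n, int r, int w, double tMillis, int trials)
        {
            int consistent = 0;
            for (int trial = 0; trial < trials; trial++)
            {
                double[] writeArrival = new double[n]; // W: mutation reaches replica
                double[] ackArrival = new double[n];   // W + A: ack reaches coordinator
                for (int i = 0; i < n; i++)
                {
                    writeArrival[i] = sampleLatency(2.0);
                    ackArrival[i] = writeArrival[i] + sampleLatency(2.0);
                }

                // The write returns to the client once W acks have arrived.
                double[] sortedAcks = ackArrival.clone();
                Arrays.sort(sortedAcks);
                double writeCompletes = sortedAcks[w - 1];

                // The read is issued tMillis after the write completes.
                double readStart = writeCompletes + tMillis;
                double[] responseArrival = new double[n];
                boolean[] hasNewValue = new boolean[n];
                Integer[] order = new Integer[n];
                for (int i = 0; i < n; i++)
                {
                    double readReachesReplica = readStart + sampleLatency(2.0);   // R
                    responseArrival[i] = readReachesReplica + sampleLatency(2.0); // + S
                    // The replica serves the new value iff the mutation arrived first.
                    hasNewValue[i] = writeArrival[i] <= readReachesReplica;
                    order[i] = i;
                }

                // The coordinator uses the first R responses to arrive.
                Arrays.sort(order, (a, b) -> Double.compare(responseArrival[a], responseArrival[b]));
                boolean sawNewValue = false;
                for (int j = 0; j < r; j++)
                    sawNewValue |= hasNewValue[order[j]];

                if (sawNewValue)
                    consistent++;
            }
            return (double) consistent / trials;
        }

        public static void main(String[] args)
        {
            // e.g. N=3, R=1, W=1, read issued 10 ms after the write returns
            System.out.printf("p = %.4f%n", estimateConsistentReadProbability(3, 1, 1, 10.0, 100_000));
        }
    }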
Caveats:
Prediction is only as good as the latencies collected. Accurate
prediction requires synchronizing clocks between replicas. We
collect a running sample of latencies, but if latencies change
dramatically, predictions will be off.
The predictions are conservative (worst-case), meaning we may predict
more staleness than occurs in practice, in the following ways:
- We do not account for read repair.
- We do not account for Merkle tree exchange.
- Multi-version staleness is particularly conservative.
- We simulate non-local reads and writes. We assume that the
  coordinating Cassandra node is not itself a replica for a given key.
The predictions are optimistic in the following ways:
- We do not predict the impact of node failure.
- We do not model hinted handoff.