Class MySqlReadOnlyIncrementalSnapshotChangeEventSource<T extends DataCollectionId>

  • All Implemented Interfaces:
    IncrementalSnapshotChangeEventSource<T>

    public class MySqlReadOnlyIncrementalSnapshotChangeEventSource<T extends DataCollectionId>
    extends AbstractIncrementalSnapshotChangeEventSource<T>
    A MySQL-specific read-only incremental snapshot change event source. It uses the executed GTID set as the low and high watermarks of the incremental snapshot window, so that incremental snapshots work over a read-only connection.

    Prerequisites

    • gtid_mode=ON
    • enforce_gtid_consistency=ON
    • If the connector is reading from a multithreaded replica (a replica on which replica_parallel_workers is set to a value greater than 0), replica_preserve_commit_order=1 or slave_preserve_commit_order=1 must be set
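
    The prerequisites above can be checked up front. The following is a minimal sketch, assuming the server variables have already been read into a map (for example from SHOW VARIABLES); the class name, the method name and the readingFromReplica flag are hypothetical and not part of the connector. It accepts both ON and 1 as an enabled value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the connector API.
final class ReadOnlySnapshotPrerequisites {

    // Returns a human-readable list of unmet prerequisites (empty if all are met).
    static List<String> violations(Map<String, String> vars, boolean readingFromReplica) {
        List<String> problems = new ArrayList<>();
        if (!isOn(vars.get("gtid_mode"))) {
            problems.add("gtid_mode must be ON");
        }
        if (!isOn(vars.get("enforce_gtid_consistency"))) {
            problems.add("enforce_gtid_consistency must be ON");
        }
        // Commit order only matters on a multithreaded replica.
        if (readingFromReplica && parallelWorkers(vars) > 0
                && !isOn(vars.get("replica_preserve_commit_order"))
                && !isOn(vars.get("slave_preserve_commit_order"))) {
            problems.add("replica_preserve_commit_order or slave_preserve_commit_order must be 1");
        }
        return problems;
    }

    // Treats "ON" (any case) and "1" as enabled; a missing variable counts as disabled.
    private static boolean isOn(String value) {
        return value != null && (value.trim().equalsIgnoreCase("ON") || value.trim().equals("1"));
    }

    // A missing replica_parallel_workers value is treated as 0 (assumption).
    private static long parallelWorkers(Map<String, String> vars) {
        String value = vars.get("replica_parallel_workers");
        return value == null ? 0L : Long.parseLong(value.trim());
    }
}
```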

    When a chunk should be snapshotted

    • streaming is paused (this is implicit when the watermarks are handled)
    • a SHOW MASTER STATUS query is executed and the low watermark is set to executed_gtid_set
    • a new data chunk is read from a database by generating the SELECT statement and placed into a window buffer keyed by primary keys
    • a SHOW MASTER STATUS query is executed again and the high watermark is set to its executed_gtid_set minus the low watermark. If the high watermark contains more than one unique server UUID, steps 2 to 4 (both watermark captures and the chunk read) are redone
    • streaming is resumed
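
    The high watermark computation in the steps above can be sketched as a set subtraction. This is a simplified illustration, not the connector's GTID set implementation: it assumes the plain uuid:start-end[:start-end] format, expands every interval into individual transaction ids (only sensible for small examples), and all class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical, simplified GTID set model for illustration only.
final class GtidWatermarkSketch {

    // Parses "uuid:1-5:8,uuid2:3-4" into server UUID -> transaction ids.
    static Map<String, TreeSet<Long>> parse(String gtidSet) {
        Map<String, TreeSet<Long>> result = new LinkedHashMap<>();
        if (gtidSet == null || gtidSet.trim().isEmpty()) {
            return result;
        }
        for (String uuidSet : gtidSet.replace("\n", "").split(",")) {
            String[] parts = uuidSet.trim().split(":");
            TreeSet<Long> ids = result.computeIfAbsent(parts[0], k -> new TreeSet<>());
            for (int i = 1; i < parts.length; i++) {
                String[] range = parts[i].split("-");
                long start = Long.parseLong(range[0]);
                long end = range.length > 1 ? Long.parseLong(range[1]) : start;
                for (long id = start; id <= end; id++) {
                    ids.add(id);
                }
            }
        }
        return result;
    }

    // High watermark = executed_gtid_set after the chunk read minus the low watermark.
    static Map<String, TreeSet<Long>> subtract(Map<String, TreeSet<Long>> after,
                                               Map<String, TreeSet<Long>> lowWatermark) {
        Map<String, TreeSet<Long>> diff = new LinkedHashMap<>();
        for (Map.Entry<String, TreeSet<Long>> entry : after.entrySet()) {
            TreeSet<Long> ids = new TreeSet<>(entry.getValue());
            TreeSet<Long> alreadySeen = lowWatermark.get(entry.getKey());
            if (alreadySeen != null) {
                ids.removeAll(alreadySeen);
            }
            if (!ids.isEmpty()) {
                diff.put(entry.getKey(), ids);
            }
        }
        return diff;
    }

    // More than one server UUID in the high watermark means the chunk must be re-read.
    static boolean hasMultipleServerUuids(Map<String, TreeSet<Long>> highWatermark) {
        return highWatermark.size() > 1;
    }
}
```

    With a low watermark of aaa:1-5 and an executed_gtid_set of aaa:1-8 after the chunk read, the sketch yields a high watermark holding transactions 6 to 8 for server aaa. An unchanged executed_gtid_set yields an empty high watermark.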

    During the subsequent streaming

    • if a binlog event is received and its GTID is outside of the low watermark GTID set, window processing mode is enabled
    • if a binlog event is received and its GTID is outside of the high watermark GTID set, window processing mode is disabled and the rest of the window’s buffer is streamed
    • if a server heartbeat event is received and its GTID has reached the largest transaction id of the high watermark, window processing mode is disabled and the rest of the window’s buffer is streamed
    • if window processing mode is enabled and the event key is contained in the window buffer, the key is removed from the window buffer
    • event is streamed
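
    The binlog event handling listed above can be sketched as a small state machine. This is a minimal sketch under simplifying assumptions, not the connector's code: watermarks are modelled as plain sets of GTID strings, events and chunk rows as strings, heartbeats are left out, and every name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of window processing for a single chunk.
final class ReadOnlyWindowSketch {
    private final Set<String> lowWatermark;
    private final Set<String> highWatermark;
    private final Map<String, String> windowBuffer; // primary key -> snapshot row
    private final List<String> output = new ArrayList<>();
    private boolean windowOpened = false;
    private boolean chunkStreamed = false;

    ReadOnlyWindowSketch(Set<String> lowWatermark, Set<String> highWatermark,
                         Map<String, String> chunk) {
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
        this.windowBuffer = new LinkedHashMap<>(chunk);
    }

    void onBinlogEvent(String gtid, String key, String event) {
        if (!chunkStreamed) {
            // Past the low watermark: start deduplicating.
            if (!windowOpened && !lowWatermark.contains(gtid)) {
                windowOpened = true;
            }
            // Past the high watermark: stream what is left of the chunk and close.
            if (windowOpened && !highWatermark.contains(gtid)) {
                output.addAll(windowBuffer.values());
                windowBuffer.clear();
                windowOpened = false;
                chunkStreamed = true;
            }
            // Inside the window: the binlog event supersedes the snapshot row.
            if (windowOpened) {
                windowBuffer.remove(key);
            }
        }
        // The event itself is always streamed.
        output.add(event);
    }

    List<String> output() {
        return output;
    }
}
```

    In this sketch the remaining chunk rows land in the output immediately before the first event whose GTID is outside the high watermark. If the high watermark is empty, the same event opens and closes the window and no row is deduplicated.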


    Watermark checks

    If a watermark’s GTID set doesn’t contain a binlog event’s GTID, the watermark has been passed and the window processing mode gets updated. Multiple binlog events can have the same GTID. This is why the algorithm waits for a binlog event whose GTID is outside of the watermark’s GTID set before closing the window, instead of closing it as soon as the largest transaction id is reached.

    The deduplication starts with the first event after the low watermark because every event whose GTID is contained in the low watermark (the executed_gtid_set that was captured before the chunk select statement) was committed before the chunk was read, so the chunk already reflects it. A COMMIT after the low watermark is used to make sure a chunk selection sees the changes that are committed before its execution.

    The deduplication continues for all the events that are in the high watermark. The deduplicated chunk events are inserted right before the first event that is outside of the high watermark.


    No binlog events

    Server heartbeat events (events that are sent by a primary to a replica to let the replica know that the primary is still alive) are used to update the window processing mode when the rate of binlog updates is low. Server heartbeat is sent only if there are no binlog events for the duration of a heartbeat interval.

    The heartbeat carries the GTID of the latest binlog event at that moment (it’s a technical event that doesn’t get written into the output stream, but it can be used in the event processing logic). If there are zero updates after the chunk selection, the server heartbeat’s GTID will be within the high watermark. This is why, for a server heartbeat event, it’s enough for its GTID to reach the largest transaction id of the high watermark to disable the window processing mode, send the chunk and proceed to the next one.

    The server UUID part of the heartbeat’s GTID is used to look up the largest transaction id of the high watermark for the same server UUID. The high watermark is set to the difference between the executed_gtid_set after and before the chunk selection. If the high watermark contains more than one unique server UUID, the chunk selection is redone and the watermarks are recaptured. This avoids the scenario in which a heartbeat closes the window too early because the server UUID changed between the low and high watermarks. The heartbeat check doesn’t need to consult the window processing mode: skipping that check doesn’t affect correctness, and it simplifies the cases in which the binlog reader was already up to date with the low watermark and in which there are no new events between the low and high watermarks.
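
    The heartbeat check described above can be sketched as follows. This is a minimal sketch, assuming the high watermark has already been reduced to the largest transaction id per server UUID; the class and method names are hypothetical. Returning false when the heartbeat’s server UUID is absent from the high watermark is an assumption of this sketch, not documented behaviour.

```java
import java.util.Map;

// Hypothetical illustration of the heartbeat check; not the connector's code.
final class HeartbeatWatermarkSketch {

    // heartbeatGtid has the form "serverUuid:transactionId".
    static boolean reachedHighWatermark(String heartbeatGtid,
                                        Map<String, Long> maxTransactionIdByServerUuid) {
        int separator = heartbeatGtid.lastIndexOf(':');
        String serverUuid = heartbeatGtid.substring(0, separator);
        long transactionId = Long.parseLong(heartbeatGtid.substring(separator + 1));

        // Only the interval set of the heartbeat's own server UUID is consulted.
        Long maxTransactionId = maxTransactionIdByServerUuid.get(serverUuid);
        // Assumption: an unknown server UUID never closes the window here.
        return maxTransactionId != null && transactionId >= maxTransactionId;
    }
}
```

    Unlike a binlog event, which has to fall outside the high watermark, a heartbeat closes the window as soon as its transaction id equals the largest one in the high watermark.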


    No changes between watermarks

    A window can be opened and closed right away by the same event. This can happen when the high watermark is an empty set, which means there were no binlog events during the chunk select. The chunk is then inserted right after the low watermark, and no events are deduplicated from it.


    No updates for included tables

    It’s important to receive binlog events for the incremental snapshot to make progress. All binlog events are checked against the low and high watermarks, including the events from the tables that aren’t included in the connector. This guarantees that the window processing mode gets updated even when none of the tables included in the connector are getting binlog events.