Feature: RaftNetwork::snapshot() to send a complete snapshot #1009

Merged

Commits on Feb 19, 2024

  1. Feature: RaftNetwork::snapshot() to send a complete snapshot

    Add `RaftNetwork::snapshot()` to send a complete snapshot, and move
    chunk-by-chunk snapshot sending out of `ReplicationCore`.
    
    To enable a fully customizable implementation of snapshot transmission
    tailored to the application's needs, this commit relocates the
    chunk-by-chunk transmission logic from `ReplicationCore` to a new
    submodule, `crate::network::stream_snapshot`.
    
    The `stream_snapshot` module provides a default chunk-based snapshot
    transmission mechanism, which can be overridden by creating a custom
    implementation of the `RaftNetwork::snapshot()` method. As part of this
    commit, `RaftNetwork::snapshot()` simply delegates to `stream_snapshot`.
    Developers may use `stream_snapshot` as a reference when implementing
    their own snapshot transmission strategy.
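
The chunk-based idea described above can be illustrated with a minimal, std-only sketch. It is not the code in `stream_snapshot`: the `Chunk` type and the `split_into_chunks` and `reassemble` functions are hypothetical names invented for this example, and a real transport would send each chunk over the network rather than collect them in a `Vec`.

```rust
/// Hypothetical chunk message for illustration; not openraft's request type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Chunk {
    /// Byte offset of `data` within the complete snapshot.
    pub offset: u64,
    /// Payload of this chunk.
    pub data: Vec<u8>,
    /// True only for the final chunk.
    pub done: bool,
}

/// Sender side: split a complete snapshot into chunks of at most `chunk_size` bytes.
pub fn split_into_chunks(snapshot: &[u8], chunk_size: usize) -> Vec<Chunk> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    if snapshot.is_empty() {
        // An empty snapshot is still sent as one final, empty chunk.
        return vec![Chunk { offset: 0, data: Vec::new(), done: true }];
    }
    let total = snapshot.len();
    snapshot
        .chunks(chunk_size)
        .enumerate()
        .map(|(i, part)| {
            let offset = i * chunk_size;
            Chunk {
                offset: offset as u64,
                data: part.to_vec(),
                done: offset + part.len() == total,
            }
        })
        .collect()
}

/// Receiver side: rebuild the snapshot, rejecting gaps and out-of-order chunks.
pub fn reassemble(chunks: &[Chunk]) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    for c in chunks {
        if c.offset != buf.len() as u64 {
            return Err(format!(
                "unexpected offset {}, expected {}",
                c.offset,
                buf.len()
            ));
        }
        buf.extend_from_slice(&c.data);
        if c.done {
            return Ok(buf);
        }
    }
    Err("snapshot stream ended before the final chunk".to_string())
}
```

An application that prefers to ship the snapshot in one piece (for example as a file transfer) would skip this splitting step entirely, which is the flexibility the new method is meant to allow.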
    
    Snapshot transmission is internally divided into two distinct phases:
    
    1. Upon request for snapshot transmission, `ReplicationCore` spawns a
       new task that runs `RaftNetwork::snapshot()`, dedicated to sending a
       complete `Snapshot`. This task must be able to terminate gracefully
       by awaiting the `cancel` future.
    
    2. Once the snapshot has been fully transmitted by
       `RaftNetwork::snapshot()`, the task signals an event back to
       `ReplicationCore`. Subsequently, `ReplicationCore` informs `RaftCore`
       of the event, allowing it to acknowledge the completion of the
       snapshot transmission.
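
The two phases can be sketched with plain threads and channels. This is only an analogy under stated assumptions: the real code is async and uses a `cancel` future, whereas this sketch substitutes an `AtomicBool` flag and a `std::sync::mpsc` channel. `SnapshotEvent`, `spawn_snapshot_task`, and `run_transmission` are hypothetical names, not openraft APIs.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread;

/// Hypothetical event reported back to the replication loop.
#[derive(Debug, PartialEq, Eq)]
pub enum SnapshotEvent {
    Completed { bytes_sent: usize },
    Canceled { bytes_sent: usize },
}

/// Phase 1: a dedicated worker sends the whole snapshot, checking the
/// cancel flag before each chunk (a stand-in for awaiting a cancel future).
/// Phase 2: the worker reports the outcome through `events`.
pub fn spawn_snapshot_task(
    snapshot: Vec<u8>,
    chunk_size: usize,
    cancel: Arc<AtomicBool>,
    events: Sender<SnapshotEvent>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut bytes_sent = 0;
        for part in snapshot.chunks(chunk_size.max(1)) {
            if cancel.load(Ordering::SeqCst) {
                // Stop early; the receiver may already be gone, so ignore send errors.
                let _ = events.send(SnapshotEvent::Canceled { bytes_sent });
                return;
            }
            // A real implementation would write `part` to the network here.
            bytes_sent += part.len();
        }
        let _ = events.send(SnapshotEvent::Completed { bytes_sent });
    })
}

/// Plays the role of the replication loop: starts the task, waits for its
/// single event, and returns it (where the real system would forward it).
pub fn run_transmission(snapshot: Vec<u8>, cancel_first: bool) -> SnapshotEvent {
    let cancel = Arc::new(AtomicBool::new(cancel_first));
    let (tx, rx) = channel();
    let handle = spawn_snapshot_task(snapshot, 4, cancel, tx);
    let event = rx.recv().expect("the task sends exactly one event");
    handle.join().expect("snapshot task panicked");
    event
}
```

Keeping the sender in its own task means a slow snapshot transfer does not block the loop that receives its completion event.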
    
    Other changes:
    
    - `ReplicationCore` now holds two `RaftNetwork` instances: one for log
      replication and heartbeat, the other for snapshots only.
    
    - `ReplicationClosed` becomes a public error that notifies the
      application-implemented sender that a snapshot replication has been
      canceled.
    
    - `StreamingError` is introduced as a container for errors that may
      occur in application-defined snapshot transmission, including local
      IO errors, network errors, errors returned by the remote peer, and
      `ReplicationClosed`.
    
    - The `SnapshotResponse` type is introduced to differentiate it from the
      `InstallSnapshotResponse`, which is used for chunk-based responses.
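
As a rough illustration of an error container with the four sources listed for `StreamingError`, the sketch below defines a stand-alone enum. It does not reproduce openraft's actual definition: `TransferError`, `Closed`, the variant names, the `String` payloads, and `read_chunk` are all assumptions made for this example.

```rust
use std::fmt;
use std::io;

/// Hypothetical stand-in for a "replication closed" error.
#[derive(Debug)]
pub struct Closed;

/// Hypothetical container with one variant per error source.
#[derive(Debug)]
pub enum TransferError {
    /// Reading the snapshot from local storage failed.
    LocalIo(io::Error),
    /// The connection to the target failed.
    Network(String),
    /// The remote peer rejected the snapshot.
    Remote(String),
    /// The replication was canceled by the caller.
    Closed(Closed),
}

impl fmt::Display for TransferError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TransferError::LocalIo(e) => write!(f, "local IO error: {}", e),
            TransferError::Network(m) => write!(f, "network error: {}", m),
            TransferError::Remote(m) => write!(f, "remote peer error: {}", m),
            TransferError::Closed(_) => write!(f, "snapshot replication closed"),
        }
    }
}

impl std::error::Error for TransferError {}

// `From` impls let `?` convert the underlying errors automatically.
impl From<io::Error> for TransferError {
    fn from(e: io::Error) -> Self {
        TransferError::LocalIo(e)
    }
}

impl From<Closed> for TransferError {
    fn from(c: Closed) -> Self {
        TransferError::Closed(c)
    }
}

/// Reads one chunk from a local snapshot source; IO failures become `LocalIo`.
pub fn read_chunk(reader: &mut impl io::Read, buf: &mut [u8]) -> Result<usize, TransferError> {
    Ok(reader.read(buf)?)
}
```

A single container type of this shape lets a custom sender return one error type while the caller can still distinguish a cancellation from a genuine failure.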
    
    ---
    
    - Part of databendlabs#606
    drmingdrmer committed Feb 19, 2024
    4098c38