Feature: RaftNetwork::snapshot() to send a complete snapshot #1009

Merged

@drmingdrmer drmingdrmer commented Feb 16, 2024

Changelog

Feature: RaftNetwork::snapshot() to send a complete snapshot

Add RaftNetwork::snapshot() to send a complete snapshot and move
sending snapshot by chunks out of ReplicationCore.

To allow snapshot transmission to be fully customized for the
application's needs, this commit relocates the chunk-by-chunk
transmission logic from ReplicationCore to a new submodule,
crate::network::stream_snapshot.

The stream_snapshot module provides a default chunk-based snapshot
transmission mechanism, which can be overridden by providing a custom
implementation of the RaftNetwork::snapshot() method. In this commit,
RaftNetwork::snapshot() simply delegates to stream_snapshot.
Developers may use stream_snapshot as a reference when implementing
their own snapshot transmission strategy.
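
The delegation described above can be illustrated with a minimal, synchronous sketch. Every name in it (`Network`, `Chunk`, `send_chunk`, `DEFAULT_CHUNK_SIZE`, `Recorder`, `OneShot`) is hypothetical and chosen for illustration only; it does not reproduce the actual openraft trait or its async signatures. It only shows the pattern of a trait method whose default body delegates to a chunk-based helper and can be overridden.

```rust
// Hypothetical names throughout; this is NOT the openraft API.

/// One piece of a snapshot, as produced by the chunk-based default.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub offset: usize,
    pub data: Vec<u8>,
    pub done: bool,
}

/// Assumed chunk size, kept tiny so the behavior is easy to see.
pub const DEFAULT_CHUNK_SIZE: usize = 4;

pub trait Network {
    /// Low-level primitive: deliver one chunk to the remote peer.
    fn send_chunk(&mut self, chunk: Chunk) -> Result<(), String>;

    /// High-level entry point: send a complete snapshot.
    /// The default body delegates to the chunk-based helper below.
    fn snapshot(&mut self, data: &[u8]) -> Result<(), String>
    where
        Self: Sized,
    {
        stream_snapshot(self, data, DEFAULT_CHUNK_SIZE)
    }
}

/// Chunk-based transmission: split `data` and send it piece by piece.
pub fn stream_snapshot<N: Network>(
    net: &mut N,
    data: &[u8],
    chunk_size: usize,
) -> Result<(), String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut offset = 0;
    loop {
        let end = (offset + chunk_size).min(data.len());
        let done = end == data.len();
        net.send_chunk(Chunk { offset, data: data[offset..end].to_vec(), done })?;
        if done {
            return Ok(());
        }
        offset = end;
    }
}

/// Uses the default `snapshot()`, so it receives several chunks.
#[derive(Default)]
pub struct Recorder {
    pub chunks: Vec<Chunk>,
}

impl Network for Recorder {
    fn send_chunk(&mut self, chunk: Chunk) -> Result<(), String> {
        self.chunks.push(chunk);
        Ok(())
    }
}

/// Overrides `snapshot()` to ship the whole snapshot in one message.
#[derive(Default)]
pub struct OneShot {
    pub chunks: Vec<Chunk>,
}

impl Network for OneShot {
    fn send_chunk(&mut self, chunk: Chunk) -> Result<(), String> {
        self.chunks.push(chunk);
        Ok(())
    }

    fn snapshot(&mut self, data: &[u8]) -> Result<(), String> {
        self.send_chunk(Chunk { offset: 0, data: data.to_vec(), done: true })
    }
}
```

In this sketch, calling `snapshot()` on a `Recorder` goes through `stream_snapshot`, while `OneShot` bypasses it entirely. An application-specific strategy would take the place of `OneShot`.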

Snapshot transmission is internally divided into two distinct phases:

  1. When snapshot transmission is requested, ReplicationCore spawns a
    new task that runs RaftNetwork::snapshot() to send a complete
    Snapshot. The task should terminate gracefully when canceled,
    which it can detect by subscribing to the cancel future.

  2. Once the snapshot has been fully transmitted by
    RaftNetwork::snapshot(), the task signals an event back to
    ReplicationCore. Subsequently, ReplicationCore informs RaftCore
    of the event, allowing it to acknowledge the completion of the
    snapshot transmission.
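
The two phases above can be sketched with standard-library threads and channels. This is a simplified model under stated assumptions, not the openraft implementation: the real code is async and uses a `cancel` future, whereas this sketch substitutes an `AtomicBool` flag. `SnapshotEvent` and `start_snapshot_task` are hypothetical names, and the byte counter is a stand-in for real network I/O.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver};
use std::sync::Arc;
use std::thread;

/// Hypothetical event reported back to the replication loop.
#[derive(Debug, PartialEq)]
pub enum SnapshotEvent {
    Completed { bytes_sent: usize },
    Cancelled,
}

/// Phase 1: spawn a dedicated task that sends the whole snapshot.
/// The caller keeps a clone of `cancel` and sets it to request a stop.
pub fn start_snapshot_task(
    data: Vec<u8>,
    chunk_size: usize,
    cancel: Arc<AtomicBool>,
) -> Receiver<SnapshotEvent> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let (tx, rx) = channel();
    thread::spawn(move || {
        let mut sent = 0;
        for chunk in data.chunks(chunk_size) {
            // Check for cancellation before each piece so the task
            // stops promptly instead of finishing a stale snapshot.
            if cancel.load(Ordering::SeqCst) {
                let _ = tx.send(SnapshotEvent::Cancelled);
                return;
            }
            sent += chunk.len(); // stand-in for a real network send
        }
        // Phase 2: signal completion back to the caller.
        let _ = tx.send(SnapshotEvent::Completed { bytes_sent: sent });
    });
    rx
}
```

The receiver plays the role of ReplicationCore here: it waits on the channel and, on `Completed`, would forward the event onward (to RaftCore in the real design).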

Other changes:

  • ReplicationCore now holds two RaftNetwork instances: one for log
    replication and heartbeats, the other for snapshots only.

  • ReplicationClosed becomes a public error that notifies the
    application-implemented sender that snapshot replication has been
    canceled.

  • StreamingError is introduced as a container for errors that may occur
    in application-defined snapshot transmission, including local IO
    errors, network errors, errors returned by the remote peer, and
    ReplicationClosed.

  • The SnapshotResponse type is introduced as the response to a
    complete snapshot, distinct from InstallSnapshotResponse, which is
    used for chunk-based responses.
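
As an illustration of the "container of errors" idea, here is a minimal sketch of an enum that wraps the four error sources listed above. The variant names, payload types, and the `read_and_send` helper are assumptions made for this example; the actual `StreamingError` and `ReplicationClosed` definitions in the crate may differ.

```rust
use std::fmt;
use std::io;

/// Hypothetical stand-in for the cancellation error.
#[derive(Debug)]
pub struct ReplicationClosed;

/// Hypothetical container for everything that can go wrong while an
/// application-defined sender transmits a snapshot.
#[derive(Debug)]
pub enum StreamingError {
    /// Reading the snapshot locally failed.
    Io(io::Error),
    /// The connection to the peer failed.
    Network(String),
    /// The remote peer rejected the snapshot.
    Remote(String),
    /// Replication was canceled while sending.
    Closed(ReplicationClosed),
}

impl From<io::Error> for StreamingError {
    fn from(e: io::Error) -> Self {
        StreamingError::Io(e)
    }
}

impl From<ReplicationClosed> for StreamingError {
    fn from(e: ReplicationClosed) -> Self {
        StreamingError::Closed(e)
    }
}

impl fmt::Display for StreamingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StreamingError::Io(e) => write!(f, "local IO error: {e}"),
            StreamingError::Network(m) => write!(f, "network error: {m}"),
            StreamingError::Remote(m) => write!(f, "remote error: {m}"),
            StreamingError::Closed(_) => write!(f, "replication closed"),
        }
    }
}

/// Toy sender: the `From` impls let `?` and `.into()` funnel different
/// failures into the single `StreamingError` type.
pub fn read_and_send(
    read: Result<Vec<u8>, io::Error>,
    cancelled: bool,
) -> Result<usize, StreamingError> {
    if cancelled {
        return Err(ReplicationClosed.into());
    }
    let data = read?; // io::Error converts to StreamingError::Io
    Ok(data.len())
}
```

With a single error type, a custom sender can return early from any failure point and let the caller decide, by matching on the variant, whether to retry, report, or stop.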


@drmingdrmer drmingdrmer changed the title Refactor: move sending snapshot by chunks out of ReplicationCore Feature: RaftNetwork::snapshot() to send a complete snapshot Feb 16, 2024
- Part of databendlabs#606
@drmingdrmer drmingdrmer merged commit 687fcf2 into databendlabs:main Feb 19, 2024
25 of 27 checks passed
@drmingdrmer drmingdrmer deleted the 33-transmit-complete-snapshot branch February 19, 2024 06:38
@drmingdrmer drmingdrmer mentioned this pull request Feb 19, 2024
3 tasks
@drmingdrmer drmingdrmer mentioned this pull request Mar 9, 2024