We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Frugalos 0.10.2 https://github.com/frugalos/frugalos/releases/tag/0.10.2 を用いて比較的規模の大きなクラスタを運用している状況で、あるfrugalosサーバを再起動(stopした後にstartする)した際に、以下の104行から107行でのassertに失敗し、raft nodeが停止した:
raftlog/src/log/history.rs
Lines 102 to 110 in c0bd2d8
注意 frugalos_raft が関連する話であり、調査内容によってはfrugalos側のissueに移動する可能性がある。
frugalos_raft
raftlog/src/node_state/loader.rs
Lines 48 to 71 in c0bd2d8
common.handle_log_appended(&suffix)
CRIT Node down: Other (cause; assertion failed: `self.last_record().head.prev_term < tail.prev_term`) HISTORY: [0] at /home/vagrant/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/log/history.rs:104 [1] at /home/vagrant/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/node_state/common/mod.rs:78 [2] at /home/vagrant/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/node_state/loader.rs:58 [3] at /home/vagrant/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/node_state/mod.rs:113 [4] at /home/vagrant/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/replicated_log.rs:260 -- node=Node { id: NodeId("b503.ba@*.*.*.*:14278"), role: Follower, ballot: Ballot { term: Term(315), voted_for: NodeId("b501.120@*.*.*.*:14278") } } [5] at frugalos_mds/src/node/node.rs:730 [6] at frugalos_segment/src/service.rs:265 [7] at frugalos_segment/src/service.rs:280 , node: b503, module: frugalos_segment::service:282
INFO Add a node: bucket_no=0; segment_no=181; device_no=186; device_id=DeviceId("frugalos15_05"); node=NodeId { local_id: Loc alNodeId("b503"), instance: 186, addr: V4(*.*.*.*:14278) }; , module: frugalos::service:219 INFO [START] LoadBallot: lump_id=LumpId("0000000000b503000000000000000000"); , node: b503, module: frugalos_raft::storage::ballot:21 INFO [FINISH] LoadBallot: ballot=Some(Ballot { term: Term(315), voted_for: NodeId("b501.120@*.*.*.*:14278") }); , node:b503, module: frugalos_raft::storage::ballot:42 INFO [START] LoadLogPrefixIndex: lump_id=LumpId("0000000000b503020000000000000000"); , node: b503, module: frugalos_raft::storage::log_prefix::load:87 INFO [START] LoadLogPrefix, node: b503, module: frugalos_raft::storage::log_prefix::load:23 INFO [START] LoadLogSuffix: head=LogPosition { prev_term: Term(0), index: LogIndex(0) }; , node: b503, module: frugalos_raft: INFO New raft election term: ballot=Ballot { term: Term(315), voted_for: NodeId("b501.120@*.*.*.*:14278") }, node: b503, module: frugalos_mds::node::node:442 INFO [FINISH] LoadLogPrefixIndex: index=Some(2840..2841); , node: b503, module: frugalos_raft::storage::log_prefix::load:113 INFO [START] LoadLogPrefixBytes: prefix_index=2840..2841; , node: b503, module: frugalos_raft::storage::log_prefix::load:135 INFO [PROGRESS] LoadLogPrefixBytes: index=2840; lump_id=LumpId("0000000000b503030000000000000b18"); , node: b503, module: frugalos_raft::storage::log_prefix::load:175 INFO [FINISH] LoadLogPrefixBytes: bytes.len()=1617414; , node: b503, module: frugalos_raft::storage::log_prefix::load:165 INFO [FINISH] LoadLogPrefix: prefix.tail=LogPosition { prev_term: Term(308), index: LogIndex(28405496) }; prefix.config=Clust erConfig { new: {...}, old: {}, state: Sta ble }; bytes.len()=1617414; , node: b503, module: frugalos_raft::storage::log_prefix::load:58 INFO Event::LogPrefixUpdated: self.log_suffix.head=LogPosition { prev_term: Term(0), index: LogIndex(0) }; new_head=LogPosition { prev_term: Term(308), index: LogIndex(28405496) }; , node: b503, module: frugalos_raft::storage:222 INFO [START] LoadLogSuffix: head=LogPosition { prev_term: Term(308), index: LogIndex(28405496) }; , node: b503, module: frugalos_raft::storage::log_suffix:25 INFO New snapshot is loaded: new_head=LogPosition { prev_term: Term(308), index: LogIndex(28405496) }, bytes=1617139, node: b503, module: frugalos_mds::node::node:450 INFO Snapshot decoded: 1617139 bytes, node: b503, module: frugalos_mds::node::node:460 INFO Snapshot decoded: new_head=LogPosition { prev_term: Term(308), index: LogIndex(28405496) }, node: b503, module: frugalos_mds::node::node:693 INFO [FINISH] LoadLogSuffix: self.current=LogIndex(28405504); , node: b503, module: frugalos_raft::storage::log_suffix:48
Lines 95 to 101 in c0bd2d8
self.last_record().head.prev_term == tail.prev_term
The text was updated successfully, but these errors were encountered:
snapshot以降のログ取得についてはLoadLogSuffix構造体を用いて行われる。 ログの取得については、LoadLogEntryというサブ構造体を用いて、ログに対応するlump_idをインクリメントしつつデバイスがnoを返すまで取得を試みるという戦略を用いている: https://github.com/frugalos/frugalos/blob/2baf5d0e482f062a90f2af6245d6e27775987c30/frugalos_raft/src/storage/log_suffix.rs#L62-L107
LoadLogSuffix
LoadLogEntry
一方で、cannylsにはjournal memory bufferが存在するため、メモリ上の最新ジャーナル情報がディスクに書き込まれる前にプロセスが終了し、再起動した場合には、上のようにBallotの持つTermとLogの持つTermでズレが生じるかもしれない。ただしズレが生じたとして何か問題はあるのか?
Sorry, something went wrong.
yuezato
No branches or pull requests
状況
Frugalos 0.10.2 https://github.com/frugalos/frugalos/releases/tag/0.10.2 を用いて比較的規模の大きなクラスタを運用している状況で、あるfrugalosサーバを再起動(stopした後にstartする)した際に、以下の104行から107行でのassertに失敗し、raft nodeが停止した:
raftlog/src/log/history.rs
Lines 102 to 110 in c0bd2d8
注意
frugalos_raft
が関連する話であり、調査内容によってはfrugalos側のissueに移動する可能性がある。計算フロー
raftlog/src/node_state/loader.rs
Lines 48 to 71 in c0bd2d8
Loaderでログ末尾まで読み込み、
common.handle_log_appended(&suffix)
を呼び出してその処理中で問題が起こっている。当該ノードが直前まで行っていた計算の流れ
疑問
raftlog/src/log/history.rs
Lines 95 to 101 in c0bd2d8
ここが実行されてしまうと、
self.last_record().head.prev_term == tail.prev_term
が成立しないか?The text was updated successfully, but these errors were encountered: