feat: adds channel ID label to the Prometheus message_receive_bytes_total and message_send_bytes_total metrics #1078

Merged
staheri14 merged 4 commits into v0.34.x-celestia from sanaz/trace-channelid-per-send-and-rec on Sep 19, 2023

Conversation

staheri14
Contributor

@staheri14 staheri14 commented Sep 5, 2023

Part of #1077

@staheri14 staheri14 self-assigned this Sep 5, 2023
@staheri14 staheri14 changed the title feat: adds chain Id attributed to the Prometheus BW consumption metrics feat: adds channel Id attribute to the Prometheus BW consumption metrics Sep 5, 2023
@staheri14 staheri14 changed the title feat: adds channel Id attribute to the Prometheus BW consumption metrics feat: adds channel ID label to the Prometheus message_receive_bytes_total and message_send_bytes_total metrics Sep 5, 2023
@staheri14 staheri14 marked this pull request as ready for review September 5, 2023 23:12
Collaborator

@rootulp rootulp left a comment

[question] Since this PR targets v0.34.x-celestia, should it be forward-ported to main? Asking because I thought the usual workflow was to target main and backport to v0.34.x-celestia per https://github.com/celestiaorg/celestia-core#branches. cc: @cmwaters

[question] how was this PR tested? Did you manually verify the metric labels? Do you mind including that in the PR description?

p2p/metrics.go (review thread resolved)
p2p/peer.go (review thread resolved)
Member

@evan-forbes evan-forbes left a comment

utACK

this seems like a logical change, although per this comment #1077 (comment) I've been able to do this before without adding an extra index here

@staheri14 perhaps it's better to create a branch of celestia-app based on v1.x that is using this branch, and then we all view it in action on grafana by updating our mocha node to use that special version of v1.x

@staheri14
Contributor Author

@staheri14 perhaps it's better to create a branch of celestia-app based on v1.x that is using this branch, and then we all view it in action on grafana by updating our mocha node to use that special version of v1.x

@evan-forbes Sounds good to me, will do that and update you here.

@staheri14
Contributor Author

@staheri14 perhaps it's better to create a branch of celestia-app based on v1.x that is using this branch, and then we all view it in action on grafana by updating our mocha node to use that special version of v1.x

@evan-forbes Sounds good to me, will do that and update you here.

Created this branch to test it on our mocha node.
cc: @evan-forbes

@staheri14
Contributor Author

After some investigation, it looks like chID may already be attached to the metrics targeted in this PR, so I am converting this to a draft until we verify that.

@staheri14 staheri14 marked this pull request as draft September 7, 2023 16:44
@staheri14
Contributor Author

staheri14 commented Sep 7, 2023

I was able to run a node (based on the latest main branch, 1.0.0-rc14, without the changes in this PR), connect it to mocha-4, and inspect its metrics, as provided below. The output shows that none of the message_receive_bytes_total and message_send_bytes_total samples carry the channel ID label. In light of this, I am moving the PR back to ready for review.

# TYPE cometbft_p2p_message_receive_bytes_total counter
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="blockchain_BlockResponse",version="1.0.0-rc14"} 394149
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="blockchain_StatusResponse",version="1.0.0-rc14"} 7
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="consensus_HasVote",version="1.0.0-rc14"} 365
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="consensus_NewRoundStep",version="1.0.0-rc14"} 64
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="consensus_NewValidBlock",version="1.0.0-rc14"} 50
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="p2p_PexAddrs",version="1.0.0-rc14"} 15121
# HELP cometbft_p2p_message_send_bytes_total Number of bytes of each message type sent.
# TYPE cometbft_p2p_message_send_bytes_total counter
cometbft_p2p_message_send_bytes_total{chain_id="mocha-4",message_type="blockchain_BlockRequest",version="1.0.0-rc14"} 1900
cometbft_p2p_message_send_bytes_total{chain_id="mocha-4",message_type="blockchain_StatusResponse",version="1.0.0-rc14"} 7
cometbft_p2p_message_send_bytes_total{chain_id="mocha-4",message_type="p2p_PexRequest",version="1.0.0-rc14"} 2
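
The manual inspection above could also be scripted. The following is a minimal sketch, not part of this PR: `samplesMissingLabel` and `sampleDump` are hypothetical names, the parser is deliberately naive (it assumes label values contain no commas or closing braces), and the two sample lines are copied from the output in this comment.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sampleDump holds two lines copied from the node's metrics output:
// one message-level sample and one peer-level sample.
const sampleDump = `
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="p2p_PexAddrs",version="1.0.0-rc14"} 15121
cometbft_p2p_peer_receive_bytes_total{chID="0x0",chain_id="mocha-4",peer_id="34499b1ac473fbb03894c883178ecc83f0d6eaf6",version="1.0.0-rc14"} 15121
`

// samplesMissingLabel returns every sample line of the given metric that does
// not carry the given label name. Comment lines (# HELP, # TYPE) never match
// the "metric{" prefix and are therefore skipped.
func samplesMissingLabel(exposition, metric, label string) []string {
	var missing []string
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, metric+"{") {
			continue
		}
		end := strings.Index(line, "}")
		if end < 0 {
			continue // malformed line, ignore
		}
		// Naive split: assumes no label value contains a comma.
		found := false
		for _, pair := range strings.Split(line[len(metric)+1:end], ",") {
			if strings.HasPrefix(pair, label+"=") {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, line)
		}
	}
	return missing
}

func main() {
	for _, m := range []string{
		"cometbft_p2p_message_receive_bytes_total",
		"cometbft_p2p_peer_receive_bytes_total",
	} {
		fmt.Printf("%s: %d sample(s) without chID\n", m, len(samplesMissingLabel(sampleDump, m, "chID")))
	}
}
```

On the two lines above, the message-level sample is reported as lacking chID while the peer-level sample is not, which matches the observation in this comment.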

Below is the full list of metrics, in case needed:

# HELP cometbft_consensus_fast_syncing Whether or not a node is fast syncing. 1 if yes, 0 if no.
# TYPE cometbft_consensus_fast_syncing gauge
cometbft_consensus_fast_syncing{chain_id="mocha-4",version="1.0.0-rc14"} 1
# HELP cometbft_mempool_size Size of the mempool (number of uncommitted transactions).
# TYPE cometbft_mempool_size gauge
cometbft_mempool_size{chain_id="mocha-4",version="1.0.0-rc14"} 0
# HELP cometbft_mempool_successful_txs Number of transactions that successfully made it into a block.
# TYPE cometbft_mempool_successful_txs counter
cometbft_mempool_successful_txs{chain_id="mocha-4",version="1.0.0-rc14"} 0
# HELP cometbft_p2p_message_receive_bytes_total Number of bytes of each message type received.
# TYPE cometbft_p2p_message_receive_bytes_total counter
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="blockchain_BlockResponse",version="1.0.0-rc14"} 394149
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="blockchain_StatusResponse",version="1.0.0-rc14"} 7
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="consensus_HasVote",version="1.0.0-rc14"} 365
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="consensus_NewRoundStep",version="1.0.0-rc14"} 64
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="consensus_NewValidBlock",version="1.0.0-rc14"} 50
cometbft_p2p_message_receive_bytes_total{chain_id="mocha-4",message_type="p2p_PexAddrs",version="1.0.0-rc14"} 15121
# HELP cometbft_p2p_message_send_bytes_total Number of bytes of each message type sent.
# TYPE cometbft_p2p_message_send_bytes_total counter
cometbft_p2p_message_send_bytes_total{chain_id="mocha-4",message_type="blockchain_BlockRequest",version="1.0.0-rc14"} 1900
cometbft_p2p_message_send_bytes_total{chain_id="mocha-4",message_type="blockchain_StatusResponse",version="1.0.0-rc14"} 7
cometbft_p2p_message_send_bytes_total{chain_id="mocha-4",message_type="p2p_PexRequest",version="1.0.0-rc14"} 2
# HELP cometbft_p2p_peer_receive_bytes_total Number of bytes received from a given peer.
# TYPE cometbft_p2p_peer_receive_bytes_total counter
cometbft_p2p_peer_receive_bytes_total{chID="0x0",chain_id="mocha-4",peer_id="34499b1ac473fbb03894c883178ecc83f0d6eaf6",version="1.0.0-rc14"} 15121
cometbft_p2p_peer_receive_bytes_total{chID="0x20",chain_id="mocha-4",peer_id="34499b1ac473fbb03894c883178ecc83f0d6eaf6",version="1.0.0-rc14"} 479
cometbft_p2p_peer_receive_bytes_total{chID="0x40",chain_id="mocha-4",peer_id="34499b1ac473fbb03894c883178ecc83f0d6eaf6",version="1.0.0-rc14"} 394156
# HELP cometbft_p2p_peer_send_bytes_total Number of bytes sent to a given peer.
# TYPE cometbft_p2p_peer_send_bytes_total counter
cometbft_p2p_peer_send_bytes_total{chID="0x0",chain_id="mocha-4",peer_id="34499b1ac473fbb03894c883178ecc83f0d6eaf6",version="1.0.0-rc14"} 2
cometbft_p2p_peer_send_bytes_total{chID="0x40",chain_id="mocha-4",peer_id="34499b1ac473fbb03894c883178ecc83f0d6eaf6",version="1.0.0-rc14"} 1907
# HELP cometbft_p2p_peers Number of peers.
# TYPE cometbft_p2p_peers gauge
cometbft_p2p_peers{chain_id="mocha-4",version="1.0.0-rc14"} 1
# HELP cometbft_state_block_processing_time Time between BeginBlock and EndBlock in ms.
# TYPE cometbft_state_block_processing_time histogram
cometbft_state_block_processing_time_bucket{chain_id="mocha-4",version="1.0.0-rc14",le="1"} 10
cometbft_state_block_processing_time_bucket{chain_id="mocha-4",version="1.0.0-rc14",le="11"} 14
cometbft_state_block_processing_time_bucket{chain_id="mocha-4",version="1.0.0-rc14",le="21"} 14
cometbft_state_block_processing_time_bucket{chain_id="mocha-4",version="1.0.0-rc14",le="31"} 14
cometbft_state_block_processing_time_bucket{chain_id="mocha-4",version="1.0.0-rc14",le="41"} 14
cometbft_state_block_processing_time_bucket{chain_id="mocha-4",version="1.0.0-rc14",le="51"} 14
cometbft_state_block_processing_time_bucket{chain_id="mocha-4",version="1.0.0-rc14",le="61"} 14
cometbft_state_block_processing_time_bucket{chain_id="mocha-4",version="1.0.0-rc14",le="71"} 14
cometbft_state_block_processing_time_bucket{chain_id="mocha-4",version="1.0.0-rc14",le="81"} 14
cometbft_state_block_processing_time_bucket{chain_id="mocha-4",version="1.0.0-rc14",le="91"} 14
cometbft_state_block_processing_time_bucket{chain_id="mocha-4",version="1.0.0-rc14",le="+Inf"} 14
cometbft_state_block_processing_time_sum{chain_id="mocha-4",version="1.0.0-rc14"} 21.592999999999996
cometbft_state_block_processing_time_count{chain_id="mocha-4",version="1.0.0-rc14"} 14
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.0124e-05
go_gc_duration_seconds{quantile="0.25"} 2.8126e-05
go_gc_duration_seconds{quantile="0.5"} 4.5291e-05
go_gc_duration_seconds{quantile="0.75"} 5.6333e-05
go_gc_duration_seconds{quantile="1"} 0.000195125
go_gc_duration_seconds_sum 0.000890792
go_gc_duration_seconds_count 13
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 671
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.21.0"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.6826864e+08
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 6.45035184e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.498264e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 1.80352e+06
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 1.033076e+07
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.6826864e+08
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 1.2939264e+08
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.76922624e+08
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 1.529264e+06
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 5.398528e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 3.06315264e+08
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.694115731110122e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 3.332784e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 9600
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 15600
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 1.42548e+06
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 1.434048e+06
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 2.24743112e+08
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.61164e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 4.063232e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 4.063232e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 3.25268808e+08
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 14
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 0
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

@staheri14 staheri14 marked this pull request as ready for review September 7, 2023 19:49
Member

@evan-forbes evan-forbes left a comment

I know the chID doesn't appear in the metrics w/o this change, but did we ever figure out why we are able to index by channel w/o it?

@@ -532,7 +534,8 @@ func createMConnection(
 }
 }
 p.metrics.PeerReceiveBytesTotal.With(labels...).Add(float64(len(msgBytes)))
-p.metrics.MessageReceiveBytesTotal.With("message_type", p.mlc.ValueToMetricLabel(msg)).Add(float64(len(msgBytes)))
+p.metrics.MessageReceiveBytesTotal.With("message_type",
+p.mlc.ValueToMetricLabel(msg), "chID", fmt.Sprintf("%#x", chID)).Add(float64(len(msgBytes)))
Member

[no changes needed]
using fmt.Sprintf in a heavy path is something to keep an eye on in the future

tendermint/tendermint#8845 (comment)
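
If the formatting cost ever does show up in profiles, one possible mitigation is to render the labels once and look them up afterwards. This is only a sketch under assumptions, not something this PR does: `chIDLabels` and `chIDLabel` are hypothetical names, and the table assumes the channel ID is a single byte, which is consistent with the hex values (0x0, 0x20, 0x40) seen in the metrics.

```go
package main

import "fmt"

// chIDLabels caches the "%#x" rendering of every possible channel ID, so a hot
// path can index into the table instead of calling fmt.Sprintf per message.
// Assumption: the channel ID fits in one byte.
var chIDLabels = func() [256]string {
	var t [256]string
	for i := range t {
		t[i] = fmt.Sprintf("%#x", byte(i))
	}
	return t
}()

// chIDLabel returns the cached label value for a channel ID (hypothetical helper).
func chIDLabel(chID byte) string {
	return chIDLabels[chID]
}

func main() {
	// The cached strings are the same ones fmt.Sprintf("%#x", chID) produces.
	for _, id := range []byte{0x0, 0x20, 0x40} {
		fmt.Println(chIDLabel(id), chIDLabel(id) == fmt.Sprintf("%#x", id))
	}
}
```

The table costs 256 small strings built once at start-up, and the label values stay identical to what the current fmt.Sprintf call emits, so existing dashboards and queries would not need to change.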

@staheri14
Contributor Author

did we ever figure out why we are able to index by channel w/o it?

Can you please elaborate on this? When were we able to index by channel ID? And for which metrics?

@evan-forbes
Member

evan-forbes commented Sep 7, 2023

ahhh I'm dumb ok this is from peer_send_bytes_total and peer_receive_bytes_total, which are indexed by channel.

Screenshot from 2023-09-06 18-38-59

would that give us the data that we need? I believe this is what we have used in the past to find total data per channel

@staheri14
Contributor Author

If we would like more granularity on the bandwidth consumption per peer, we can use peer_send_bytes_total and peer_receive_bytes_total. I'll see if we can further aggregate those.
Nevertheless, given that we now have message_receive_bytes_total and message_send_bytes_total metrics with channel ID labels available, I propose to proceed as follows:

  • Compute the per-peer data transmission rate using peer_send_bytes_total and peer_receive_bytes_total.
  • Compute the total data transmission rate using message_receive_bytes_total and message_send_bytes_total.
  • Optionally, cross-check the two: the aggregate data transmission rate over all peers should equal the rate derived from message_receive_bytes_total and message_send_bytes_total.
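
The cross-check in the last bullet can be sketched on a single scrape, comparing cumulative byte totals rather than rates. This is an illustration under assumptions: `sumSamples` and `sample` are hypothetical names, the parser assumes samples carry no trailing timestamp, and the sample lines reuse the receive-side values posted earlier in this thread with the labels trimmed for brevity.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample reuses the receive-side values from the metrics dump in this thread;
// labels are trimmed to the one that distinguishes each series.
const sample = `
cometbft_p2p_message_receive_bytes_total{message_type="blockchain_BlockResponse"} 394149
cometbft_p2p_message_receive_bytes_total{message_type="blockchain_StatusResponse"} 7
cometbft_p2p_message_receive_bytes_total{message_type="consensus_HasVote"} 365
cometbft_p2p_message_receive_bytes_total{message_type="consensus_NewRoundStep"} 64
cometbft_p2p_message_receive_bytes_total{message_type="consensus_NewValidBlock"} 50
cometbft_p2p_message_receive_bytes_total{message_type="p2p_PexAddrs"} 15121
cometbft_p2p_peer_receive_bytes_total{chID="0x0"} 15121
cometbft_p2p_peer_receive_bytes_total{chID="0x20"} 479
cometbft_p2p_peer_receive_bytes_total{chID="0x40"} 394156
`

// sumSamples adds up the values of every sample of the named metric in a
// Prometheus text-format dump, across all label combinations.
func sumSamples(exposition, metric string) (float64, error) {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, metric+"{") {
			continue
		}
		// Assumption: no trailing timestamp, so the value is the last field.
		fields := strings.Fields(line)
		v, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			return 0, fmt.Errorf("parsing %q: %w", line, err)
		}
		total += v
	}
	return total, sc.Err()
}

func main() {
	perMessage, err := sumSamples(sample, "cometbft_p2p_message_receive_bytes_total")
	if err != nil {
		panic(err)
	}
	perPeer, err := sumSamples(sample, "cometbft_p2p_peer_receive_bytes_total")
	if err != nil {
		panic(err)
	}
	fmt.Printf("per-message=%.0f per-peer=%.0f match=%t\n", perMessage, perPeer, perMessage == perPeer)
}
```

For the dump posted earlier in this thread, both receive-side sums come to 409756 bytes and both send-side sums come to 1909 bytes, so the two metric families agree on that scrape.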

If there is no opposing opinion, I am going to start making queries and gathering the results.
cc: @evan-forbes

Member

@evan-forbes evan-forbes left a comment

sgtm!

@staheri14
Contributor Author

@evan-forbes Shall we merge this (or do we want to keep it open for future metrics that we may come up with)? The channel ID proved useful in our last traffic rate report using Prometheus metrics. I am more inclined toward merging it, wdyt?

@staheri14
Contributor Author

Got @evan-forbes's confirmation in a sync call, so I am going to merge it.

@staheri14 staheri14 merged commit 38e13ff into v0.34.x-celestia Sep 19, 2023
18 checks passed
@staheri14 staheri14 deleted the sanaz/trace-channelid-per-send-and-rec branch September 19, 2023 19:36
@faddat faddat mentioned this pull request Feb 22, 2024
3 tasks