
Fluffy: Only send offers to peers when content falls within radius #3242


Merged: bhartnett merged 5 commits into master from fluffy-gossip-improvements on May 6, 2025

Conversation

@bhartnett (Contributor) commented Apr 28, 2025

Changes in this PR:

  • Improve gossip so that we only send offers when the content falls within the peer's radius. Previously, when doing a network lookup, we were not checking the radius. This avoids wasting effort and resources on offers that nodes will always decline because the content is outside their radius (see the sketch after this list).
  • In order to check the peer's radius when doing a nodes lookup, we now ping each peer if we don't have their radius in the cache. This change significantly improves bridge performance (when the node lookup is enabled) because we fill up the radius cache faster and overall reduce the number of node lookups required when gossiping content into the network.
  • Increase the size of the radius cache. This improves the performance of neighborhoodGossip by caching more of the network in memory.
  • Make the nodes lookup configurable in neighborhoodGossip. The nodes lookup is disabled by default and only enabled when called via the JSON-RPC API. This update is related to the recent spec change here: Move Neighborhood Gossip's RecursiveFindNodes requirement to portal_*PutContent ethereum/portal-network-specs#394. I've added it to this PR to reduce the impact of the node lookup changes on the nodes in the network, as they won't use the node lookup at all unless called via the JSON-RPC API.
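A minimal sketch of the resulting gossip loop (the radiusCache/ping pattern follows the snippet reviewed below; helper names such as distance, offer and gossipFuts are assumptions for illustration, not the exact Fluffy code):

for node in closestNodes:
  if p.radiusCache.get(node.id).isNone():
    # Radius unknown: ping the peer so that its pong response
    # populates the radius cache
    (await p.ping(node)).isOkOr:
      continue
  let radius = p.radiusCache.get(node.id).valueOr:
    continue # radius still unknown, skip this peer
  # Only offer content that falls within the peer's advertised radius
  if p.distance(node.id, contentId) <= radius:
    gossipFuts.add(p.offer(node, contentKeys))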

@bhartnett changed the title from "Fluffy: Only send offers to peers when the content fall within their radius" to "Fluffy: Only send offers to peers when the content falls within their radius" on Apr 28, 2025
@bhartnett requested a review from kdeme on Apr 28, 2025 15:13
@bhartnett changed the title from "Fluffy: Only send offers to peers when the content falls within their radius" to "Fluffy: Only send offers to peers when content falls within radius" on Apr 28, 2025
@bhartnett (Contributor, Author) commented
I've found that this change significantly improves gossip performance. Before this change the majority of gossip requests required a nodes lookup; after it, the majority don't.

Some stats collected from running the Fluffy state bridge on mainnet for around 15 mins:

# HELP portal_gossip_with_lookup Portal wire protocol neighborhood gossip that required a node lookup
# TYPE portal_gossip_with_lookup counter
portal_gossip_with_lookup_total{protocol_id="500a"} 1429.0
portal_gossip_with_lookup_created{protocol_id="500a"} 1745901457.0

# HELP portal_gossip_without_lookup Portal wire protocol neighborhood gossip that did not require a node lookup
# TYPE portal_gossip_without_lookup counter
portal_gossip_without_lookup_total{protocol_id="500a"} 28654.0
portal_gossip_without_lookup_created{protocol_id="500a"} 1745901457.0
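That is, only 1429 of 30083 neighborhoodGossip calls (about 5%) required a node lookup.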

Also, (almost) no DeclinedNotWithinRadius codes are being returned:

# HELP portal_offer_accept_codes Portal wire protocol accept codes received from peers after sending offers
# TYPE portal_offer_accept_codes counter
portal_offer_accept_codes_total{protocol_id="500a",accept_code="3"} 1.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="3"} 1745901866.0
portal_offer_accept_codes_total{protocol_id="500a",accept_code="2"} 114375.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="2"} 1745901458.0
portal_offer_accept_codes_total{protocol_id="500a",accept_code="0"} 4073.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="0"} 1745901488.0
portal_offer_accept_codes_total{protocol_id="500a",accept_code="1"} 345.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="1"} 1745901465.0
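(That is 1 DeclinedNotWithinRadius response out of 118794 accept codes received in total.)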

@kdeme (Contributor) left a comment

I'm assuming that the improvement of having fewer lookups comes from the increase in the radius cache size? I don't really see any other changes that could have to do with this.

And the improvement of fewer DeclinedNotWithinRadius codes returned comes from the fact that you ping and check the radius.

Comment on lines +1779 to +1783
if p.radiusCache.get(node.id).isNone():
# Send ping to add the node to the radius cache
(await p.ping(node)).isOkOr:
continue

@kdeme (Contributor) commented Apr 29, 2025

I'm not sure if you saw this, but there was recently a discussion and resulting spec change about removing the additional lookup in the NH gossip. I realized that that part was there mostly (only?) for initial gossip.

So my plan was to remove this completely here, but keep it for the "putContent" json-rpc call. E.g. have a much simpler neighborhoodGossip call, and then a putIntoNetwork() or similar that does this more complex version with the extra lookup.

The additional ping request here is thus something that definitely would not remain in the simpler call. I'm not fully sure we could keep it in the more complex call either, as from my previous measurements the lookups were not done often, and it would add a delay to the total time.
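A rough sketch of what that split might look like (the proc signatures and the offerToInterested helper are hypothetical, not the actual Fluffy APIs):

# Simple gossip path: only uses peers already in the routing table.
proc neighborhoodGossip(p: PortalProtocol,
    contentId: ContentId, content: seq[byte]) {.async.} =
  let closest = p.routingTable.neighbours(NodeId(contentId))
  await p.offerToInterested(closest, contentId, content)

# JSON-RPC putContent path: additionally performs a network lookup
# to discover nodes closer to the content before offering.
proc putIntoNetwork(p: PortalProtocol,
    contentId: ContentId, content: seq[byte]) {.async.} =
  let discovered = await p.lookup(NodeId(contentId))
  await p.offerToInterested(discovered, contentId, content)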

@bhartnett (Contributor, Author) replied

> I'm not sure if you saw this, but there was recently a discussion and resulting spec change about removing the additional lookup in the NH gossip. I realized that that part was there mostly (only?) for initial gossip.
>
> So my plan was to remove this completely here, but keep it for the "putContent" json-rpc call. E.g. have a much simpler neighborhoodGossip call, and then a putIntoNetwork() or similar that does this more complex version with the extra lookup.

That idea sounds reasonable to me. Before doing that change, we should probably check that Fluffy can quickly populate the full routing table without too much delay at startup. I didn't really look into this part.

> The additional ping request here is thus something that definitely would not remain in the simpler call. I'm not fully sure we could keep it in the more complex call either, as from my previous measurements the lookups were not done often, and it would add a delay to the total time.

Actually, I agree that removing the node lookup is the right direction, and perhaps we should focus more on filling the routing table, even making it larger to hold more of the network if somehow possible. I watched a talk by an IPFS developer who said one of their optimizations is a routing table that holds the entire network (and the IPFS network is very large); if I remember correctly, that part only uses a few MB of memory in total. This design greatly improves performance because you have most or all of the nodes in the network locally.

Having said that, this change is still an improvement in the short term. I noticed quite a decent speed-up when running the state bridge locally.

@bhartnett (Contributor, Author) commented
> I'm assuming that the improvement of having fewer lookups comes from the increase in the radius cache size? I don't really see any other changes that could have to do with this.

Yes, increasing the size of the radius cache did help a bit, but I noticed a bigger improvement from adding the ping, which forces the node's radius into the cache so that future neighborhoodGossip calls can use the cached value without doing a node lookup. The overall result is fewer node lookups in total across all neighborhoodGossip calls, not just on the first request to a particular node.

> And the improvement of fewer DeclinedNotWithinRadius codes returned comes from the fact that you ping and check the radius.

Yes, that's right. This was the main reason for this change as it seems wasteful to send offers when the content is outside the radius.

@bhartnett (Contributor, Author) commented Apr 30, 2025

I did some more testing of this change, comparing gossip performance before and after.

In each case I ran a single Fluffy node and gave it two minutes to start up and build its routing table, then ran the state bridge with 10 workers for approx. 5 minutes.

Here are the results from running Fluffy built from the master branch (before changes):
Logs:

INF 2025-04-30 12:50:55.516+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1020 blockDataQueueLen=1000
INF 2025-04-30 12:50:55.516+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=19 blockOffersQueueLen=1000
INF 2025-04-30 12:51:25.517+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1038 blockDataQueueLen=1000
INF 2025-04-30 12:51:25.517+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=37 blockOffersQueueLen=1000
INF 2025-04-30 12:51:55.518+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1051 blockDataQueueLen=1000
INF 2025-04-30 12:51:55.518+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=50 blockOffersQueueLen=1000
INF 2025-04-30 12:52:25.519+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1072 blockDataQueueLen=1000
INF 2025-04-30 12:52:25.519+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=71 blockOffersQueueLen=1000
INF 2025-04-30 12:52:55.521+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1089 blockDataQueueLen=1000
INF 2025-04-30 12:52:55.521+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=88 blockOffersQueueLen=1000
INF 2025-04-30 12:53:25.522+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1112 blockDataQueueLen=1000
INF 2025-04-30 12:53:25.522+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=111 blockOffersQueueLen=1000
INF 2025-04-30 12:53:55.524+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1129 blockDataQueueLen=1000
INF 2025-04-30 12:53:55.524+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=128 blockOffersQueueLen=1000
INF 2025-04-30 12:54:25.526+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1154 blockDataQueueLen=1000
INF 2025-04-30 12:54:25.526+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=153 blockOffersQueueLen=1000
INF 2025-04-30 12:54:55.527+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1180 blockDataQueueLen=1000
INF 2025-04-30 12:54:55.527+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=179 blockOffersQueueLen=1000
INF 2025-04-30 12:55:25.528+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1208 blockDataQueueLen=1000
INF 2025-04-30 12:55:25.528+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=207 blockOffersQueueLen=1000
INF 2025-04-30 12:55:55.528+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1235 blockDataQueueLen=1000
INF 2025-04-30 12:55:55.528+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=234 blockOffersQueueLen=1000
INF 2025-04-30 12:56:25.530+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1263 blockDataQueueLen=1000
INF 2025-04-30 12:56:25.530+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=262 blockOffersQueueLen=1000
INF 2025-04-30 12:56:55.533+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1293 blockDataQueueLen=1000
INF 2025-04-30 12:56:55.533+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=292 blockOffersQueueLen=1000
INF 2025-04-30 12:57:25.534+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1327 blockDataQueueLen=1000
INF 2025-04-30 12:57:25.534+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=326 blockOffersQueueLen=1000

Metrics:

# HELP portal_offer_accept_codes Portal wire protocol accept codes received from peers after sending offers
# TYPE portal_offer_accept_codes counter
portal_offer_accept_codes_total{protocol_id="500a",accept_code="3"} 489.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="3"} 1745988634.0
portal_offer_accept_codes_total{protocol_id="500a",accept_code="2"} 3464.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="2"} 1745988625.0
portal_offer_accept_codes_total{protocol_id="500a",accept_code="0"} 54.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="0"} 1745988649.0
portal_offer_accept_codes_total{protocol_id="500a",accept_code="1"} 122.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="1"} 1745988634.0

# HELP portal_gossip_offers_successful Portal wire protocol successful content offers from neighborhood gossip
# TYPE portal_gossip_offers_successful counter
portal_gossip_offers_successful_total{protocol_id="500a"} 4104.0
portal_gossip_offers_successful_created{protocol_id="500a"} 1745988625.0

# HELP portal_gossip_offers_failed Portal wire protocol failed content offers from neighborhood gossip
# TYPE portal_gossip_offers_failed counter
portal_gossip_offers_failed_total{protocol_id="500a"} 94.0
portal_gossip_offers_failed_created{protocol_id="500a"} 1745988629.0

# HELP portal_gossip_with_lookup Portal wire protocol neighborhood gossip that required a node lookup
# TYPE portal_gossip_with_lookup counter
portal_gossip_with_lookup_total{protocol_id="500a"} 783.0
portal_gossip_with_lookup_created{protocol_id="500a"} 1745988625.0

# HELP portal_gossip_without_lookup Portal wire protocol neighborhood gossip that did not require a node lookup
# TYPE portal_gossip_without_lookup counter
portal_gossip_without_lookup_total{protocol_id="500a"} 1316.0
portal_gossip_without_lookup_created{protocol_id="500a"} 1745988625.0

Here are the results from running Fluffy built from this branch (after changes):
Logs:

INF 2025-04-30 13:07:45.888+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1020 blockDataQueueLen=1000
INF 2025-04-30 13:07:45.888+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=19 blockOffersQueueLen=1000
INF 2025-04-30 13:08:15.888+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=1632 blockDataQueueLen=1000
INF 2025-04-30 13:08:15.888+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=631 blockOffersQueueLen=1000
INF 2025-04-30 13:08:25.461+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=1000 offerCount=5 workerId=9
INF 2025-04-30 13:08:45.889+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=2788 blockDataQueueLen=1000
INF 2025-04-30 13:08:45.889+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=1787 blockOffersQueueLen=1000
INF 2025-04-30 13:08:49.716+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=2000 offerCount=5 workerId=10
INF 2025-04-30 13:09:15.889+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=3965 blockDataQueueLen=1000
INF 2025-04-30 13:09:15.890+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=2964 blockOffersQueueLen=1000
INF 2025-04-30 13:09:16.625+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=3000 offerCount=6 workerId=2
INF 2025-04-30 13:09:39.022+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=4000 offerCount=8 workerId=9
INF 2025-04-30 13:09:45.890+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=5413 blockDataQueueLen=1000
INF 2025-04-30 13:09:45.890+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=4412 blockOffersQueueLen=1000
INF 2025-04-30 13:09:54.838+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=5000 offerCount=5 workerId=3
INF 2025-04-30 13:10:09.102+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=6000 offerCount=6 workerId=2
INF 2025-04-30 13:10:15.891+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=7503 blockDataQueueLen=1000
INF 2025-04-30 13:10:15.891+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=6502 blockOffersQueueLen=1000
INF 2025-04-30 13:10:22.961+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=7000 offerCount=5 workerId=3
INF 2025-04-30 13:10:37.982+08:00 Collecting block data for block number:    topics="portal_bridge" blockNumber=10000
INF 2025-04-30 13:10:38.161+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=8000 offerCount=5 workerId=1
INF 2025-04-30 13:10:45.892+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=9511 blockDataQueueLen=1000
INF 2025-04-30 13:10:45.892+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=8510 blockOffersQueueLen=1000
INF 2025-04-30 13:10:53.266+08:00 Building state for block number:           topics="portal_bridge" blockNumber=10000
INF 2025-04-30 13:10:53.374+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=9000 offerCount=5 workerId=5
INF 2025-04-30 13:11:09.572+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=10000 offerCount=5 workerId=6
INF 2025-04-30 13:11:15.893+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=11173 blockDataQueueLen=1000
INF 2025-04-30 13:11:15.893+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=10172 blockOffersQueueLen=1000
INF 2025-04-30 13:11:29.500+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=11000 offerCount=6 workerId=4
INF 2025-04-30 13:11:44.448+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=12000 offerCount=5 workerId=3
INF 2025-04-30 13:11:45.893+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=13096 blockDataQueueLen=1000
INF 2025-04-30 13:11:45.893+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=12095 blockOffersQueueLen=1000
INF 2025-04-30 13:11:59.439+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=13000 offerCount=5 workerId=6
INF 2025-04-30 13:12:15.897+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=14647 blockDataQueueLen=1000
INF 2025-04-30 13:12:15.897+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=13646 blockOffersQueueLen=1000
INF 2025-04-30 13:12:21.373+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=14000 offerCount=5 workerId=9
INF 2025-04-30 13:12:35.630+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=15000 offerCount=4 workerId=6
INF 2025-04-30 13:12:45.897+08:00 Block data queue metrics:                  topics="portal_bridge" nextBlockNumber=16701 blockDataQueueLen=1000
INF 2025-04-30 13:12:45.897+08:00 Block offers queue metrics:                topics="portal_bridge" nextBlockNumber=15700 blockOffersQueueLen=1000
INF 2025-04-30 13:12:50.818+08:00 Finished gossiping offers for block:       topics="portal_bridge" blockNumber=16000 offerCount=5 workerId=7

Metrics:

# HELP portal_offer_accept_codes Portal wire protocol accept codes received from peers after sending offers
# TYPE portal_offer_accept_codes counter
portal_offer_accept_codes_total{protocol_id="500a",accept_code="2"} 202181.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="2"} 1745989636.0
portal_offer_accept_codes_total{protocol_id="500a",accept_code="0"} 4345.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="0"} 1745989636.0
portal_offer_accept_codes_total{protocol_id="500a",accept_code="4"} 1897.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="4"} 1745989689.0
portal_offer_accept_codes_total{protocol_id="500a",accept_code="1"} 360.0
portal_offer_accept_codes_created{protocol_id="500a",accept_code="1"} 1745989653.0

# HELP portal_gossip_offers_successful Portal wire protocol successful content offers from neighborhood gossip
# TYPE portal_gossip_offers_successful counter
portal_gossip_offers_successful_total{protocol_id="500a"} 207832.0
portal_gossip_offers_successful_created{protocol_id="500a"} 1745989636.0

# HELP portal_gossip_offers_failed Portal wire protocol failed content offers from neighborhood gossip
# TYPE portal_gossip_offers_failed counter
portal_gossip_offers_failed_total{protocol_id="500a"} 1164.0
portal_gossip_offers_failed_created{protocol_id="500a"} 1745989640.0

# HELP portal_gossip_with_lookup Portal wire protocol neighborhood gossip that required a node lookup
# TYPE portal_gossip_with_lookup counter
portal_gossip_with_lookup_total{protocol_id="500a"} 160.0
portal_gossip_with_lookup_created{protocol_id="500a"} 1745989635.0

# HELP portal_gossip_without_lookup Portal wire protocol neighborhood gossip that did not require a node lookup
# TYPE portal_gossip_without_lookup counter
portal_gossip_without_lookup_total{protocol_id="500a"} 104367.0
portal_gossip_without_lookup_created{protocol_id="500a"} 1745989636.0

The performance improvement is significant and is likely due to the ping forcing the radius to be cached, plus the increase in cache size. It doesn't appear to come from the efficiency gain of only sending offers to in-range peers, because these results show that only around 500 offers were sent to peers that were not in range.

@bhartnett (Contributor, Author) commented
@kdeme I updated this change to include the recent spec changes: we now only do the node lookup in neighborhoodGossip when called from the JSON-RPC APIs, and the lookup is disabled by default, so Fluffy nodes won't do it when gossiping received offers to peers.

@bhartnett requested a review from kdeme on May 1, 2025 03:09
@kdeme (Contributor) commented May 6, 2025

> The performance improvement is significant and is likely due to the ping forcing the radius to be cached, plus the increase in cache size.

Yes, I have no doubt that the increased cache plus the added pings to fill it quickly will improve the number of offers it can send out, because fewer node lookups are required before sending those offers.
This can also be seen in the lookup metrics: 783 with-lookup vs. 1316 without (~37% needed a lookup) before, versus 160 vs. 104367 (~0.15%) after.

The actual total value that is useful is portal_offer_accept_codes_total for accept_code 0 (= Accepted): this is 54.0 versus 4345.0, which is significant, more than I would have guessed.

The actual "efficiency" rate (accepted per successful) is also a little bit higher, not sure if that just has to do due to more data that was not gossiped before or not (did you send the same data to start from?).

Now, the part that slightly bothers me about this solution is that it will not automatically scale well as the size of the network (= number of nodes) increases. It currently works great because the network in terms of nodes fits entirely in the radius cache.
I think that once the number of nodes grows decently larger than the cache, it will become counterproductive: the gossip mechanism will continuously overwrite the cache, sending a lot of pings and basically delaying offers over the whole duration (where this currently only happens at start-up).
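To put rough illustrative numbers on this (assuming node ids roughly uniformly distributed, so a random gossip candidate is in the cache with probability of about cacheSize/networkSize): with a 1000-entry cache and a 10000-node network, around 90% of candidates would miss the cache and trigger a ping before the offer can go out.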

Now, this is probably not much of an issue, as this functionality is typically used by bridge nodes, and for those we can increase the cache by a huge amount (perhaps we can do this even for regular nodes).

So I am fine with merging this solution as long as we remember this (= document it somewhere, probably add some comment to the code?).

@bhartnett (Contributor, Author) commented
> The actual total value that is useful is portal_offer_accept_codes_total for accept_code 0 (= Accepted): this is 54.0 versus 4345.0, which is significant, more than I would have guessed.

The actual "efficiency" rate (accepted per successful) is also a little bit higher, not sure if that just has to do due to more data that was not gossiped before or not (did you send the same data to start from?).

Yes, I sent the same block range of data for both test scenarios. Perhaps the improved efficiency is because we no longer send content that is not in range.

> Now, the part that slightly bothers me about this solution is that it will not automatically scale well as the size of the network (= number of nodes) increases. It currently works great because the network in terms of nodes fits entirely in the radius cache. I think that once the number of nodes grows decently larger than the cache, it will become counterproductive: the gossip mechanism will continuously overwrite the cache, sending a lot of pings and basically delaying offers over the whole duration (where this currently only happens at start-up).

> Now, this is probably not much of an issue, as this functionality is typically used by bridge nodes, and for those we can increase the cache by a huge amount (perhaps we can do this even for regular nodes).

Good point. For the nodes in the network that have the node lookup disabled (nodes not running with a bridge), this won't be an issue; for this scenario the radius cache just needs to be a bit bigger than the routing table. As the network grows we should probably increase the default size of the radius cache.

For nodes that run connected to a bridge, we should allow these nodes to be 'larger' and increase the size of the radius cache (there is already a debug parameter on the CLI for this), so that the scenario you describe doesn't occur in practice.

> So I am fine with merging this solution as long as we remember this (= document it somewhere, probably add some comment to the code?).

Sure, I'll add some comments to make this part clear.

@bhartnett merged commit 326b6b9 into master on May 6, 2025
9 of 10 checks passed
@bhartnett deleted the fluffy-gossip-improvements branch on May 6, 2025 13:43