Load-balanced bootstrapping #574
Comments
I'd use a different protocol string for this protocol instead of adding this to the DHT protocol. Otherwise, we'd have to have some form of "don't add the bootstrapper to your routing table" exception. If we do that and different users pick different bootstrappers, we're going to have some pretty weird behavior. Other concerns:
K random peers that support the DHT protocol, right? So nodes can then run queries against them to fill their routing tables.
Can you please explain this? Shouldn't even the DHT peers remove the bootstrappers from their RT after they get a reply from the bootstrapping node, as mentioned in step 2?
You're probably right. If the DHT knows which peers are bootstrappers, it's easy. If it doesn't, we'll have to have some heuristic where we take the first several nodes we see and use the bootstrap protocol.
Hey @Stebalien @petar Assigning this to myself to get the discussions rolling on this one and come up with a writeup/proposal.
Shouldn't bootstrapping be considered a one-time operation after installation? Since we remove the "behind NAT" peers from the DHT, we should end up with much more stable peer tables. So maybe we should try to recover from a 'downtime' with, say, the best 20% of the known peers, and fall back to the known bootstrap nodes if this fails. This, of course, only makes sense if the node has run for a minimum amount of time, like 1 or 2 hours (after installation). To make sure that we don't run into netsplits, we might want to query one of the bootstrapping nodes from time to time to get some "fresh", known-good peers.
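A minimal sketch of the recovery idea in the comment above, to make the decision order concrete. Everything here is hypothetical and not part of any libp2p API: `recover_peers`, the `known_peers` score map, the `try_connect` callback, and the default thresholds (1 hour, top 20%) are illustration only.

```python
def recover_peers(known_peers, bootstrap_peers, total_runtime_seconds,
                  try_connect, min_runtime_seconds=3600, top_fraction=0.2):
    """Pick peers to reconnect to after downtime.

    known_peers: dict mapping peer id -> score (higher is better); assumed
                 to be persisted from the previous session.
    try_connect: callable(peer_id) -> bool, supplied by the caller.
    """
    # Only trust our own peer list if the node has run long enough
    # for the scores to mean something.
    if total_runtime_seconds >= min_runtime_seconds and known_peers:
        ranked = sorted(known_peers, key=known_peers.get, reverse=True)
        count = max(1, int(len(ranked) * top_fraction))
        connected = [p for p in ranked[:count] if try_connect(p)]
        if connected:
            return connected
    # Last resort: the well-known bootstrap nodes.
    return [p for p in bootstrap_peers if try_connect(p)]
```

The periodic "ask a bootstrapper for fresh peers" step from the comment is left out here; it would run on a timer independently of this startup path.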
This sounds like a good solution. For the bootstrapping requests, we should consider saving, as a set, the peers returned together with the peer that provided them, and rating those sets by how well they work for our queries in the long run (as proposed for the routing responses in #589). This way we can greylist peers that supply bad peers to us, so we don't use them as bootstrapping nodes.
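One way the "rate the sets by who supplied them" idea could look, as a rough sketch. The class name, the thresholds, and the success-rate rule are assumptions made for illustration; the thread does not specify a scoring scheme.

```python
from collections import defaultdict


class ReferrerScores:
    """Track how well peers perform, grouped by who referred them (hypothetical)."""

    def __init__(self, min_samples=5, min_success_rate=0.5):
        self.min_samples = min_samples
        self.min_success_rate = min_success_rate
        self._referrer_of = {}                      # peer -> referrer
        self._stats = defaultdict(lambda: [0, 0])   # referrer -> [successes, total]

    def record_referral(self, referrer, peers):
        # Remember only the first referrer seen for each peer.
        for peer in peers:
            self._referrer_of.setdefault(peer, referrer)

    def record_query(self, peer, succeeded):
        referrer = self._referrer_of.get(peer)
        if referrer is None:
            return
        stats = self._stats[referrer]
        stats[0] += 1 if succeeded else 0
        stats[1] += 1

    def is_greylisted(self, referrer):
        successes, total = self._stats.get(referrer, (0, 0))
        # Require a minimum number of observations before judging a referrer.
        return total >= self.min_samples and successes / total < self.min_success_rate
```

A greylisted referrer would simply be skipped the next time a bootstrap source is chosen.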
I just noticed that this idea is inherently flawed: it makes peers very easy to identify under normal operating conditions because of recurring traffic patterns. If this is done, we should use the relay function for these queries, which hides what we're doing from everybody who cannot read the encrypted traffic.
I think I understand the initial ticket differently, since it makes no sense to request random DHT nodes if we have never bootstrapped: we don't know anybody except the hardcoded bootstrap nodes. Edit: On the other hand, a simple Web of Trust could help to avoid the bootstrap situation entirely. A new user might have a friend they trust, so they can add that friend to a list of trusted nodes and bootstrap over them. The more trusted nodes a user has, the more independent they are from the bootstrapping nodes.
I think you are misunderstanding the issue. Currently when we ask bootstrappers for peers we do so by issuing standard queries towards areas of our routing table (e.g. help me find node A (that happens to have 0 bits in common with me in Kademlia space), help me find B (which has 1 bit in common), etc.). The bootstrapper is returning us peers from its routing table. If many users are hitting the bootstrap nodes that inherently skews the load onto not just the bootstrappers, but the peers of the bootstrappers. Additionally, this can cause routing tables to not have some of the randomness properties that the DHT's health relies on. As a result, the proposal is that instead of issuing bootstrappers 10+ queries (do you know A, B, C, D,... who have 0...9 bits in common with me) we issue the bootstrapper one query "please give me a random subset of peers in the network, they do not even need to be in your routing table". Note: this has nothing to do with how much you trust the bootstrapper, it's just about the type of question you are asking the bootstrapper which is amplified by the fact that the network tends to work with a small number of bootstrappers.
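To make "N bits in common with me" concrete, here is a small sketch of the common-prefix-length calculation over XOR distance. It assumes, for illustration, that Kademlia keys are derived by hashing the peer ID with SHA-256; the helper names are made up for this example.

```python
import hashlib


def kad_key(peer_id: bytes) -> int:
    """Map a peer ID to a 256-bit key (assumes SHA-256 hashing for this sketch)."""
    return int.from_bytes(hashlib.sha256(peer_id).digest(), "big")


def common_prefix_len(a: int, b: int, bits: int = 256) -> int:
    """Number of leading bits two keys share, i.e. leading zeros of a XOR b."""
    return bits - (a ^ b).bit_length()


def group_by_cpl(self_key: int, peer_keys):
    """Group peers by how many leading bits they share with us (one group per bucket)."""
    buckets = {}
    for key in peer_keys:
        buckets.setdefault(common_prefix_len(self_key, key), []).append(key)
    return buckets
```

Under the current behaviour described above, a new node sends the bootstrapper one lookup per bucket it wants to fill (0, 1, 2, ... bits in common). The proposal replaces those with a single "give me a random subset of peers" request.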
I don't understand the argument here. Is this about whether an adversary can use traffic analysis to detect if we're a libp2p node? I don't think a relay is going to protect you here since the adversary could also notice traffic spikes from you to the relay. While you could try and bundle more and more of your requests through the relay for traffic obfuscation that just adds cost (bandwidth + machine resources for the relay, plus latency for you), but still ends up being susceptible to traffic analysis.
@RubenKelevra The idea of Web of Trust is already essentially implemented. You can already add the multiaddrs of a peer you trust and bootstrap from it. :)
Well yeah, we currently rely on a list of relays for this function. But when we return to an "every node in the network can offer relay services" model, we could just use a random peer we are connected to in order to relay our requests to the bootstrap nodes. Is my assumption correct that if we bootstrap just once and use the best-performing peers from our last session, we risk netsplits, since there's no common element we connect to?
Ah! thanks, this makes sense!
Well, the idea is to differentiate between the common bootstrap nodes and nodes you trust. The bootstrap nodes would be used as a last resort, while the nodes you trust would be used on every startup to fetch some "fresh", well-performing peers from them. If we have trusted nodes, we can avoid having to analyze the returned peers and how they perform for us, and always assume they are good. It would also be an excellent way to bootstrap fast for small nodes that have a short runtime duration, like phones/JavaScript nodes. Trusted nodes could allow each other to fulfill early DHT requests while the connections to other peers are established, reducing the time to first byte.
@Stebalien What's the priority of this in the larger scheme of things?
Well, the priority is "we'll throw it in when we make the other DHT protocol changes".
Bootstrap proposal from @aschmahmann:
I wrote a proposal which doesn't really conflict with @aschmahmann's proposal, but would allow updating the bootstrap peers and distributing the bootstrapping (and optionally much more in the future).
Additional thought. We may be able to save some work here by not immediately switching to the rendezvous protocol and having the bootstrappers:
Clients will:
Overall Upside: Likely an easier implementation path short-term
The current bootstrap protocol:
To remedy this, a new bootstrap protocol should go as follows:
To load-balance (and increase speed), bootstrappers can pre-lookup random peers in the network in their free time (on a separate thread), so that they can respond immediately to step 2 (above).
Furthermore, if bootstrappers accumulate a set of N >> K random peers in the network, they can respond to bootstrap queries (step 2, above) by picking a random subset of size K from the accumulated N.
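The pre-lookup and random-subset ideas above can be sketched as a small pool that a background walker fills and the bootstrap handler samples from. This is an illustrative sketch only: `BootstrapPeerPool`, its capacity, and the random-replacement policy are assumptions, not something the proposal specifies.

```python
import random
import threading


class BootstrapPeerPool:
    """Hypothetical pool of up to N random peers kept by a bootstrapper."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity            # N in the text above
        self._peers = []
        self._lock = threading.Lock()       # the walker runs on a separate thread
        self._rng = rng if rng is not None else random.Random()

    def add(self, peer_id):
        """Called by the background walker whenever it discovers a peer."""
        with self._lock:
            if peer_id in self._peers:
                return
            if len(self._peers) < self.capacity:
                self._peers.append(peer_id)
            else:
                # Overwrite a random slot so the pool keeps turning over.
                self._peers[self._rng.randrange(self.capacity)] = peer_id

    def sample(self, k):
        """Answer a bootstrap query with up to K distinct peers, chosen uniformly."""
        with self._lock:
            return self._rng.sample(self._peers, min(k, len(self._peers)))
```

Because `sample` only reads from memory, the bootstrapper can answer step 2 immediately, and with N much larger than K, different clients are likely to receive different subsets, which spreads load across the network.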