Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http-routing: add "get closest peers" operation #476

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 54 additions & 0 deletions src/routing/http-routing-v1.md
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,60 @@ Each object in the `Providers` list is a record conforming to a schema, usually

## Peer Routing API

### `GET /routing/v1/closest-peers/{peer-id}?[closerThan]&[count]`
Copy link
Member

@lidel lidel Jun 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@achingbrain I think we could prototype this in https://github.com/ipfs-shipyard/someguy and deploy to https://delegated-ipfs.dev.

Priority-wise, and for the purpose of IPIP we will write for this at some point, does the utility here stops at finding Circuit Relay servers viable in browsers without being a DHT client?

Would it be useful to have API where the entire hash space could be queried for closest value, not just PeerIDs?
I wonder if this could be generalized into delegated /routing/v1/closest/{cid}?[than]&[count] where behind the scenes we prefix and turn peerids and cids to amindht hashes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the utility here stops at finding Circuit Relay servers viable in browsers without being a DHT client?

No, there are other uses - you may wish to use this to populate your routing tables, or to find peers that can host provider records. It would be useful to perform these operations from environments that have limited resources or cannot open many concurrent connections, for example.

Would it be useful to have API where the entire hash space could be queried for closest value, not just PeerIDs?

Yes, plus here the PeerIDs are represented as CIDs, so the above is something that can be performed.

To make this clearer it might be better specced as {cid} rather than {peer-id} I guess.

Copy link
Contributor

@aschmahmann aschmahmann Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use-case for this is browser nodes performing random walks to find peers that they can make a circuit relay reservation on, without having to be DHT clients

does the utility here stops at finding Circuit Relay servers

Note: In many cases you'd want relay servers you'd also want peer routing (i.e. finding your address based on your peerID) which in the case of Amino means having recently pinged the closest peers to your peerID in XOR space. Ideally one of those (20) nodes is also acting as your relay which means the number of connections and queries you'd need are even smaller. So we don't really want a random walk we want to talk to specific peers when possible and only broaden the search if we have to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this makes sense at the /routing/v1 namespace since it's fairly specific to systems like the Amino DHT.

Having an HTTP API for this kind of thing makes sense though, perhaps as /amino or /kad or something. It could be nested under /routing/v1/<amino, etc.> as well if that makes folks happier.

  • Note: Given that this is quite close to being the HTTP equivalent of the FIND_NODE RPC something like this could potentially end up being worked into that protocol itself (e.g. to enable http-based request/response).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hope was that we could have it be routing-agnostic but maybe that's not possible, since the actual results would very much depend on what was being queried and how.

I'm happy with:

GET /routing/v1/amino/closest-peers/{peer-id}?[closerThan]&[count]

though if it's all amino, perhaps we should just have done with it and use the RPC names?

  • GET /routing/v1/amino/find-node/{multibase}

...then maybe later:

  • GET /routing/v1/amino/get-value/{multibase}
  • GET /routing/v1/amino/get-providers/{cid}

though the second is redundant given the existing HTTP endpoints and the first is only partially redundant since we could use it to search for more than just IPNS records.

...and even later PUT_VALUE and ADD_PROVIDER?


#### Path Parameters

- `peer-id` is a [Peer ID](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md) represented as a CIDv1 encoded with `libp2p-key` codec.

#### Query Paramters

- `closerThan` is an optional [Peer ID](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md) represented as a CIDv1 encoded with `libp2p-key` codec.
- Returned peer records must be closer to `peer-id` than `closerThan`.
- If omitted the routing implementation should use it's own [Peer ID](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md).
achingbrain marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the query is expected to return the absolute closest peers to the target CID, the only use case I can see for closerThan is to find peers that are under eclipse attacks, but there are no current plans to implement it. Otherwise it will just limit the number of returned peers.

If there are other reasons to keep this parameter, I would recommend against using the receiver's (routing implementation) own peer id, when no value is provided. The default value should be no filter at all, return all the count closest peers. Otherwise, if a peer is looking for the 20 closest peers to some CID, and it reaches the 5th closest peer, this one will respond only with a list of 4 peers (because all the other are further away compare to the 5th closest peer). So in order to get the 20 closest peers, the requester would have to come up with a closerThan parameter that is far enough so that at least 20 peers are closer to the target CID.

- `count` is an optional number that specifies how many peer records the requester desires.
- Minimum 1, maximum 100, default 20.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the closerThan parameter is set, and count is omitted, the default value should be the maximum, otherwise some peers closer than closerThan may be filtered out. Both parameters could however be combined if both are set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my assumption here is that the peers would be sorted by closeness value before the limit was applied.

Would this not be the case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my assumption here is that the peers would be sorted by closeness value before the limit was applied.

Yes that is correct. My point is that both closerThan and count can limit the count of peers that are returned, the parameters may conflict with each other.

E.g peer-id=key0&closerThan=key1 should return all the peers between key0 and key1, but if there are more than default=20 peers between key0 and key1 this query will only return the default closest, and not all of them up to the maximum.

If none of count or closerThan parameters is set, the default count should be default=20. However, if closerThan is set and not count, the default count should be set to the maximum=100. This allows to return as many peers as possible between key0 and key1, and if the caller wants to set a limit, they should combine closerThan and count.

Also, when setting count and not closerThan using the node's peer id as default closerThan parameter can be problematic. E.g I want to provide a CID to the 20 closest nodes, and I am the 12th closest peer to this CID. I will receive only 11 closer nodes, and will be unable to find the 20 closest nodes. Hence, if count is set and not closerThan the closerThan default should be the opposite of the target peer-id (peer-id XOR 111...11). Maybe the opposite of the target peer-id is a better default for closerThan than the requester's peer id, allowing the default response to contain a constant number of peers.


#### Response Status Codes

- `200` (OK): the response body contains peer records.
- `404` (Not Found): must be returned if no matching records are found.
- `422` (Unprocessable Entity): request does not conform to schema or semantic constraints.

#### Response Headers

- `Content-Type`: the content type of this response, which MUST be `application/json` or `application/x-ndjson` (see [streaming](#streaming)).
- `Last-Modified`: an HTTP-date timestamp ([RFC9110, Section 5.6.7](https://www.rfc-editor.org/rfc/rfc9110#section-5.6.7)) of the resolution, allowing HTTP proxies and CDNs to support inexpensive update checks via `If-Modified-Since`
- `Cache-Control: public, max-age={ttl}, public, stale-while-revalidate={max-ttl}, stale-if-error={max-ttl}`: meaningful cache TTL returned with the response.
- When present, `ttl` SHOULD be shorter for responses whose resolution ended in no results (e.g. 15 seconds),
and longer for responses that have results (e.g. 5 minutes).
- Implementations SHOULD include `max-ttl`, set to the maximum cache window of the underlying routing system.
For example, if Amino DHT results are returned, `stale-while-revalidate` SHOULD be set to `172800` (48h, which at the time of writing this specification, is the provider record expiration window).
achingbrain marked this conversation as resolved.
Show resolved Hide resolved
- `Vary: Accept`: allows intermediate caches to play nicely with the different possible content types.

#### Response Body

```json
{
"Peers": [
{
"Schema": "<schema>",
"Protocols": ["<protocol-a>", "<protocol-b>", ...],
"ID": "bafz...",
"Addrs": ["/ip4/..."],
...
},
...
]
}
```

The number of peer records in the responses SHOULD be limited to the `count` query parameter, which defaults to 20 if unspecified.

The client SHOULD be able to make a request with `Accept: application/x-ndjson` and get a [stream](#streaming) with more results.

Each object in the `Peers` list is a record conforming to the [Peer Schema](#peer-schema).

### `GET /routing/v1/peers/{peer-id}`

#### Path Parameters
Expand Down
Loading