Optimizing Filecoin Retrieval TTFB #102

Open
hannahhoward opened this issue Jul 19, 2022 · 1 comment

@hannahhoward
Collaborator

Currently, when a CID is not in the local cache, we retrieve data from Filecoin in the following steps:

  1. Query the indexer/Estuary.
  2. For every result returned, query each individual provider in parallel, then wait for all query responses to return.
  3. Attempt retrievals sequentially, in the order given by a sorting function.

There are a couple of ways we can optimize this:

  • The results returned by the Filecoin Indexer should, at minimum, include whether the deal is verified. We can treat a verified deal as a proxy for "likely free" and skip the query. The results also contain the PieceCID, which we could use so that the provider side doesn't have to go look for it.
  • If we start getting query responses back that meet the best criteria in our sorting function (anything that's free, for example), we could kick off our first retrieval right away and sort the remaining responses as they come in.
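A rough sketch of the second idea, in Go. Everything here is hypothetical and simplified for illustration: `QueryResponse`, `isBest`, and `retrieveEarly` are not existing types or functions, "best" is assumed to mean a price of zero, and the retrieval itself is passed in as a callback. The first response that qualifies starts a retrieval immediately, while the remaining responses keep being collected and are only sorted and tried if that early attempt fails.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// QueryResponse is a hypothetical, simplified stand-in for a provider's
// query response. Price 0 is assumed to mean "free".
type QueryResponse struct {
	Provider string
	Price    uint64
}

var errAllFailed = errors.New("all retrieval attempts failed")

// isBest reports whether a response already meets the top criterion of the
// sorting function (assumed here: free).
func isBest(r QueryResponse) bool { return r.Price == 0 }

// retrieveEarly starts a retrieval as soon as a "best" response arrives,
// instead of waiting for every query to finish. Other responses are collected
// in the meantime and tried in price order only if the early attempt fails.
func retrieveEarly(responses <-chan QueryResponse, retrieve func(QueryResponse) error) (string, error) {
	var (
		pending       []QueryResponse
		early         chan error // nil while no early retrieval is in flight
		earlyProvider string
	)
	for responses != nil || early != nil {
		select {
		case r, ok := <-responses:
			if !ok {
				responses = nil // all queries have returned
				continue
			}
			if early == nil && isBest(r) {
				done := make(chan error, 1)
				early, earlyProvider = done, r.Provider
				go func() { done <- retrieve(r) }()
				continue
			}
			pending = append(pending, r)
		case err := <-early:
			if err == nil {
				return earlyProvider, nil
			}
			early = nil // early attempt failed; keep collecting
		}
	}
	// Fallback: sequential retrieval over the sorted remainder.
	sort.SliceStable(pending, func(i, j int) bool { return pending[i].Price < pending[j].Price })
	for _, r := range pending {
		if retrieve(r) == nil {
			return r.Provider, nil
		}
	}
	return "", errAllFailed
}

func main() {
	responses := make(chan QueryResponse, 3)
	responses <- QueryResponse{Provider: "providerA", Price: 10}
	responses <- QueryResponse{Provider: "providerB", Price: 0}
	responses <- QueryResponse{Provider: "providerC", Price: 5}
	close(responses)

	winner, err := retrieveEarly(responses, func(QueryResponse) error { return nil })
	fmt.Println(winner, err)
}
```

A real version would also need a context to cancel outstanding queries once a retrieval succeeds; the sketch returns early and relies on the response channel being buffered.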

One other thing to factor in is how we want to abstract the additional data returned by the indexer that doesn't come from Estuary (I think). Honestly, we should think about this problem in general since, for example, Estuary can have a different "Root CID" while the index is always the same.

@elijaharita
Contributor

we could add a field to RetrievalCandidate, PossiblyFree bool or something along those lines. since RetrievalCandidate is returned by the indexer impl, it would be no issue to write endpoint-specific behavior. if estuary isn't able to provide the info, the estuary endpoint could simply always set PossiblyFree to false, while the indexer endpoint impl would be able to set it properly.
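something like the following, as a minimal sketch. the struct here is a simplified stand-in rather than the real RetrievalCandidate (peer IDs and CIDs are plain strings), and `Endpoint`, `estuaryEndpoint`, `indexerEndpoint`, `indexerResult`, and `VerifiedDeal` are all hypothetical names. the endpoints return canned data instead of making network calls, just to show where PossiblyFree would get set.

```go
package main

import "fmt"

// RetrievalCandidate is a simplified stand-in for the real type; only
// PossiblyFree is the field being proposed.
type RetrievalCandidate struct {
	MinerPeer    string // assumption: peer ID kept as a plain string
	RootCid      string // assumption: CID kept as a plain string
	PossiblyFree bool   // proposed: hint that retrieval is likely free
}

// Endpoint is a hypothetical interface over the candidate sources.
type Endpoint interface {
	FindCandidates(rootCid string) ([]RetrievalCandidate, error)
}

// estuaryEndpoint cannot tell whether a deal is verified, so it always
// reports PossiblyFree as false.
type estuaryEndpoint struct{ providers []string }

func (e estuaryEndpoint) FindCandidates(rootCid string) ([]RetrievalCandidate, error) {
	out := make([]RetrievalCandidate, 0, len(e.providers))
	for _, p := range e.providers {
		out = append(out, RetrievalCandidate{MinerPeer: p, RootCid: rootCid, PossiblyFree: false})
	}
	return out, nil
}

// indexerResult is a hypothetical shape for an indexer result that carries
// the verified-deal flag.
type indexerResult struct {
	Provider     string
	VerifiedDeal bool
}

// indexerEndpoint maps "verified deal" onto PossiblyFree.
type indexerEndpoint struct{ results []indexerResult }

func (e indexerEndpoint) FindCandidates(rootCid string) ([]RetrievalCandidate, error) {
	out := make([]RetrievalCandidate, 0, len(e.results))
	for _, r := range e.results {
		out = append(out, RetrievalCandidate{MinerPeer: r.Provider, RootCid: rootCid, PossiblyFree: r.VerifiedDeal})
	}
	return out, nil
}

func main() {
	endpoints := []Endpoint{
		estuaryEndpoint{providers: []string{"providerA"}},
		indexerEndpoint{results: []indexerResult{{Provider: "providerB", VerifiedDeal: true}}},
	}
	for _, ep := range endpoints {
		candidates, _ := ep.FindCandidates("exampleRootCid")
		for _, c := range candidates {
			fmt.Println(c.MinerPeer, c.PossiblyFree)
		}
	}
}
```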

we could immediately attempt retrievals on all of the PossiblyFree == true candidates with pre-assumed retrieval params, and only if all of those fail, fall back to query + retrieval as is currently done.
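a sketch of that control flow, under some loud assumptions: `Params`, `freeParams`, `Retriever`, and `Fetch` are made-up names, the "pre-assumed" params are taken to be all-zero prices, and the query and retrieval steps are injected as callbacks rather than real network calls. the free attempts run sequentially here for simplicity (they could just as well run in parallel), and the fallback skips the sorting step that the current code does.

```go
package main

import (
	"errors"
	"fmt"
)

// RetrievalCandidate is a simplified stand-in for the real type.
type RetrievalCandidate struct {
	MinerPeer    string
	PossiblyFree bool
}

// Params is a hypothetical, reduced set of retrieval parameters.
type Params struct {
	PricePerByte uint64
	UnsealPrice  uint64
}

// freeParams are the pre-assumed params for a likely-free retrieval
// (assumption: zero prices).
var freeParams = Params{}

var ErrNoneSucceeded = errors.New("no candidate could be retrieved from")

// Retriever bundles the two network steps as injectable callbacks.
type Retriever struct {
	Query    func(c RetrievalCandidate) (Params, error)
	Retrieve func(c RetrievalCandidate, p Params) error
}

// Fetch tries PossiblyFree candidates first without querying them, and only
// falls back to query + retrieval if every one of those attempts fails.
func (r Retriever) Fetch(candidates []RetrievalCandidate) (string, error) {
	for _, c := range candidates {
		if c.PossiblyFree && r.Retrieve(c, freeParams) == nil {
			return c.MinerPeer, nil
		}
	}
	// Fallback: query each candidate, then retrieve with the returned params.
	for _, c := range candidates {
		p, err := r.Query(c)
		if err != nil {
			continue
		}
		if r.Retrieve(c, p) == nil {
			return c.MinerPeer, nil
		}
	}
	return "", ErrNoneSucceeded
}

func main() {
	r := Retriever{
		Query: func(RetrievalCandidate) (Params, error) {
			return Params{PricePerByte: 1}, nil
		},
		Retrieve: func(RetrievalCandidate, Params) error { return nil },
	}
	winner, err := r.Fetch([]RetrievalCandidate{
		{MinerPeer: "providerA"},
		{MinerPeer: "providerB", PossiblyFree: true},
	})
	fmt.Println(winner, err)
}
```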

@hannahhoward hannahhoward moved this from Backlog to In Progress in Bedrock Tornado Team OLD Aug 5, 2022
@hannahhoward hannahhoward moved this from In Progress to Backlog in Bedrock Tornado Team OLD Aug 6, 2022