
State @ Aug 25 #1

Open · 4 tasks

alanshaw opened this issue Aug 25, 2023 · 6 comments
Comments

@alanshaw (Member) commented Aug 25, 2023

Background

  • Deployed as sha256it with SST in AWS
  • Using the staging environment for hashing/copying from dotstorage-prod-0
  • Using the production environment for hashing/copying from dotstorage-prod-1
  • See copyFunctionURL and hashFunctionURL in the stage outputs for current lambda URLs

How to

  • Get keys
    • Use ls CLI command (in packages/cli)
  • Generate hashes
    • Use hash CLI command (in packages/cli)
    • e.g. cat keys-dotstorage-prod-1-d.ndjson | sha256it hash > hashes-dotstorage-prod-1-d.ndjson
  • Copy CARs
    • Use copy CLI command (in packages/cli)
    • e.g. cat hashes-dotstorage-prod-1-a.ndjson | sha256it copy > copies-dotstorage-prod-1-a.ndjson

Note: all CLI commands output ndjson.
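
Since every command emits ndjson, downstream scripts can stream the output line by line. A minimal sketch in Python; the field names (key, hash) are assumptions for illustration, not the actual sha256it output shape:

```python
import json

def parse_ndjson(text):
    """Parse ndjson text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Hypothetical sample of what a hashes-*.ndjson file might contain.
sample = "\n".join([
    '{"key": "raw/bafy.../0.car", "hash": "sha256-abc"}',
    '{"key": "raw/bafy.../1.car", "hash": "sha256-def"}',
])

records = parse_ndjson(sample)
```

The same parser works for the key listings and the copy reports, since they share the one-JSON-object-per-line convention.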

Resources

Complete

  1. Key listings for dotstorage-prod-0 AND dotstorage-prod-1
    • Note: Keys for dotstorage-prod-1 are split into 5-million-line chunks, a-g
  2. Hashes for dotstorage-prod-0
  3. Hashes a, b & c for dotstorage-prod-1
  4. Copies for dotstorage-prod-0
  5. Copies a for dotstorage-prod-1

TODO

@alanshaw (Member, Author)

@olizilla if you can pick some of this up while I'm OOO that would be amazing! If not, then no worries I will finish on my return.

@olizilla (Contributor)

@alanshaw the resource links in sha256it progress report above for Keys & Hashes link to bafytodo! Do you have real CIDs for those?

@alanshaw (Member, Author)

😩 I didn't get round to uploading them before I left.

Steps 4/5 are still doable and would be useful progress if nothing else...

@olizilla (Contributor)

No worries. You mean todos 3 & 4, right?

  1. Script to verify data was copied. A HEAD request would suffice. I am confident that if the object is present and non-zero in size, then it is there in its entirety, and it is consistency-verified because the copy operation used ChecksumSHA256.
  2. Script to update DynamoDB carpaths with CAR CID. We should be able to walk the DB, find a carpath with complete/, lookup the CAR CID in the output files above and update it.
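
The decision in step 1 is simple once the HEAD metadata is in hand. A sketch of just that check, assuming a boto3-style head_object response dict (ContentLength field), with the AWS call itself left out:

```python
def copy_verified(head_meta):
    """Decide whether a copied CAR is verified, given HEAD metadata.

    head_meta is assumed to be a boto3-style head_object response dict,
    or None if the HEAD returned 404. Per the reasoning above, present
    and non-zero in size is enough: the copy operation itself used
    ChecksumSHA256, so content integrity was already checked.
    """
    if head_meta is None:  # object missing
        return False
    return head_meta.get("ContentLength", 0) > 0
```

In practice head_meta would come from s3.head_object(Bucket=..., Key=...), treating a 404 ClientError as None.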

@alanshaw (Member, Author)

Yes, FML 🤦‍♂️

@olizilla (Contributor) commented Sep 1, 2023

> Script to update DynamoDB carpaths with CAR CID. We should be able to walk the DB, find a carpath with complete/, lookup the CAR CID in the output files above and update it.

I think it's faster to use the input files as the source rather than a full DB scan... there are ~43 billion rows in the target DynamoDB table. If we use the input files we can locally reduce the job down to just the set of inserts that need to happen and divide the work up between us. Something like:

  • filter input files where carpath starts with /complete
  • cli to read files and write updates to a queue in batches (maybe 25, which is the max DynamoDB batch write size)
  • consumer to read from queue and update db (1 message of 25 = 1 batch write op)

we can share the files around and each run the cli over a subset of the inputs to fill the queue, and we can tweak the queue subscriber concurrency to make the writes faster.

or we could have the cli just write to dynamo. I wonder if there is much speed difference between writing to SQS vs writing to dynamo.
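
The local filter-and-batch steps above can be sketched as follows. The field names (carpath, car) and the complete/ prefix are assumptions about the input file shape, and the actual write (SQS send or DynamoDB BatchWriteItem) is left as a stub:

```python
import json

BATCH_SIZE = 25  # DynamoDB BatchWriteItem accepts at most 25 items per call

def complete_updates(lines):
    """Yield parsed ndjson records whose carpath starts with 'complete/'."""
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("carpath", "").startswith("complete/"):
            yield rec

def batches(items, size=BATCH_SIZE):
    """Group an iterable into lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each emitted batch would then become one SQS message (or one BatchWriteItem call, if the CLI writes to DynamoDB directly).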
