Learning about object storage #58

Open
5 tasks
consideRatio opened this issue Jun 10, 2021 · 2 comments
Labels
🏷️ JupyterHub Something related to JupyterHub

Comments

@consideRatio
Member

consideRatio commented Jun 10, 2021

Background

We learned that NFS storage backed by the Amazon EFS service is very costly even for just a few TB, so we want to use S3 object storage instead, which is roughly 10-20 times cheaper.

Object storage learning goals

There may be plenty of things to learn and document about these, but for now I've just created a single issue listing some learning goals about working with object storage in general and object storage on AWS.

  • Understand how to access our S3 object storage bucket, which we refer to as the "scratch bucket"
    You can use the aws s3 command. For example, aws s3 cp <source> <target> copies something from one location to another, where a location is either a path on the local file system or a location in object storage such as s3://jmte-scratch/consideratio. That is what my SCRATCH_BUCKET environment variable evaluates to; yours will be s3://jmte-scratch/<your-username>.
  • Understand sensible practices for bucket-to-bucket transfers
  • Understand the options for mounting buckets on the file system
  • Understand costs of S3 object storage
    Some technical details: https://aws.amazon.com/s3/pricing/
    What kind of S3 storage are we currently using in our scratch bucket?
  • Understand how we can use ephemeral storage in /tmp
    I think we can't download more than ~80 GB per node for now, but we can increase this limit by updating our machine configuration.
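
As a rough illustration of the first and last goals above, here is a minimal Python sketch, not something we have set up on the hub. It assumes the AWS CLI is installed and that SCRATCH_BUCKET is set as described. The helper names (scratch_uri, upload_command, free_gb) and the file names are hypothetical. It only builds the argument list for aws s3 cp and reports free space under /tmp; it does not run the transfer.

```python
import os
import shutil


def scratch_uri(relative_path, env=None):
    """Join SCRATCH_BUCKET and a relative key into an s3:// URI.

    `env` defaults to os.environ; pass a dict to try this without
    touching the real environment.
    """
    env = os.environ if env is None else env
    base = env["SCRATCH_BUCKET"].rstrip("/")
    return f"{base}/{relative_path.lstrip('/')}"


def upload_command(local_path, relative_path, env=None):
    """Build the argument list for `aws s3 cp <source> <target>`."""
    return ["aws", "s3", "cp", local_path, scratch_uri(relative_path, env)]


def free_gb(path="/tmp"):
    """Free space at `path` in GB (10**9 bytes), via shutil.disk_usage."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    demo_env = {"SCRATCH_BUCKET": "s3://jmte-scratch/consideratio"}
    print(upload_command("results.nc", "run1/results.nc", demo_env))
    print(f"{free_gb():.1f} GB free in /tmp")
```

The list returned by upload_command can be passed to subprocess.run to perform the copy. Checking free_gb() before a large download is one way to avoid filling the node's ephemeral storage.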
@fperez
Collaborator

fperez commented Jun 10, 2021

This recent article titled Cloud-Native Repositories for Big Scientific Data may be a useful reference...

@whyjz whyjz added the 🏷️ JupyterHub Something related to JupyterHub label Jun 11, 2021
@consideRatio
Member Author

We've received amazing feedback from the Pangeo Cloud ops working group, as can be seen in the notes from the meeting on 14 June:

https://docs.google.com/document/d/1I-2VNNHoAjjeYvlCezQhFLmiu2OevqGDS5nUAP-6Hfw/edit#
