Open
Description
To get reproducible model results, it is important to organize and save the datasets used for training. To this end, there are a few solutions we can employ:
- Store the dataset in an organized manner as a zip file, in this repo or in another one
  - requires the zip file to fit inside a git repo (2-4 GB max per file)
- Use a cloud storage solution such as Amazon S3
  - no practical restrictions on file size, easy to upload / download
  - not free, though very inexpensive overall
- ...
Success Criteria:
- A script that provides a simple way to download or upload versions of a dataset and abstracts the storage implementation, e.g.
  script.py --version 1.0 --type walking --path /.../
  would download only walking data from dataset V1.0 to a given path
- An inexpensive and effective way to store our data
The final solution can be something super simple, as long as it's fairly scalable and free / cost-effective
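
To make the first success criterion concrete, here is a minimal sketch of what such a script could look like. Everything in it is an assumption for illustration, not a decided design: the `StorageBackend` interface, the `LocalDirBackend` stand-in (a plain directory playing the role of the real store), the `<store>/<version>/<type>/` layout, and the extra `--upload` and `--store` flags. An S3-backed class could implement the same two methods later without changing the command line.

```python
import argparse
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Hypothetical interface that hides where the dataset files live."""

    @abstractmethod
    def download(self, version: str, data_type: str, dest: Path) -> Path:
        """Copy one data type of one dataset version into dest."""

    @abstractmethod
    def upload(self, version: str, data_type: str, src: Path) -> None:
        """Store the contents of src as one data type of one version."""


class LocalDirBackend(StorageBackend):
    """Stand-in backend: keeps data under <root>/<version>/<type>/."""

    def __init__(self, root: Path):
        self.root = Path(root)

    def _location(self, version: str, data_type: str) -> Path:
        return self.root / version / data_type

    def download(self, version: str, data_type: str, dest: Path) -> Path:
        src = self._location(version, data_type)
        if not src.is_dir():
            raise FileNotFoundError(
                f"no '{data_type}' data stored for version {version}"
            )
        target = Path(dest) / version / data_type
        # dirs_exist_ok (Python 3.8+) lets a repeated download overwrite files.
        shutil.copytree(src, target, dirs_exist_ok=True)
        return target

    def upload(self, version: str, data_type: str, src: Path) -> None:
        shutil.copytree(
            Path(src), self._location(version, data_type), dirs_exist_ok=True
        )


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Download or upload a version of a dataset."
    )
    parser.add_argument("--version", required=True, help="dataset version, e.g. 1.0")
    parser.add_argument("--type", dest="data_type", required=True,
                        help="subset of the data, e.g. walking")
    parser.add_argument("--path", required=True, type=Path,
                        help="download destination, or upload source")
    # Assumed flags, not part of the example in the issue:
    parser.add_argument("--upload", action="store_true",
                        help="upload from --path instead of downloading")
    parser.add_argument("--store", type=Path, default=Path("dataset_store"),
                        help="root directory used by the stand-in backend")
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    backend: StorageBackend = LocalDirBackend(args.store)
    if args.upload:
        backend.upload(args.version, args.data_type, args.path)
    else:
        print(backend.download(args.version, args.data_type, args.path))
```

Saved as script.py with `main()` called under the usual `if __name__ == "__main__":` guard, the invocation from the success criteria would copy only the walking data of version 1.0 into the given path. Keeping the storage calls behind one small interface means the choice between a git-hosted zip and cloud storage can change without touching the command line.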