Version and Store training datasets

To make model results reproducible, we need to organize and save the datasets used for training. There are a few options:

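1. Store each dataset in an organized manner as a zip file, either in this repo or in a separate one
     - requires each zip file to fit inside the git repo (2-4 GB max per file; on GitHub this would likely need Git LFS, since regular pushes reject files over 100 MB)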
2. Utilize a cloud storage solution such as Amazon S3
     - no practical restrictions on file size, easy to upload and download
     - not free, though very inexpensive overall
3. ...

Success Criteria:
- A script that provides a simple way to download or upload versions of a dataset and abstracts the storage implementation. For example, `script.py --version 1.0 --type walking --path /.../` would download only the walking data from dataset V1.0 to the given path
- An inexpensive and effective way to store our data

The final solution can be something super simple, as long as it's fairly scalable and free / cost-effective
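
To make the first criterion concrete, here is a minimal sketch of what such a script could look like. Everything in it is an assumption for illustration: the `DatasetStore` interface, the `LocalDirStore` backend, the `<root>/<version>/<type>.zip` layout, and the `--upload` and `--store-root` flags are hypothetical and not part of the issue. The local-directory backend is only a stand-in; an S3 or git-based backend could implement the same two methods without changing the command line.

```python
import argparse
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class DatasetStore(ABC):
    """Hypothetical storage-agnostic interface for versioned datasets."""

    @abstractmethod
    def download(self, version: str, data_type: str, dest: Path) -> Path: ...

    @abstractmethod
    def upload(self, version: str, data_type: str, src: Path) -> Path: ...


class LocalDirStore(DatasetStore):
    """Stand-in backend: archives live at <root>/<version>/<type>.zip (assumed layout)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def download(self, version: str, data_type: str, dest: Path) -> Path:
        archive = self.root / version / f"{data_type}.zip"
        if not archive.is_file():
            raise FileNotFoundError(f"no {data_type!r} data for version {version}")
        dest = Path(dest)
        dest.mkdir(parents=True, exist_ok=True)
        shutil.unpack_archive(str(archive), str(dest))
        return dest

    def upload(self, version: str, data_type: str, src: Path) -> Path:
        base = self.root / version / data_type
        base.parent.mkdir(parents=True, exist_ok=True)
        # make_archive appends ".zip" to the base name itself.
        return Path(shutil.make_archive(str(base), "zip", root_dir=str(src)))


def main(argv=None) -> Path:
    parser = argparse.ArgumentParser(description="Download or upload a dataset version.")
    parser.add_argument("--version", required=True, help="dataset version, e.g. 1.0")
    parser.add_argument("--type", dest="data_type", required=True, help="data subset, e.g. walking")
    parser.add_argument("--path", type=Path, required=True, help="download target or upload source")
    parser.add_argument("--upload", action="store_true", help="upload instead of download")
    parser.add_argument("--store-root", type=Path, default=Path("datasets"))
    args = parser.parse_args(argv)

    # Swapping the backend here is the only change needed to move to other storage.
    store: DatasetStore = LocalDirStore(args.store_root)
    if args.upload:
        return store.upload(args.version, args.data_type, args.path)
    return store.download(args.version, args.data_type, args.path)
```

Calling `main()` under an `if __name__ == "__main__":` guard would turn this into the command-line script described above, with download as the default action so that the invocation from the issue works unchanged.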

Version and Store training datasets #3

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Version and Store training datasets #3

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions