Version and Store training datasets #3

@DylanG5

Description

To make model results reproducible, we need to organize and save the datasets used for training. There are a few solutions we can employ:

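  1. Store each dataset as a zip file, in an organized manner, in this repo or in a separate one
    • requires each zip file to fit within the Git host's per-file limit (2-4 GB max per file; on GitHub that range applies with Git LFS, since regular Git pushes are blocked above 100 MB)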
  2. Use a cloud storage service such as Amazon S3
    • no practical restriction on file size, easy to upload / download
    • not free, though very inexpensive overall
  3. ...

Success Criteria:

  • A script that provides a simple way to download or upload versions of a dataset while abstracting away the storage implementation. For example, script.py --version 1.0 --type walking --path /.../ would download only the walking data from dataset version 1.0 to the given path
  • An inexpensive and effective place to store our data

The final solution can be very simple, as long as it is reasonably scalable and free or cost-effective.
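
As a starting point for discussion, the script from the first success criterion could look roughly like the sketch below. It is not a decided design. The `DatasetStore` interface, the `LocalDirStore` backend, the `v<version>/<type>.zip` layout, and the `--upload` and `--store-root` flags are all assumptions made for illustration; only `--version`, `--type`, and `--path` come from the example above. A cloud backend (S3 or otherwise) would implement the same two methods, so the command line would not change.

```python
import argparse
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class DatasetStore(ABC):
    """Storage interface (hypothetical); the CLI only calls these two methods."""

    @abstractmethod
    def download(self, key: str, dest: Path) -> None: ...

    @abstractmethod
    def upload(self, src: Path, key: str) -> None: ...


class LocalDirStore(DatasetStore):
    """Placeholder backend that keeps archives under a local root directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def download(self, key: str, dest: Path) -> None:
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(self.root / key, dest)

    def upload(self, src: Path, key: str) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, target)


def dataset_key(version: str, data_type: str) -> str:
    """Assumed layout: one zip per data type, grouped by dataset version."""
    return f"v{version}/{data_type}.zip"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Download or upload a dataset version.")
    parser.add_argument("--version", required=True, help="dataset version, e.g. 1.0")
    parser.add_argument("--type", required=True, dest="data_type",
                        help="data subset, e.g. walking")
    parser.add_argument("--path", required=True, type=Path,
                        help="target directory (download) or zip file (upload)")
    parser.add_argument("--upload", action="store_true",
                        help="upload --path instead of downloading")
    parser.add_argument("--store-root", type=Path, default=Path("dataset_store"),
                        help="root directory of the placeholder local backend")
    return parser


def main(argv=None, store=None) -> Path:
    """Run one transfer and return the local path that was read or written."""
    args = build_parser().parse_args(argv)
    store = store or LocalDirStore(args.store_root)
    key = dataset_key(args.version, args.data_type)
    if args.upload:
        store.upload(args.path, key)
        return args.path
    dest = args.path / f"{args.data_type}-v{args.version}.zip"
    store.download(key, dest)
    return dest
```

Calling `main()` from an `if __name__ == "__main__":` guard would turn this into the command-line script described above. Keeping the key layout in one function means a change to how versions are organized touches a single place.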
