Small pipeline fixes #72

Merged: 6 commits merged into main from small-pipeline-fixes on Dec 7, 2023


yellowcap
Member

A bunch of small commits

Closes #69, #70, #71

Remove the default for subset: with a default in place, it is easy to forget the option and run on a subset without intending to.
Improve file names (closes #69):

- Zero padding for counter
- v before version number
- Underscores instead of hyphen separators
- Drop hyphens from date stamp
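
As a rough illustration of these naming rules, here is a minimal sketch. The function `chip_filename`, the `tile` prefix, the field order, the padding width, and the `.tif` extension are all hypothetical; the PR does not spell out the exact pattern.

```python
from datetime import date


def chip_filename(version: int, stamp: date, counter: int, width: int = 4) -> str:
    """Build a file name that follows the four rules in the list.

    Hypothetical layout: the prefix, field order, padding widths, and
    extension are assumptions, not the pipeline's actual format.
    """
    parts = [
        "tile",                    # assumed prefix
        f"v{version:02d}",         # "v" before the version number
        stamp.strftime("%Y%m%d"),  # date stamp without hyphens
        f"{counter:0{width}d}",    # zero-padded counter
    ]
    return "_".join(parts) + ".tif"  # underscores as separators


print(chip_filename(version=2, stamp=date(2023, 12, 7), counter=7))
# tile_v02_20231207_0007.tif
```

One practical effect of zero padding is that a plain lexical sort of the file names matches the numeric order of the counters.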
yellowcap force-pushed the small-pipeline-fixes branch from 800a8aa to b395393 on December 7, 2023 11:25
@yellowcap
Member Author

Merging these simple changes to prepare for a re-run of the pipeline.

yellowcap merged commit 58a1616 into main on Dec 7, 2023
1 check passed
yellowcap deleted the small-pipeline-fixes branch on December 7, 2023 11:29
weiji14 added a commit that referenced this pull request Dec 11, 2023
Obtaining the YYYY-MM-DD date from the GeoTIFF's tag metadata, instead of parsing it from the filename, thanks to the change at 426aa06/#72.
weiji14 added a commit that referenced this pull request Dec 11, 2023
* 🔧 Increase image_size from 256 to 512, patch_size from 32 to 64

Increase the chip image size from 256 to 512 pixels, and the patch size from 32 to 64 pixels. Updated the unit test and an assert statement, and fixed a typo.
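
Because both sizes double, the number of patches per chip stays the same while each patch covers four times as many pixels. The sketch below works through that arithmetic; it assumes square chips cut into non-overlapping square patches, and the helper `patches_per_chip` is hypothetical, not a function from the repository.

```python
def patches_per_chip(image_size: int, patch_size: int) -> int:
    """Count the non-overlapping square patches in a square chip."""
    # The chip has to divide evenly into patches.
    assert image_size % patch_size == 0, "image_size must be a multiple of patch_size"
    per_side = image_size // patch_size
    return per_side * per_side


print(patches_per_chip(256, 32))  # 64
print(patches_per_chip(512, 64))  # 64
```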

* 👽 Get YYYY-MM-DD from GeoTIFF tag instead of filename

Obtaining the YYYY-MM-DD date from the GeoTIFF's tag metadata, instead of parsing it from the filename, thanks to the change at 426aa06/#72.
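
A minimal sketch of the idea, assuming the tags arrive as a plain dictionary of strings (as GeoTIFF readers such as rasterio typically expose them). The helper `date_from_tags` and the tag key `"date"` are assumptions made for illustration; the commit does not name the key.

```python
from datetime import date, datetime
from typing import Mapping


def date_from_tags(tags: Mapping[str, str], key: str = "date") -> date:
    """Parse a YYYY-MM-DD date out of a tag dictionary.

    `tags` stands in for the metadata a GeoTIFF reader returns;
    the key name "date" is an assumption.
    """
    return datetime.strptime(tags[key], "%Y-%m-%d").date()


print(date_from_tags({"date": "2023-12-07"}))  # 2023-12-07
```

Reading the date from metadata keeps the data loader independent of the file naming scheme, which this PR changed.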

* ✨ Allow GeoTIFFDataModule to get GeoTIFF data from an s3 bucket

New feature to allow passing in a URL to an s3 bucket, and loading the GeoTIFF data from there directly. Added a unit test that checks that a GeoTIFF file can be listed from s3://copernicus-dem-30m/. Also improved the docstring and type hint of the setup() function's 'stage' parameter.
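
One way to tell an s3 URL apart from a local folder is to inspect the URL scheme. The sketch below uses only the standard library; the helpers `is_s3_url` and `s3_bucket_and_prefix` are hypothetical and are not how GeoTIFFDataModule is known to do it.

```python
from typing import Tuple
from urllib.parse import urlparse


def is_s3_url(data_path: str) -> bool:
    """Return True when data_path uses the s3:// scheme."""
    return urlparse(data_path).scheme == "s3"


def s3_bucket_and_prefix(data_path: str) -> Tuple[str, str]:
    """Split an s3:// URL into (bucket, key prefix)."""
    parsed = urlparse(data_path)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 URL: {data_path}")
    return parsed.netloc, parsed.path.lstrip("/")


print(is_s3_url("s3://copernicus-dem-30m/"))             # True
print(is_s3_url("data/minicubes"))                       # False
print(s3_bucket_and_prefix("s3://copernicus-dem-30m/"))  # ('copernicus-dem-30m', '')
```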

* 🐛 Add sharding filter before loading GeoTIFF data to torch.Tensor

Need to do this so that the data loading is distributed to the workers, otherwise each worker is doing duplicated work. Also set num_workers to 1 in test_geotiffdatapipemodule to get a consistent result.
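
To illustrate why sharding avoids duplicated work, here is a plain-Python sketch of the idea. It assumes a round-robin split by item index; the `shard` function is hypothetical and is not the data pipe library's implementation.

```python
from typing import List, Sequence


def shard(items: Sequence[str], worker_id: int, num_workers: int) -> List[str]:
    """Keep only the items that belong to one worker (round-robin by index)."""
    return [
        item
        for index, item in enumerate(items)
        if index % num_workers == worker_id
    ]


files = [f"chip_{i}.tif" for i in range(5)]
print(shard(files, 0, 2))  # ['chip_0.tif', 'chip_2.tif', 'chip_4.tif']
print(shard(files, 1, 2))  # ['chip_1.tif', 'chip_3.tif']
```

Without such a split, every worker would iterate over all five files. Placing the split before the expensive load step means each file is read once.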

* 🙈 Gitignore checkpoints in nested folders

Ensure that *.ckpt files in sub-folders are ignored too.

* ⚡ Set float32 matmul precision to medium

Prevents messages like "You are using a CUDA device ('NVIDIA A10G') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance."

* 📝 Mention in main README.md that data_path can be an s3 bucket

Document in the main README.md how one can directly generate embeddings from GeoTIFF files stored in an s3 bucket instead of locally.
brunosan pushed a commit that referenced this pull request Dec 27, 2023
* Remove default for subset

With a default in place, it is easy to forget the option and run on a subset without intending to.
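
A minimal sketch of the pattern, assuming `subset` limits how many items are processed. The function `select_scenes` and the meaning given to `subset` here are hypothetical; the commit only says that the default was removed.

```python
from typing import List, Optional, Sequence


def select_scenes(scenes: Sequence[str], *, subset: Optional[int]) -> List[str]:
    """Return the scenes to process.

    `subset` is keyword-only and has no default, so every caller must
    state explicitly whether to subset. Pass None to process everything.
    """
    if subset is None:
        return list(scenes)
    return list(scenes[:subset])


scenes = ["a", "b", "c"]
print(select_scenes(scenes, subset=None))  # ['a', 'b', 'c']
print(select_scenes(scenes, subset=2))     # ['a', 'b']
# select_scenes(scenes) raises TypeError: the choice can no longer be forgotten.
```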

* Improve file name

Closes #69

- Zero padding for counter
- v before version number
- Underscores instead of hyphen separators
- Drop hyphens from date stamp

* Bump version to 02

* Make mgrs sample file external

Closes #71

* Add date to raster metadata

Closes #70

* Improve print statement
brunosan pushed a commit that referenced this pull request Dec 27, 2023
Development

Successfully merging this pull request may close these issues.

Zero padding of chip number in data pipeline