Commit
* change loading behaviour when annotation is None
* feature extractor puts model back to cpu
* add parameter for silent tqdm
* fix nan in bbox and segmetnation
* Loading bboxes fixed
* Segmentation can be polygon
* Added mask_crop to image loading
* improve loading methods
* Support for loading uncompressed RLE
* Merge readme (#12)
  * Update README.md
  * Update README.md
  * Update README.md
  * Update README.md
  * Readme updated

  Co-authored-by: sadda <lukas.adam.cr@gmail.com>
* Wildfusion (#14)
  * add refactored nn classifiers
  * refactor pairwise matching similarity
  * add wildfusion
  * delete optim
  * merge from origin
  * fix imports
  * update docs
  * cleanup
  * cleanup
  * chore: formatting
  * chore: change naming
  * chore: formatting
  * chore: black formatting
  * chore: formatting isort
  * add visualisation tools
  * fix: examples consistency
  * examples: update
  * docs: fix imports in examples
  * chore: formatting
  * chore: update readme
  * chore: update readme

  Co-authored-by: sadda <lukas.adam.cr@gmail.com>
1 parent 2db0cb4 · commit f5fd69e · 68 changed files with 5,023 additions and 3,177 deletions.
````diff
@@ -1,63 +1,46 @@
 # Feature extraction
 Feature extractors offer a standardized way to extract features from instances of the `WildlifeDataset`.
 
-Feature extractors, implemented as classes, can be created with specific arguments that define the extraction properties. After instantiation, the extractor functions as a callable, requiring only a single argument: the `WildlifeDataset` instance. The specific output type and shape vary based on the chosen feature extractor. In general, the output is iterable, with the first dimension corresponding to the size of the `WildlifeDataset` input.
+Feature extractors, implemented as classes, can be created with specific arguments that define the extraction properties. After instantiation, the extractor functions as a callable, requiring only a single argument: the `WildlifeDataset` instance. The specific output type and shape vary based on the chosen feature extractor. The output is a `FeatureDataset` instance.
 
-## Deep features
-
-The `DeepFeatures` extractor operates by extracting features through the forward pass of a PyTorch model. The output is a 2D array, where the rows represent images, and the columns correspond to the embedding dimensions. The size of the columns is determined by the output size of the model performing the feature extraction.
-
-### Example
-The term `dataset` refers to any instance of WildlifeDataset with transforms that convert it into a tensor with the appropriate shape.
-
-```Python
-import timm
-from wildlife_tools.features import DeepFeatures
-
-backbone = timm.create_model('hf-hub:BVRA/MegaDescriptor-T-224', num_classes=0, pretrained=True)
-extractor = DeepFeatures(backbone, device='cuda')
-features = extractor(dataset)
-```
-
-### Reference
-::: features.deep.DeepFeatures
-    options:
-      show_symbol_type_heading: false
-      show_bases: false
-      show_root_toc_entry: false
-
-
-## SIFT features
-The `SIFTFeatures` extractor retrieves a set of SIFT descriptors for each provided image. The output is a list with a length of `n_inputs`, containing arrays. These arrays are 2D with a shape of `n_descriptors` x `128`, where the value of `n_descriptors` depends on the number of SIFT descriptors extracted for the specific image. If one or fewer descriptors are extracted, the value is `None`. The SIFT implementation from OpenCV is used.
-
-### Example
-The term `dataset` refers to any instance of WildlifeDataset with transforms that convert it into a grayscale PIL image.
-
-```Python
-from wildlife_tools.features import SIFTFeatures
-
-extractor = SIFTFeatures()
-features = extractor(dataset)
-```
-
-### Reference
-::: features.sift.SIFTFeatures
-    options:
-      show_symbol_type_heading: false
-      show_bases: false
-      show_root_toc_entry: false
-
-
-## Data to memory
-
-The `DataToMemory` extractor loads the `WildlifeDataset` into memory. This is particularly useful for the `LoftrMatcher`, which operates directly with image tensors. While it is feasible to directly use the `WildlifeDataset` and load images from storage dynamically, the `LoftrMatcher` lacks a loading buffer. Consequently, loading images on the fly could become a significant bottleneck, especially when matching all query-database pairs, involving `n_query` x `n_database` image loads.
-
-::: features.memory.DataToMemory
-    options:
-      show_symbol_type_heading: false
-      show_bases: false
-      show_root_toc_entry: false
+::: features.deep
+    options:
+      show_root_heading: true
+      heading_level: 2
+
+::: features.local
+    options:
+      show_root_heading: true
+      heading_level: 2
+
+::: features.memory
+    options:
+      show_root_heading: true
+      heading_level: 2
+
+## Examples
+
+### Example - SuperPoint features
+
+```Python
+from wildlife_tools.features.local import SuperPointExtractor
+
+extractor = SuperPointExtractor(backend='opencv', detection_threshold=0.0, force_num_keypoints=True, max_num_keypoints=256)
+features = extractor(dataset)
+```
+
+### Example - Deep features
+
+```Python
+import timm
+from wildlife_tools.features.deep import DeepFeatures
+
+backbone = timm.create_model('hf-hub:BVRA/MegaDescriptor-T-224', num_classes=0, pretrained=True)
+extractor = DeepFeatures(backbone, device='cuda')
+features = extractor(dataset)
+```
````
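The `DeepFeatures` extractor is described as returning a 2D array with rows as images and columns as embedding dimensions. A minimal numpy sketch of how such an array is typically consumed downstream, comparing query features against database features with cosine similarity; the shapes and random values here are stand-ins for illustration, not part of wildlife-tools:

```python
import numpy as np

# Stand-ins for extractor output: rows are images, columns are
# embedding dimensions (shapes are hypothetical).
rng = np.random.default_rng(0)
query = rng.normal(size=(3, 768))     # features for 3 query images
database = rng.normal(size=(5, 768))  # features for 5 database images

# L2-normalize rows, then a matrix product gives cosine similarity
# for every query-database pair.
q = query / np.linalg.norm(query, axis=1, keepdims=True)
d = database / np.linalg.norm(database, axis=1, keepdims=True)
similarity = q @ d.T   # shape (3, 5)

# Best database match for each query image.
best = similarity.argmax(axis=1)
```

The first dimension of `similarity` follows the query set and the second the database set, so `best[i]` indexes the closest database image for query `i`.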
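The SIFT description notes that the extractor returns a list of `n_descriptors` x 128 arrays, with `None` for images where one or fewer descriptors were found. A short, library-independent sketch of filtering such output before matching, while keeping track of which dataset indices survived; the placeholder arrays are hypothetical:

```python
import numpy as np

# Dummy extractor output: one entry per image, None where one or
# fewer descriptors were found (values are placeholders).
outputs = [np.zeros((10, 128)), None, np.zeros((4, 128))]

# Keep only images that produced descriptors, remembering their
# indices so matches can be mapped back to the original dataset.
valid_idx = [i for i, desc in enumerate(outputs) if desc is not None]
descriptors = [outputs[i] for i in valid_idx]
```

Dropping the `None` entries up front avoids special-casing them inside a matching loop.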