version 1.0.1 (#15)
* change loading behaviour when annotation is None

* feature extractor puts model back to cpu

* add parameter for silent tqdm

* fix NaN in bbox and segmentation

* Loading bboxes fixed

* Segmentation can be polygon

* Added mask_crop to image loading

* improve loading methods

* Support for loading uncompressed RLE

* Merge readme (#12)

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Readme updated

---------

Co-authored-by: sadda <lukas.adam.cr@gmail.com>

* Wildfusion (#14)

* add refactored nn classifiers

* refactor pairwise matching similarity

* add wildfusion

* delete optim

* merge from origin

* fix imports

* update docs

* cleanup

* cleanup

* chore: formatting

* chore: change naming

* chore: formatting

* chore: black formatting

* chore: formatting isort

* add visualisation tools

* fix: examples consistency

* examples: update

* docs: fix imports in examples

* chore: formatting

* chore: update readme

* chore: update readme

---------

Co-authored-by: sadda <lukas.adam.cr@gmail.com>
VojtechCermak and sadda authored Nov 12, 2024
1 parent 2db0cb4 commit f5fd69e
Showing 68 changed files with 5,023 additions and 3,177 deletions.
35 changes: 18 additions & 17 deletions .github/workflows/code-quality.yml
@@ -25,7 +25,7 @@ jobs:
python -m pip install --upgrade pip
pip install black[jupyter]==22.3.0
- name: Analysing the code with black
run: black --check .
run: black --check --diff --line-length 120 wildlife_tools
isort:
runs-on: ubuntu-latest
steps:
@@ -40,19 +40,20 @@ jobs:
python -m pip install --upgrade pip
pip install isort==5.10.1
- name: Analysing the code with isort
run: isort --check .
flake8:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Set up Python ${{ env.PYTHON_VERSION }}
uses: actions/setup-python@v3
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install flake8==4.0.1 flake8-docstrings==1.6.0
- name: Analysing the code with flake8
run: flake8 .
run: isort --check --diff --line-length 120 wildlife_tools

# flake8:
# runs-on: ubuntu-latest
# steps:
# - name: Checkout code
# uses: actions/checkout@v3
# - name: Set up Python ${{ env.PYTHON_VERSION }}
# uses: actions/setup-python@v3
# with:
# python-version: ${{ env.PYTHON_VERSION }}
# - name: Install dependencies
# run: |
# python -m pip install --upgrade pip
# pip install flake8==4.0.1 flake8-docstrings==1.6.0
# - name: Analysing the code with flake8
# run: flake8 wildlife_tools
53 changes: 39 additions & 14 deletions README.md
@@ -13,15 +13,19 @@
</p>

<div align="center">
<p align="center">Pipeline for wildlife re-identification including dataset zoo, training tools and trained models. Usage includes classifying new images in labelled databases and clustering individuals in unlabelled databases.</p>
<img src="docs/resources/tools-logo.png" alt="Wildlife tools" width="300">
<p align="center">A toolkit for Animal Individual Identification that covers use cases such as training, feature extraction, similarity calculation, image retrieval, and classification.</p>

<a href="https://wildlifedatasets.github.io/wildlife-tools/">Documentation</a>
·
<a href="https://github.com/WildlifeDatasets/wildlife-tools/issues/new?assignees=aerodynamic-sauce-pan&labels=bug&projects=&template=bug_report.md&title=%5BBUG%5D">Report Bug</a>
·
<a href="https://github.com/WildlifeDatasets/wildlife-tools/issues/new?assignees=aerodynamic-sauce-pan&labels=enhancement&projects=&template=enhancement.md&title=%5BEnhancement%5D">Request Feature</a>
</div>

</br>
</br>

## Our other projects

| <a href="https://github.com/WildlifeDatasets/wildlife-datasets"><img src="docs/resources/datasets-logo.png" alt="Wildlife datasets" width="200"></a> | <a href="https://huggingface.co/BVRA/MegaDescriptor-L-384"><img src="docs/resources/megadescriptor-logo.png" alt="MegaDescriptor" width="200"></a> | <a href="https://github.com/WildlifeDatasets/wildlife-tools"><img src="docs/resources/tools-logo.png" alt="Wildlife tools" width="200"></a> |
|:--------------:|:-----------:|:------------:|
@@ -30,10 +34,21 @@
</br>

# Introduction
The `wildlife-tools` library offers a simple interface for various tasks in the Wildlife Re-Identification domain. It covers use cases such as training, feature extraction, similarity calculation, image retrieval, and classification. It complements the `wildlife-datasets` library, which acts as dataset repository. All datasets there can be used in combination with `WildlifeDataset` component, which serves for loading extracting images and image tensors other tasks.
The `wildlife-tools` library offers a simple interface for various tasks in the Wildlife Re-Identification domain. It covers use cases such as training, feature extraction, similarity calculation, image retrieval, and classification. It complements the `wildlife-datasets` library, which acts as dataset repository.

More information can be found in the [documentation](https://wildlifedatasets.github.io/wildlife-tools/).

## What's New
Here’s a summary of recent updates and changes.


- **Expanded Functionality:** Local feature matching is now done using [gluefactory](https://github.com/cvg/glue-factory)
  - Feature extraction methods: SuperPoint, ALIKED, DISK, and SIFT
  - Matching methods: LightGlue and a more efficient LoFTR
- **New Feature:** Introduced WildFusion, a calibrated score fusion method for high-accuracy animal re-identification, along with new calibration methods (see the sketch after this list).
- **Bug Fixes:** Resolved issues with the kNN and ranking inference methods, among others.
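
A minimal sketch of how WildFusion might be assembled, assuming the `SimilarityPipeline`, `WildFusion`, `IsotonicCalibration`, and `MatchLightGlue` names from this release; the import paths and signatures here are assumptions, not confirmed API:

```Python
# Hypothetical sketch -- module paths and signatures are assumptions.
from wildlife_tools.features.local import SuperPointExtractor
from wildlife_tools.similarity.wildfusion import SimilarityPipeline, WildFusion
from wildlife_tools.similarity.calibration import IsotonicCalibration
from wildlife_tools.similarity.pairwise.lightglue import MatchLightGlue

# Pair a local-feature extractor with a matcher and a score calibration model.
pipeline = SimilarityPipeline(
    matcher=MatchLightGlue(features='superpoint'),
    extractor=SuperPointExtractor(backend='opencv', max_num_keypoints=256),
    calibration=IsotonicCalibration(),
)

# WildFusion fuses calibrated scores from one or more such pipelines.
fusion = WildFusion(calibrated_pipelines=[pipeline])
fusion.fit_calibration(dataset_calibration1, dataset_calibration2)
similarity = fusion(dataset_query, dataset_database)
```

Here the `dataset_*` variables are dataset instances like those created in the example below.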


## Installation

To install `wildlife-tools`, you can build it from source or use the pre-built PyPI package.
@@ -58,9 +73,9 @@ pip install -e .

## Modules in `wildlife-tools`

- The `data` module provides tools for creating instances of the `WildlifeDataset`.
- The `train` module offers tools for fine-tuning feature extractors on the `WildlifeDataset`.
- The `features` module provides tools for extracting features from the `WildlifeDataset` using various extractors.
- The `data` module provides tools for creating instances of the `ImageDataset`.
- The `train` module offers tools for fine-tuning feature extractors on the `ImageDataset`.
- The `features` module provides tools for extracting features from the `ImageDataset` using various extractors.
- The `similarity` module provides tools for constructing a similarity matrix from query and database features.
- The `inference` module offers tools for creating predictions using the similarity matrix.
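
A minimal sketch of how the `similarity` and `inference` modules chain together, assuming the `CosineSimilarity` and `KnnClassifier` names used elsewhere in the documentation; the exact signatures and return types are assumptions:

```Python
# Hypothetical sketch -- signatures and return types are assumptions.
from wildlife_tools.similarity import CosineSimilarity
from wildlife_tools.inference import KnnClassifier

# Similarity matrix between query and database feature sets
# (rows: query images, columns: database images).
similarity = CosineSimilarity()(features_query, features_database)

# 1-NN classification: each query receives the label of its
# most similar database image.
classifier = KnnClassifier(k=1, database_labels=dataset_database.labels_string)
predictions = classifier(similarity)
```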

@@ -70,8 +85,8 @@ pip install -e .

```mermaid
graph TD;
A[Data]-->|WildlifeDataset|B[Features]
A-->|WildlifeDataset|C;
A[Data]-->|ImageDataset|B[Features]
A-->|ImageDataset|C;
C[Train]-->|finetuned extractor|B;
B-->|query and database features|D[Similarity]
D-->|similarity matrix|E[Inference]
@@ -80,24 +95,24 @@ pip install -e .


## Example
### 1. Create `WildlifeDataset`
Using metadata from `wildlife-datasets`, create `WildlifeDataset` object for the MacaqueFaces dataset.
### 1. Create `ImageDataset`
Using metadata from `wildlife-datasets`, create an `ImageDataset` object for the MacaqueFaces dataset.

```Python
from wildlife_datasets.datasets import MacaqueFaces
from wildlife_tools.data import WildlifeDataset
from wildlife_tools.data import ImageDataset
import torchvision.transforms as T

metadata = MacaqueFaces('datasets/MacaqueFaces')
transform = T.Compose([T.Resize([224, 224]), T.ToTensor(), T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))])
dataset = WildlifeDataset(metadata.df, metadata.root, transform=transform)
dataset = ImageDataset(metadata.df, metadata.root, transform=transform)
```

Optionally, split the metadata into subsets. In this example, the query set is the first 100 images and the rest form the database.

```Python
dataset_database = WildlifeDataset(metadata.df.iloc[100:,:], metadata.root, transform=transform)
dataset_query = WildlifeDataset(metadata.df.iloc[:100,:], metadata.root, transform=transform)
dataset_database = ImageDataset(metadata.df.iloc[100:,:], metadata.root, transform=transform)
dataset_query = ImageDataset(metadata.df.iloc[:100,:], metadata.root, transform=transform)
```

### 2. Extract features
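The diff collapses the body of this step; a minimal sketch based on the `DeepFeatures` example that appears later in this commit in docs/features.md:

```Python
import timm
from wildlife_tools.features.deep import DeepFeatures

# MegaDescriptor backbone with the classification head removed.
backbone = timm.create_model('hf-hub:BVRA/MegaDescriptor-T-224', num_classes=0, pretrained=True)
extractor = DeepFeatures(backbone, device='cuda')

features_query = extractor(dataset_query)
features_database = extractor(dataset_database)
```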
@@ -149,3 +164,13 @@ If you like our package, please cite us.
pages = {5953-5963}
}
```

```
@article{cermak2024wildfusion,
title={WildFusion: Individual animal identification with calibrated similarity fusion},
author={Cermak, Vojt{\v{e}}ch and Picek, Lukas and Adam, Luk{\'a}{\v{s}} and Neumann, Luk{\'a}{\v{s}} and Matas, Ji{\v{r}}{\'\i}},
journal={arXiv preprint arXiv:2408.12934},
year={2024}
}
```

26 changes: 0 additions & 26 deletions docs/data.md

This file was deleted.

16 changes: 8 additions & 8 deletions docs/wildlife_dataset.md → docs/dataset.md
@@ -1,10 +1,16 @@
::: data.dataset
options:
show_root_heading: true
heading_level: 2


# Wildlife dataset

WildlifeDataset is a class for creating pytorch style datasets by integration of datasets provided by wildlife-datasets library. It has implemented \_\_len\_\_ and \_\_getattr\_\_ methods, which allows using pytorch dataloaders for training and inference.
WildlifeDataset is a class for creating PyTorch-style image datasets and allows integration of datasets provided by the `wildlife-datasets` library. It implements the `__len__` and `__getitem__` methods, which allows using standard PyTorch dataloaders for training and inference.
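
For illustration, a minimal sketch of what this enables, assuming transforms that yield image tensors and integer labels:

```Python
from torch.utils.data import DataLoader

# A standard PyTorch dataloader over a WildlifeDataset instance.
loader = DataLoader(dataset, batch_size=32, shuffle=True)
for images, labels in loader:
    pass  # training or inference step goes here
```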


## Metadata dataframe
Integral part of WildlifeDataset is metadata dataframe, which includes all information about images in the dataset.
A key part of the WildlifeDataset is the metadata dataframe, which includes all information about the images in the dataset.
A typical dataset from `wildlife-datasets` has the following metadata table:


@@ -49,9 +55,3 @@ image, label = dataset[0]
```


## Reference
::: data.dataset.WildlifeDataset
options:
show_symbol_type_heading: false
show_bases: false
show_root_toc_entry: false
65 changes: 24 additions & 41 deletions docs/features.md
@@ -1,63 +1,46 @@
# Feature extraction
Feature extractors offer a standardized way to extract features from instances of the `WildlifeDataset`.

Feature extractors, implemented as classes, can be created with specific arguments that define the extraction properties. After instantiation, the extractor functions as a callable, requiring only a single argument—the `WildlifeDataset` instance. The specific output type and shape vary based on the chosen feature extractor. In general, the output is iterable, with the first dimension corresponding to the size of the `WildlifeDataset` input.
Feature extractors, implemented as classes, can be created with specific arguments that define the extraction properties. After instantiation, the extractor functions as a callable, requiring only a single argument: the `WildlifeDataset` instance. The specific output type and shape vary with the chosen feature extractor; the output is a `FeatureDataset` instance.

## Deep features

::: features.deep
options:
show_root_heading: true
heading_level: 2

The `DeepFeatures` extractor operates by extracting features through the forward pass of a PyTorch model. The output is a 2D array, where the rows represent images, and the columns correspond to the embedding dimensions. The size of the columns is determined by the output size of the model performing the feature extraction.

### Example
The term `dataset` refers to any instance of WildlifeDataset with transforms that convert it into a tensor with the appropriate shape.

```Python
import timm
from wildlife_tools.features import DeepFeatures
::: features.local
options:
show_root_heading: true
heading_level: 2

backbone = timm.create_model('hf-hub:BVRA/MegaDescriptor-T-224', num_classes=0, pretrained=True)
extractor = DeepFeatures(backbone, device='cuda')
features = extractor(dataset)
```

### Reference
::: features.deep.DeepFeatures
::: features.memory
options:
show_symbol_type_heading: false
show_bases: false
show_root_toc_entry: false

show_root_heading: true
heading_level: 2


## SIFT features
The `SIFTFeatures` extractor retrieves a set of SIFT descriptors for each provided image. The output is a list with a length of `n_inputs`, containing arrays. These arrays are 2D with a shape of `n_descriptors` x `128`, where the value of `n_descriptors` depends on the number of SIFT descriptors extracted for the specific image. If one or fewer descriptors are extracted, the value is None. The SIFT implementation from OpenCV is used.
## Examples

### Example
The term `dataset` refers to any instance of WildlifeDataset with transforms that convert it into grayscale PIL image.
### Example - SuperPoint features

```Python
from wildlife_tools.features import SIFTFeatures
from wildlife_tools.features.local import SuperPointExtractor

extractor = SIFTFeatures()
extractor = SuperPointExtractor(backend='opencv', detection_threshold=0.0, force_num_keypoints=True, max_num_keypoints=256)
features = extractor(dataset)
```


### Reference
::: features.sift.SIFTFeatures
options:
show_symbol_type_heading: false
show_bases: false
show_root_toc_entry: false


### Example - Deep features

## Data to memory

The `DataToMemory` extractor loads the `WildlifeDataset` into memory. This is particularly useful for the `LoftrMatcher`, which operates directly with image tensors. While it is feasible to directly use the `WildlifeDataset` and load images from storage dynamically, the `LoftrMatcher` lacks a loading buffer. Consequently, loading images on the fly could become a significant bottleneck, especially when matching all query-database pairs, involving `n_query` x `n_database` image loads.
```Python
import timm
from wildlife_tools.features.deep import DeepFeatures

::: features.memory.DataToMemory
options:
show_symbol_type_heading: false
show_bases: false
show_root_toc_entry: false
backbone = timm.create_model('hf-hub:BVRA/MegaDescriptor-T-224', num_classes=0, pretrained=True)
extractor = DeepFeatures(backbone, device='cuda')
features = extractor(dataset)
```