# Viam Torchvision Module


This is a [Viam module](https://docs.viam.com/extend/modular-resources/) providing a model of [vision service](https://docs.viam.com/services/vision/#api) for [TorchVision's New Multi-Weight Support API](https://pytorch.org/blog/introducing-torchvision-new-multi-weight-support-api/).
<p align="center">
<img src="https://pytorch.org/assets/images/torchvision_gif.gif" width=80%, height=70%>
</p>

<img src="https://pytorch.org/assets/images/torchvision_gif.gif" width=80%, height=70%>
</p>

For a given model architecture (e.g. *ResNet50*), multiple weights can be available. Each of those weights comes with preprocessing and label metadata.

## Getting started

First, [create a machine](https://docs.viam.com/how-tos/configure/) in Viam.

To use this module, follow these instructions to [add a module from the Viam Registry](https://docs.viam.com/modular-resources/configure/#add-a-module-from-the-viam-registry) and select the `viam:vision:torchvision` model from the [`torchvision` module](https://app.viam.com/module/viam/torchvision).

Navigate to the [**CONFIGURE** tab](https://docs.viam.com/configure/) of your [machine](https://docs.viam.com/fleet/machines/) in the [Viam app](https://app.viam.com/).

[Add vision / torchvision to your machine](https://docs.viam.com/configure/#components).

Depending on the type of model configured, the module implements:

- For detectors:
  - `GetDetections()`
  - `GetDetectionsFromCamera()`
- For classifiers:
  - `GetClassifications()`
  - `GetClassificationsFromCamera()`

> [!NOTE]
> See the [vision service API](https://docs.viam.com/services/vision/#api) for more details.
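For reference, here is a minimal sketch of calling these methods with the Viam Python SDK. The service name `my_torchvision` and camera name `cam` are illustrative; the address and API key placeholders come from your machine's **CONNECT** tab in the Viam app.

```python
# Minimal sketch: query a torchvision vision service with the Viam Python SDK.
import asyncio

from viam.robot.client import RobotClient
from viam.services.vision import VisionClient


async def main():
    # Connection details are found on your machine's CONNECT tab.
    machine = await RobotClient.at_address(
        "<MACHINE-ADDRESS>",
        RobotClient.Options.with_api_key(api_key="<API-KEY>", api_key_id="<API-KEY-ID>"),
    )
    vision = VisionClient.from_robot(machine, "my_torchvision")

    # For a classifier model: top-3 classifications from the camera stream.
    classifications = await vision.get_classifications_from_camera("cam", 3)

    # For a detector model: bounding boxes from the camera stream.
    detections = await vision.get_detections_from_camera("cam")

    print(classifications, detections)
    await machine.close()


asyncio.run(main())
```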
## viam:vision:torchvision

To configure the `torchvision` model, use the following template:

```json
{
  "model_name": <string>,
  "labels_confidences": {
    <label1>: <float>,
    <label2>: <float>
  },
  "default_minimum_confidence": <float>
}
```
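For example, a filled-in attributes block for a hypothetical *ResNet50* classifier (the threshold values are illustrative):

```json
{
  "model_name": "resnet50",
  "labels_confidences": {
    "grasshopper": 0.5,
    "cricket": 0.45
  },
  "default_minimum_confidence": 0.25
}
```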

### Attributes

The only **required attribute** to configure your torchvision vision service is a `model_name`:


| Name | Type | Inclusion | Default | Description |
| ------------ | ------ | ------------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model_name` | string | **Required** | | Vision model name as expected by the [get_model()](https://pytorch.org/vision/main/models.html#listing-and-retrieving-available-models) method from the torchvision multi-weight API. |
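To see which names `get_model()` accepts, you can list them directly with torchvision — a quick sketch using the multi-weight API:

```python
# Valid values for model_name are the names torchvision itself knows about.
from torchvision.models import get_model, list_models

print(list_models())  # includes e.g. "resnet50", "fasterrcnn_resnet50_fpn", ...

# The same lookup this attribute feeds into:
model = get_model("resnet50", weights="DEFAULT")
```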


### Optional attributes

| Name | Type | Inclusion | Default | Description |
| ---------------------------- | --------------------- | --------- | ----------- | ----------- |
| `weights` | string | Optional | `DEFAULT` | Weights name as expected by the [get_model()](https://pytorch.org/vision/main/models.html#listing-and-retrieving-available-models) method from the torchvision multi-weight API. |
| `default_minimum_confidence` | float | Optional | | Default minimum confidence for filtering all labels that are not specified in `labels_confidences`. |
| `labels_confidences` | dict[str, float] | Optional | | Dictionary of minimum confidence thresholds for specific labels, for example `{"grasshopper": 0.5, "cricket": 0.45}`. A label's threshold overrides `default_minimum_confidence` for that label, even if it is lower. If `labels_confidences` is left blank, no per-label filtering is applied. |
| `use_weight_transform` | bool | Optional | `True` | Loads the preprocessing transform from the weights metadata. |
| `input_size` | List[int] | Optional | `None` | Resizes the input image to the specified size. Overrides the resize from the weights metadata. |
| `mean_rgb` | [float, float, float] | Optional | `[0, 0, 0]` | Specifies the mean values for normalization, in RGB order. |
| `std_rgb` | [float, float, float] | Optional | `[1, 1, 1]` | Specifies the standard deviation values for normalization, in RGB order. |
| `swap_r_and_b` | bool | Optional | `False` | If `True`, swaps the R and B channels in the input image. Use this if the images passed to the model are in the OpenCV (BGR) format. |
| `channel_last` | bool | Optional | `False` | If `True`, the image tensor is converted to channel-last format. |
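As an illustration, a hypothetical detector configured with manual preprocessing instead of the weights transform (all values are made up; the mean/std shown are the common ImageNet statistics):

```json
{
  "model_name": "fasterrcnn_resnet50_fpn",
  "use_weight_transform": false,
  "input_size": [320, 320],
  "mean_rgb": [0.485, 0.456, 0.406],
  "std_rgb": [0.229, 0.224, 0.225],
  "swap_r_and_b": true
}
```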

### Preprocessing transforms behavior and order

- If the weights metadata includes a transform and `use_weight_transform` is `True`, that transform is added to the pipeline.
- If `input_size` is provided, the image is resized to the specified size using `v2.Resize()`.
- If both `mean_rgb` and `std_rgb` are provided, the image is normalized using `v2.Normalize()` with those values.
- If `swap_r_and_b` is `True`, the first and last channels are swapped.
- If `channel_last` is `True`, a transformation is applied to move the channel dimension last: (C, H, W) -> (H, W, C).


#### Full example configuration

The following JSON config file includes these resources:

- the TorchVision module
- a modular resource (the TorchVision vision service)
- a [webcam camera](https://docs.viam.com/components/camera/webcam/)
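A minimal sketch of such a config, assuming the standard Viam machine-config layout and illustrative resource names (`cam`, `my_torchvision`):

```json
{
  "components": [
    {
      "name": "cam",
      "namespace": "rdk",
      "type": "camera",
      "model": "webcam",
      "attributes": {
        "video_path": "video0"
      }
    }
  ],
  "services": [
    {
      "name": "my_torchvision",
      "namespace": "rdk",
      "type": "vision",
      "model": "viam:vision:torchvision",
      "attributes": {
        "model_name": "resnet50"
      }
    }
  ],
  "modules": [
    {
      "type": "registry",
      "name": "viam_torchvision",
      "module_id": "viam:torchvision",
      "version": "latest"
    }
  ]
}
```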


### Resources
- [Table of all available classification weights](https://pytorch.org/vision/main/models.html#table-of-all-available-classification-weights)
- [Quantized models](https://pytorch.org/vision/main/models.html#quantized-models)
**meta.json** (3 additions, 1 deletion):

```diff
@@ -6,7 +6,9 @@
   "models": [
     {
       "api": "rdk:service:vision",
-      "model": "viam:vision:torchvision"
+      "model": "viam:vision:torchvision",
+      "short_description": "Service wrapper for the torchvision computer vision library.",
+      "markdown_link": "README.md#viamvisiontorchvision"
     }
   ],
   "build": {
```