11 changes: 9 additions & 2 deletions DESCRIPTION
@@ -57,7 +57,9 @@ Suggests:
testthat,
coro,
R.matlab,
-xml2
+xml2,
+knitr,
+rmarkdown
BugReports: https://github.com/mlverse/torchvision/issues
Collate:
'folder-dataset.R'
@@ -88,8 +90,8 @@ Collate:
'imagenet.R'
'models-alexnet.R'
'models-convnext.R'
-'models-convnext_segmentation.R'
'models-convnext_detection.R'
+'models-convnext_segmentation.R'
'models-deeplabv3.R'
'models-efficientnet.R'
'models-efficientnetv2.R'
@@ -106,6 +108,7 @@ Collate:
'models-vit.R'
'ops-box_convert.R'
'ops-boxes.R'
+'rf100-catalog.R'
'tiny-imagenet-dataset.R'
'transforms-array.R'
'transforms-defaults.R'
@@ -114,3 +117,7 @@ Collate:
'transforms-tensor.R'
'utils.R'
'vision_utils.R'
+Depends:
+    R (>= 3.5)
+LazyData: true
+VignetteBuilder: knitr
3 changes: 3 additions & 0 deletions NAMESPACE
@@ -96,12 +96,14 @@ export(flickr30k_caption_dataset)
export(flickr8k_caption_dataset)
export(flowers102_dataset)
export(generalized_box_iou)
+export(get_rf100_catalog)
export(image_folder_dataset)
export(imagenet_classes)
export(imagenet_label)
export(kmnist_dataset)
export(lfw_pairs_dataset)
export(lfw_people_dataset)
+export(list_rf100_datasets)
export(magick_loader)
export(mnist_dataset)
export(model_alexnet)
@@ -191,6 +193,7 @@ export(rf100_infrared_collection)
export(rf100_medical_collection)
export(rf100_peixos_segmentation_dataset)
export(rf100_underwater_collection)
+export(search_rf100)
export(tensor_image_browse)
export(tensor_image_display)
export(tiny_imagenet_dataset)
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,5 +1,9 @@
# torchvision (development version)

+## New features
+
+* Added RF100 dataset catalog with `search_rf100()`, `get_rf100_catalog()`, and `list_rf100_datasets()` functions for discovering and exploring RoboFlow 100 datasets (#271, @ANAMASGARD).
+
## New models

* Added `model_convnext_*_detection()` for object detection, where `*` is one of `tiny`, `small`, or `base` (#262, @ANAMASGARD).
2 changes: 1 addition & 1 deletion R/collection-rf100-biology.R
@@ -3,7 +3,7 @@ NULL

#' RoboFlow 100 Biology dataset Collection
#'
-#' Loads one of the [RoboFlow 100 Biology](https://universe.roboflow.com/browse/biology) datasets with COCO-style
+#' Loads one of the [RoboFlow 100 Biology](https://universe.roboflow.com/browse/biology) datasets with
#' bounding box annotations for object detection tasks.
#'
#' @inheritParams rf100_document_collection
2 changes: 1 addition & 1 deletion R/collection-rf100-damage.R
@@ -3,7 +3,7 @@ NULL

#' RoboFlow 100 Damages dataset Collection
#'
-#' Loads one of the [RoboFlow 100 Damage & Risk assessment](https://universe.roboflow.com/browse/damage-risk) datasets with COCO-style
+#' Loads one of the [RoboFlow 100 Damage & Risk assessment](https://universe.roboflow.com/browse/damage-risk) datasets with
#' bounding box annotations for object detection tasks.
#'
#' @inheritParams rf100_document_collection
4 changes: 2 additions & 2 deletions R/collection-rf100-doc.R
@@ -3,7 +3,7 @@ NULL

#' RoboFlow 100 Document dataset Collection
#'
-#' Loads one of the [RoboFlow 100 Document](https://universe.roboflow.com/browse/documents) datasets with COCO-style
+#' Loads one of the [RoboFlow 100 Document](https://universe.roboflow.com/browse/documents) datasets with
#' bounding box annotations for object detection tasks.
#'
#' @param dataset Dataset to select within \code{c("tweeter_post", "tweeter_profile", "document_part",
@@ -14,7 +14,7 @@ NULL
#' @param target_transform Optional transform function applied to the target.
#'
#' @return A torch dataset. Each element is a named list with:
-#' - `x`: H x W x 3 array representing the image.
+#' - `x`: H x W x 3 array representing the image, auto-oriented and stretched to 640 x 640.
#' - `y`: a list containing the target with:
#' - `image_id`: numeric identifier of the x image.
#' - `labels`: numeric identifier of the N bounding-box object class.
4 changes: 2 additions & 2 deletions R/collection-rf100-infrared.R
@@ -3,8 +3,8 @@ NULL

#' RoboFlow 100 Infrared dataset Collection
#'
-#' Loads one of the [RoboFlow 100 Infrared](https://universe.roboflow.com/browse/infrared) datasets (COCO
-#' format) with per-dataset folders and train/valid/test splits.
+#' Loads one of the [RoboFlow 100 Infrared](https://universe.roboflow.com/browse/infrared) datasets
+#' with per-dataset folders and train/valid/test splits.
#'
#' @inheritParams rf100_document_collection
#' @param dataset Dataset to select within \code{c("thermal_dog_and_people", "solar_panel", "thermal_cheetah", "ir_object")}.
4 changes: 2 additions & 2 deletions R/collection-rf100-medical.R
@@ -3,8 +3,8 @@ NULL

#' RoboFlow 100 Medical dataset Collection
#'
-#' Loads one of the [RoboFlow 100 Medical](https://universe.roboflow.com/browse/medical) datasets (COCO
-#' format) with per-dataset folders and train/valid/test splits.
+#' Loads one of the [RoboFlow 100 Medical](https://universe.roboflow.com/browse/medical) datasets
+#' with per-dataset folders and train/valid/test splits.
#'
#' @inheritParams rf100_document_collection
#' @param dataset Dataset to select within \code{c("radio_signal",
2 changes: 1 addition & 1 deletion R/globals.R
@@ -1 +1 @@
-utils::globalVariables(c("..", "N", "magick_loader", "self"))
+utils::globalVariables(c("..", "N", "magick_loader", "self", "rf100_catalog"))
176 changes: 176 additions & 0 deletions R/rf100-catalog.R
@@ -0,0 +1,176 @@
#' RF100 Dataset Catalog
#'
#' A comprehensive catalog of all RF100 (RoboFlow 100) datasets available in torchvision.
#' This data frame contains metadata about each dataset including descriptions, sizes,
#' available splits, and collection information.
#'
#' @format A data frame with datasets as rows and 16 columns:
#' \describe{
#' \item{collection}{Collection name (biology, medical, infrared, damage, underwater, document)}
#' \item{dataset}{Dataset identifier used in collection functions}
#' \item{description}{Brief description of the dataset and its purpose}
#' \item{task}{Machine learning task type (currently all "object_detection")}
#' \item{num_images}{Total number of images across all splits}
#' \item{image_width}{Typical image width in pixels}
#' \item{image_height}{Typical image height in pixels}
#' \item{train_size_mb}{Size of training split in megabytes}
#' \item{test_size_mb}{Size of test split in megabytes}
#' \item{valid_size_mb}{Size of validation split in megabytes}
#' \item{total_size_mb}{Total size across all splits in megabytes}
#' \item{has_train}{Logical indicating if training split is available}
#' \item{has_test}{Logical indicating if test split is available}
#' \item{has_valid}{Logical indicating if validation split is available}
#' \item{function_name}{R function name to load this dataset's collection}
#' \item{roboflow_url}{URL to the collection on RoboFlow Universe}
#' }
#'
#' @examples
#' \dontrun{
#' # View the complete catalog
#' data(rf100_catalog)
#' View(rf100_catalog)
#'
#' # See all biology datasets
#' subset(rf100_catalog, collection == "biology")
#'
#' # Find large datasets (> 100 MB)
#' subset(rf100_catalog, total_size_mb > 100)
#' }
#'
#' @seealso [search_rf100()], [get_rf100_catalog()]
"rf100_catalog"

#' Search RF100 Dataset Catalog
#'
#' Search through all RF100 datasets by keywords in name or description,
#' or filter by collection. This makes it easy to discover datasets relevant
#' to your task without browsing each collection individually.
#'
#' @param keyword Character string to search for (case-insensitive). Searches
#' in both dataset names and descriptions. If NULL, returns all datasets
#' (optionally filtered by collection).
#' @param collection Filter by collection name. One of: "biology", "medical",
#' "infrared", "damage", "underwater", "document". If NULL, searches all collections.
#'
#' @return A data frame with matching datasets and their metadata. Returns NULL
#' invisibly if no matches are found.
#'
#' @examples
#' \dontrun{
#' # Find all medical datasets
#' search_rf100(collection = "medical")
#'
#' # Find datasets about cells
#' search_rf100("cell")
#'
#' # Find photovoltaic/solar datasets
#' search_rf100("solar")
#' search_rf100("photovoltaic")
#'
#' # Find all biology datasets with "cell" in name/description
#' search_rf100("cell", collection = "biology")
#'
#' # List all available datasets
#' search_rf100()
#' }
#'
#' @seealso [get_rf100_catalog()], [rf100_catalog]
#' @export
search_rf100 <- function(keyword = NULL, collection = NULL) {
  # Load the catalog
  utils::data("rf100_catalog", package = "torchvision", envir = environment())

  result <- rf100_catalog

  # Filter by collection first
  if (!is.null(collection)) {
    valid_collections <- c("biology", "medical", "infrared", "damage", "underwater", "document")
    if (!collection %in% valid_collections) {
      stop("Invalid collection. Must be one of: ", paste(valid_collections, collapse = ", "))
    }
    result <- result[result$collection == collection, ]
  }

  # Filter by keyword in dataset name or description
  if (!is.null(keyword)) {
    pattern <- tolower(keyword)
    matches <- grepl(pattern, tolower(result$dataset)) |
      grepl(pattern, tolower(result$description))
    result <- result[matches, ]
  }

  if (nrow(result) == 0) {
    message("No datasets found matching criteria")
    return(invisible(NULL))
  }

  # Reset row names for cleaner display
  rownames(result) <- NULL

  result
}
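
The filtering steps in `search_rf100()` can be tried without installing the package. The sketch below repeats the same logic against a small stand-in data frame; `toy_catalog`, `search_toy()`, the `toy_parasite` row, and all descriptions are hypothetical and only illustrate the behavior, they are not the shipped catalog.

```r
# Toy stand-in for rf100_catalog. Rows and descriptions are made up for
# illustration and do not reflect the real catalog contents.
toy_catalog <- data.frame(
  collection  = c("biology", "biology", "infrared"),
  dataset     = c("blood_cell", "toy_parasite", "solar_panel"),
  description = c("Blood cell detection",
                  "Parasite detection (toy row)",
                  "Solar panel defects"),
  stringsAsFactors = FALSE
)

# Same steps as search_rf100(), but the catalog is passed in as an argument
# (an assumption made here so the sketch runs on its own).
search_toy <- function(catalog, keyword = NULL, collection = NULL) {
  result <- catalog

  # Optional exact match on the collection column
  if (!is.null(collection)) {
    result <- result[result$collection == collection, ]
  }

  # Optional case-insensitive match on dataset name or description
  if (!is.null(keyword)) {
    pattern <- tolower(keyword)
    matches <- grepl(pattern, tolower(result$dataset)) |
      grepl(pattern, tolower(result$description))
    result <- result[matches, ]
  }

  # No match: emit a message and return NULL invisibly
  if (nrow(result) == 0) {
    message("No datasets found matching criteria")
    return(invisible(NULL))
  }

  rownames(result) <- NULL
  result
}

hits <- search_toy(toy_catalog, "cell")
none <- search_toy(toy_catalog, "no_such_keyword")
```

Because a failed search returns `NULL` rather than a zero-row data frame, callers would need an `is.null()` check before indexing the result. Note also that `grepl()` treats the keyword as a regular expression by default, so a keyword containing characters such as `(` would need escaping or `fixed = TRUE`.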

#' Get Complete RF100 Catalog
#'
#' Returns the complete catalog of all RF100 datasets with their metadata.
#' This is a convenience function that loads and returns the rf100_catalog data.
#'
#' @return A data frame with all RF100 datasets and their metadata.
#'
#' @examples
#' \dontrun{
#' # Get complete catalog
#' catalog <- get_rf100_catalog()
#'
#' # View in RStudio
#' View(catalog)
#'
#' # Summary statistics
#' summary(catalog$total_size_mb)
#' table(catalog$collection)
#'
#' # Find smallest dataset
#' catalog[which.min(catalog$total_size_mb), ]
#'
#' # Find largest dataset
#' catalog[which.max(catalog$total_size_mb), ]
#' }
#'
#' @seealso [search_rf100()], [rf100_catalog]
#' @export
get_rf100_catalog <- function() {
  utils::data("rf100_catalog", package = "torchvision", envir = environment())
  rf100_catalog
}
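
The summary-statistics examples in the documentation above extend naturally to per-collection totals. A minimal sketch, assuming a data frame with the documented `collection`, `dataset`, and `total_size_mb` columns; the rows and sizes in `toy_catalog` are invented for illustration.

```r
# Toy stand-in for the data frame returned by get_rf100_catalog().
# Sizes are made-up numbers, not real dataset sizes.
toy_catalog <- data.frame(
  collection    = c("biology", "biology", "infrared"),
  dataset       = c("toy_a", "toy_b", "toy_c"),
  total_size_mb = c(10.5, 20, 5),
  stringsAsFactors = FALSE
)

# Total download size per collection
size_by_collection <- aggregate(total_size_mb ~ collection,
                                data = toy_catalog, FUN = sum)

# Name of the largest dataset, mirroring the which.max() example
largest <- toy_catalog[which.max(toy_catalog$total_size_mb), "dataset"]
```

With the real catalog, the same two calls would presumably work unchanged by replacing `toy_catalog` with the result of `get_rf100_catalog()`.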

#' List Datasets in an RF100 Collection
#'
#' List all available datasets within a specific RF100 collection.
#'
#' @param collection Collection name. One of: "biology", "medical", "infrared",
#' "damage", "underwater", "document".
#'
#' @return Character vector of dataset names in the collection.
#'
#' @examples
#' \dontrun{
#' # List all biology datasets
#' list_rf100_datasets("biology")
#'
#' # List all medical datasets
#' list_rf100_datasets("medical")
#' }
#'
#' @seealso [search_rf100()], [get_rf100_catalog()]
#' @export
list_rf100_datasets <- function(collection) {
  valid_collections <- c("biology", "medical", "infrared", "damage", "underwater", "document")
  if (!collection %in% valid_collections) {
    stop("Invalid collection. Must be one of: ", paste(valid_collections, collapse = ", "))
  }

  utils::data("rf100_catalog", package = "torchvision", envir = environment())
  datasets <- rf100_catalog[rf100_catalog$collection == collection, "dataset"]
  sort(datasets)
}
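
As a sketch of what `list_rf100_datasets()` does, the version below runs against a stand-in data frame. It differs from the merged code in one respect, flagged as a variation: the valid collection names are derived from the catalog itself instead of a hard-coded vector. `toy_catalog`, `list_toy()`, and the `toy_b` row are hypothetical.

```r
# Toy stand-in for rf100_catalog; "toy_b" is a made-up dataset name.
toy_catalog <- data.frame(
  collection = c("medical", "biology", "biology"),
  dataset    = c("radio_signal", "toy_b", "blood_cell"),
  stringsAsFactors = FALSE
)

list_toy <- function(catalog, collection) {
  # Variation on the merged code: take valid names from the data
  valid_collections <- unique(catalog$collection)
  if (!collection %in% valid_collections) {
    stop("Invalid collection. Must be one of: ",
         paste(valid_collections, collapse = ", "))
  }

  # Selecting a single column by name drops to a character vector
  datasets <- catalog[catalog$collection == collection, "dataset"]
  sort(datasets)
}

biology <- list_toy(toy_catalog, "biology")
bad <- try(list_toy(toy_catalog, "no_such_collection"), silent = TRUE)
```

Deriving the names from the data keeps the validation in step with the catalog if collections are added later, at the cost of an error message whose ordering depends on row order.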

51 changes: 51 additions & 0 deletions README.md
@@ -23,3 +23,54 @@ You can install the development version from GitHub with:
remotes::install_github("mlverse/torchvision@main")
```

## RF100 Dataset Catalog

torchvision includes 34 datasets from the RoboFlow 100 benchmark, organized into 6 collections. Use the catalog to easily discover and search for datasets:

```r
library(torchvision)

# Search for datasets by keyword
search_rf100("solar") # Find solar/photovoltaic datasets
search_rf100("cell") # Find cell-related datasets
search_rf100("medical") # Find medical imaging datasets

# Browse by collection
search_rf100(collection = "biology") # All biology datasets
search_rf100(collection = "medical") # All medical datasets

# View complete catalog
catalog <- get_rf100_catalog()
View(catalog)
```

### Available Collections

- **Biology** (9 datasets): Microscopy, cells, bacteria, parasites, plant diseases
- **Medical** (8 datasets): X-rays, MRI, pathology, tumor detection
- **Infrared** (4 datasets): Thermal imaging, FLIR cameras, solar panels
- **Damage** (3 datasets): Infrastructure damage, defect detection
- **Underwater** (4 datasets): Marine life, coral reefs, underwater objects
- **Document** (6 datasets): OCR, document parsing, diagrams

See `vignette("rf100-datasets")` for the complete catalog and detailed information.
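
The per-collection counts listed above can be checked against the stated total of 34 with a few lines of R. The counts are copied from the list; comparing them with the real catalog via `table(get_rf100_catalog()$collection)` is left as an untested assumption.

```r
# Dataset counts per collection, copied from the list above
collection_counts <- c(
  biology    = 9,
  medical    = 8,
  infrared   = 4,
  damage     = 3,
  underwater = 4,
  document   = 6
)

n_collections <- length(collection_counts)
n_datasets    <- sum(collection_counts)
```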

### Example Usage

```r
# Search for a dataset
search_rf100("blood")

# Load the dataset
ds <- rf100_biology_collection(
  dataset = "blood_cell",
  split = "train",
  download = TRUE
)

# Visualize a sample
item <- ds[1]
boxed <- draw_bounding_boxes(item)
tensor_image_browse(boxed)
```
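
Before choosing a `split`, the catalog's `has_train`, `has_test`, and `has_valid` columns could be used to see which splits a dataset offers. The helper below is a hypothetical sketch, not part of the package, and the row it is applied to is invented.

```r
# Hypothetical helper: names of the splits flagged as available in one
# catalog row. Assumes the has_* columns documented for rf100_catalog.
available_splits <- function(row) {
  flags <- c(
    train = row$has_train,
    test  = row$has_test,
    valid = row$has_valid
  )
  names(flags)[flags]
}

# Made-up row for illustration; real datasets may offer all three splits
toy_row <- data.frame(
  dataset   = "toy_dataset",
  has_train = TRUE,
  has_test  = TRUE,
  has_valid = FALSE
)

splits <- available_splits(toy_row)
```

With the real catalog, a single row from `search_rf100()` would presumably be passed in place of `toy_row`.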
