ParseImageNet

Extract image file paths from ImageNet by matching category keywords. Useful for creating custom subsets of ImageNet for training or evaluation.

Prerequisites

  • Python 3.8+
  • ImageNet dataset (or a subset) with the standard ILSVRC directory structure:
    ImageNet-Subset/
    ├── LOC_synset_mapping.txt
    ├── LOC_val_solution.csv
    └── ILSVRC/
        ├── ImageSets/
        │   └── CLS-LOC/
        │       ├── train_cls.txt
        │       └── val.txt
        └── Data/
            └── CLS-LOC/
                ├── train/
                │   ├── n01440764/
                │   │   ├── n01440764_10026.JPEG
                │   │   └── ...
                │   └── ...
                └── val/
                    ├── ILSVRC2012_val_00000001.JPEG
                    └── ...
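
Before running the parser, it can help to confirm that the dataset root matches the tree above. The following is a minimal sketch, not part of parseimagenet: the helper name find_missing_entries and the EXPECTED_ENTRIES list are hypothetical, and the paths are copied from the directory structure shown here.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree above.
EXPECTED_ENTRIES = [
    "LOC_synset_mapping.txt",
    "LOC_val_solution.csv",
    "ILSVRC/ImageSets/CLS-LOC/train_cls.txt",
    "ILSVRC/ImageSets/CLS-LOC/val.txt",
    "ILSVRC/Data/CLS-LOC/train",
    "ILSVRC/Data/CLS-LOC/val",
]


def find_missing_entries(base_path: Path) -> List[str]:
    """Return the expected files/directories that do not exist under base_path.

    Hypothetical helper for illustration; not part of parseimagenet.
    """
    return [entry for entry in EXPECTED_ENTRIES if not (base_path / entry).exists()]
```

An empty list from find_missing_entries(Path('/path/to/your/ImageNet-Subset')) suggests the layout is in place; any returned entries name what is missing.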
    

Installation

pip install parseimagenet

For local development:

git clone https://github.com/MrT3313/Parse-ImageNet.git
pip install -e /path/to/ParseImageNet
# ex: pip install -e /Users/mrt/Documents/MrT/code/computer-vision/ParseImageNet

Usage

Params

| Parameter | Type | Default | Alternatives | Description |
|---|---|---|---|---|
| base_path | Path | - | Any valid directory path | Root path to the ImageNet dataset |
| preset | str or None | None | "birds", "dogs", ... via get_available_presets() | Predefined keyword list. None selects all categories |
| keywords | list or None | None | Any list of strings | Custom keyword list. Overrides preset when provided |
| num_images | int | 200 | Any positive integer | Max images to return (capped by availability) |
| source | str | "train" | "val" | Data split to sample from |
| silent | bool | True | False | Suppresses print output when enabled |

Base Example

from pathlib import Path
from parseimagenet import get_image_paths_by_keywords

# Set the path to your ImageNet directory
base_path = Path('/path/to/your/ImageNet-Subset')
# ex: /Users/mrt/Documents/MrT/code/computer-vision/image-bank/ImageNet-Subset

# Default: no preset, selects from all categories
image_paths = get_image_paths_by_keywords(base_path=base_path)

# image_paths is a list of Path objects
print(f"Found {len(image_paths)} images")
print(image_paths[:5])
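
Because the function returns a plain list of Path objects, materializing a custom subset on disk can be done with a short copy loop. This is a minimal sketch under stated assumptions: copy_into_subset and its destination layout are hypothetical and not part of parseimagenet.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_into_subset(image_paths: Iterable[Path], dest_dir: Path) -> List[Path]:
    """Copy each image into dest_dir/<parent folder name>/<file name>.

    Hypothetical helper, not part of parseimagenet. Keeping the parent folder
    name preserves the synset directory (e.g. n01440764) for training images.
    """
    copied = []
    for src in image_paths:
        target_dir = dest_dir / src.parent.name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / src.name
        shutil.copy2(src, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```

Calling copy_into_subset(image_paths, Path('my-subset')) would copy the images returned above. Validation images sit directly in val/, so with this layout they would all land in my-subset/val/.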

Using Presets

Note: Presets are predefined keyword lists for common categories:

from parseimagenet import get_image_paths_by_keywords # main function
from parseimagenet import get_available_presets, KEYWORD_PRESETS # helpers

# See available presets
print(get_available_presets())  # ['birds', 'dogs', 'wild_canids', 'snakes']

# Access preset keywords directly
print(KEYWORD_PRESETS["birds"])

# Use a specific preset
image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    preset="birds",
    num_images=200
)

Using Keywords

Note: Custom keywords override the preset.

Important: All applicable category keywords are listed in the LOC_synset_mapping.txt file.

image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    keywords=['dog', 'puppy', 'hound'],
    num_images=100
)
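
To preview which categories a keyword list would hit, LOC_synset_mapping.txt can be scanned directly. Each line of that file holds a WordNet ID followed by comma-separated labels. The sketch below assumes case-insensitive substring matching, which may differ from the rule parseimagenet actually applies; parse_synset_mapping and find_matching_synsets are hypothetical helper names.

```python
from typing import Dict, Iterable


def parse_synset_mapping(text: str) -> Dict[str, str]:
    """Map each WordNet ID (e.g. 'n01440764') to its label string."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The ID is everything before the first space; the rest is the labels.
        wnid, _, labels = line.partition(" ")
        mapping[wnid] = labels
    return mapping


def find_matching_synsets(mapping: Dict[str, str], keywords: Iterable[str]) -> Dict[str, str]:
    """Return the entries whose labels contain any keyword.

    Assumption: case-insensitive substring match (hypothetical rule).
    """
    lowered = [k.strip().lower() for k in keywords if k.strip()]
    return {
        wnid: labels
        for wnid, labels in mapping.items()
        if any(k in labels.lower() for k in lowered)
    }


# Small illustrative sample in the file's line format.
sample = (
    "n01440764 tench, Tinca tinca\n"
    "n02088094 Afghan hound, Afghan\n"
    "n01530575 brambling, Fringilla montifringilla\n"
)
matches = find_matching_synsets(parse_synset_mapping(sample), ["dog", "puppy", "hound"])
print(matches)
```

With the real file, pass Path(base_path / 'LOC_synset_mapping.txt').read_text() instead of the sample string. Substring matching is broad: a keyword such as 'dog' would also match a label like 'hotdog', so inspecting the matches before sampling can be worthwhile.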

Using Sources

By default, images are sourced from the training set. Use source="val" to pull from the validation set instead:

Important: Fetching from the test split is not supported because the Kaggle competition dataset does not provide ground-truth labels for the test data.

image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    preset="birds",
    num_images=100,
    source="val"
)
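
Unlike training images, validation images are not grouped into synset folders, so their categories have to come from LOC_val_solution.csv. The sketch below shows one way that lookup could work. It assumes the Kaggle column names ImageId and PredictionString, with the WordNet ID as the first token of PredictionString; parse_val_solution is a hypothetical helper, not the library's implementation, and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict


def parse_val_solution(csv_text: str) -> Dict[str, str]:
    """Map each validation image ID to its WordNet ID.

    Assumes columns 'ImageId' and 'PredictionString', where PredictionString
    looks like '<wnid> <xmin> <ymin> <xmax> <ymax> ...'.
    """
    labels = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        tokens = row["PredictionString"].split()
        if tokens:  # skip rows with an empty prediction string
            labels[row["ImageId"]] = tokens[0]
    return labels


# Illustrative rows only; the box coordinates are invented.
sample_csv = (
    "ImageId,PredictionString\n"
    "ILSVRC2012_val_00000001,n01751748 10 20 300 400\n"
    "ILSVRC2012_val_00000002,n09193705 5 5 100 100 n09193705 50 60 200 220\n"
)
val_labels = parse_val_solution(sample_csv)
print(val_labels)
```

An image's file path can then be derived from its ID by appending .JPEG under ILSVRC/Data/CLS-LOC/val/.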

Command Line

# Use default preset (birds)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset

# Use a specific preset
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --num_images 100

# Use custom keywords (overrides preset)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --keywords "dog, puppy" --num_images 100

# Use validation data instead of training data
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --source val --num_images 100
