ParseImageNet

Extract image file paths from ImageNet by matching category keywords. Useful for creating custom subsets of ImageNet for training or evaluation.

Kaggle Competition Dataset

Prerequisites

Python 3.8+

ImageNet dataset (or a subset) with the standard ILSVRC directory structure:

ImageNet-Subset/
├── LOC_synset_mapping.txt
├── LOC_val_solution.csv
└── ILSVRC/
    ├── ImageSets/
    │   └── CLS-LOC/
    │       ├── train_cls.txt
    │       └── val.txt
    └── Data/
        └── CLS-LOC/
            ├── train/
            │   ├── n01440764/
            │   │   ├── n01440764_10026.JPEG
            │   │   └── ...
            │   └── ...
            └── val/
                ├── ILSVRC2012_val_00000001.JPEG
                └── ...
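Before running the parser, it can help to confirm that the dataset root matches the tree above. The following is a minimal sketch using only the standard library; `missing_entries` and `EXPECTED` are hypothetical names introduced here for illustration and are not part of the parseimagenet package.

```python
from pathlib import Path
from typing import List

# Relative paths copied from the directory tree above.
EXPECTED = [
    "LOC_synset_mapping.txt",
    "LOC_val_solution.csv",
    "ILSVRC/ImageSets/CLS-LOC/train_cls.txt",
    "ILSVRC/ImageSets/CLS-LOC/val.txt",
    "ILSVRC/Data/CLS-LOC/train",
    "ILSVRC/Data/CLS-LOC/val",
]


def missing_entries(base_path: Path) -> List[str]:
    """Return the expected relative paths that do not exist under base_path."""
    return [rel for rel in EXPECTED if not (base_path / rel).exists()]


# Hypothetical usage: replace the placeholder with your own dataset root.
missing = missing_entries(Path("/path/to/your/ImageNet-Subset"))
print("Layout OK" if not missing else f"Missing: {missing}")
```

An empty result means every file and directory named in the tree was found; it does not check that the image folders actually contain images.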

Installation

pip install parseimagenet

For local development:

git clone https://github.com/MrT3313/Parse-ImageNet.git
pip install -e /path/to/ParseImageNet
# ex: pip install -e /Users/mrt/Documents/MrT/code/computer-vision/ParseImageNet

Usage

Note: see the Example Notebook.

Params

Parameter	Type	Default	Alternatives	Description
`base_path`	`Path`	-	Any valid directory path	Root path to the ImageNet dataset
`preset`	`str` or `None`	`None`	`"birds"`, `"dogs"`, ... via `get_available_presets()`	Predefined keyword list. `None` selects all categories
`keywords`	`list` or `None`	`None`	Any list of strings	Custom keyword list. Overrides `preset` when provided
`num_images`	`int`	`200`	Any positive integer	Max images to return (capped by availability)
`source`	`str`	`"train"`	`"val"`	Data split to sample from
`silent`	`bool`	`True`	`False`	Suppresses print output when enabled
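The table describes two rules that are easy to misread: `keywords` takes precedence over `preset`, and `num_images` is an upper bound, not a guarantee. The sketch below restates those rules as plain Python. It is an illustration of the documented behavior, not the package's implementation; `resolve_keywords`, `cap_count`, and `EXAMPLE_PRESETS` are hypothetical names, and the keyword list shown is made up.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the package's presets; the real keyword lists differ.
EXAMPLE_PRESETS: Dict[str, List[str]] = {"birds": ["bird", "finch"]}


def resolve_keywords(
    preset: Optional[str] = None,
    keywords: Optional[List[str]] = None,
    presets: Dict[str, List[str]] = EXAMPLE_PRESETS,
) -> Optional[List[str]]:
    """Return the keyword list to match, or None to mean 'all categories'."""
    if keywords is not None:
        return list(keywords)  # custom keywords win over any preset
    if preset is not None:
        return list(presets[preset])  # raises KeyError for an unknown preset
    return None  # neither given: no filtering


def cap_count(num_images: int, available: int) -> int:
    """Number of images actually returned when fewer are available."""
    return min(num_images, available)


print(resolve_keywords(preset="birds", keywords=["dog"]))  # ['dog']
print(resolve_keywords(preset="birds"))                    # ['bird', 'finch']
print(resolve_keywords())                                  # None
print(cap_count(200, 37))                                  # 37
```

Treating an empty `keywords` list as "provided" is an assumption of this sketch; the package may handle that case differently.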

Base Example

from pathlib import Path
from parseimagenet import get_image_paths_by_keywords

# Set the path to your ImageNet directory
base_path = Path('/path/to/your/ImageNet-Subset')
# ex: /Users/mrt/Documents/MrT/code/computer-vision/image-bank/ImageNet-Subset

# Default: no preset, selects from all categories
image_paths = get_image_paths_by_keywords(base_path=base_path)

# image_paths is a list of Path objects
print(f"Found {len(image_paths)} images")
print(image_paths[:5])
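Because training images sit in one folder per synset (see the directory tree in Prerequisites), the class of each returned training path can be read from its parent directory name. The sketch below counts paths per synset; `count_by_synset` is a hypothetical helper, and it only makes sense for `source="train"`, since validation images share a single flat folder.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_by_synset(paths: Iterable[Path]) -> Counter:
    """Count image paths per synset ID, taken from each path's parent folder."""
    return Counter(p.parent.name for p in paths)


# Example with hand-written paths that follow the train/ layout.
sample = [
    Path("train/n01440764/n01440764_10026.JPEG"),
    Path("train/n01440764/n01440764_10027.JPEG"),
    Path("train/n01443537/n01443537_2.JPEG"),
]
print(count_by_synset(sample).most_common())
# [('n01440764', 2), ('n01443537', 1)]
```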

Using Presets

Note: Presets are predefined keyword lists for common categories:

from parseimagenet import get_image_paths_by_keywords # main function
from parseimagenet import get_available_presets, KEYWORD_PRESETS # helpers

# See available presets
print(get_available_presets())  # ['birds', 'dogs', 'wild_canids', 'snakes']

# Access preset keywords directly
print(KEYWORD_PRESETS["birds"])

# Use a specific preset
image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    preset="birds",
    num_images=200
)

Using Keywords

Note: Custom keywords override the preset.

Important: You can find all applicable category keywords in the LOC_synset_mapping.txt file.

image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    keywords=['dog', 'puppy', 'hound'],
    num_images=100
)
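To preview which categories a keyword list would touch, you can scan LOC_synset_mapping.txt yourself. The sketch below assumes each line has the form `n01440764 tench, Tinca tinca` (synset ID, a space, then a comma-separated description) and uses a case-insensitive substring match. Both the helper name `find_matching_synsets` and the matching rule are assumptions made for illustration; the package's own matching logic may differ.

```python
from pathlib import Path
from typing import Dict, List


def find_matching_synsets(mapping_file: Path, keywords: List[str]) -> Dict[str, str]:
    """Map synset ID -> description for lines whose description contains any keyword."""
    lowered = [k.lower() for k in keywords]
    matches: Dict[str, str] = {}
    with open(mapping_file, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split "n01440764 tench, Tinca tinca" at the first space only.
            synset_id, _, description = line.partition(" ")
            if any(k in description.lower() for k in lowered):
                matches[synset_id] = description
    return matches


# Hypothetical usage:
# hits = find_matching_synsets(base_path / "LOC_synset_mapping.txt", ["dog", "puppy"])
# for synset_id, description in hits.items():
#     print(synset_id, description)
```

Under a substring rule like this one, a keyword such as 'dog' can also hit non-animal classes such as "hotdog, hot dog, red hot", so reviewing the matches before building a subset is worthwhile.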

Using Sources

By default, images are sourced from the training set. Use source="val" to pull from the validation set instead:

Important: Fetching from the test data is not supported because the Kaggle Competition Dataset does not provide ground truth for the test data.

image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    preset="birds",
    num_images=100,
    source="val"
)
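Validation images live in one flat folder, so their class cannot be read from the path; it comes from LOC_val_solution.csv. The sketch below shows one way to read that file, assuming the Kaggle layout with an `ImageId` column and a `PredictionString` column whose first token is the synset ID (followed by bounding-box coordinates). `load_val_labels` is a hypothetical helper, not a function exported by the package.

```python
import csv
from pathlib import Path
from typing import Dict


def load_val_labels(solution_csv: Path) -> Dict[str, str]:
    """Map validation image ID -> synset ID using the first PredictionString token."""
    labels: Dict[str, str] = {}
    with open(solution_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            tokens = row["PredictionString"].split()
            if tokens:  # skip rows with an empty prediction string
                labels[row["ImageId"]] = tokens[0]
    return labels


# Hypothetical usage: look up the class of a returned validation path.
# labels = load_val_labels(base_path / "LOC_val_solution.csv")
# synset_id = labels[image_paths[0].stem]  # stem is e.g. "ILSVRC2012_val_00000001"
```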

Command Line

# Use default preset (birds)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset

# Use a specific preset
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --num_images 100

# Use custom keywords (overrides preset)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --keywords "dog, puppy" --num_images 100

# Use validation data instead of training data
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --source val --num_images 100
