Cache GEE catalog to avoid repeated network requests #80

Open
mihiarc wants to merge 1 commit into taraskiba:main from mihiarc:fix/cache-gee-catalog
Conversation

mihiarc (Contributor) commented Feb 2, 2026

Summary

  • Add @lru_cache decorator to cache GEE catalog fetch, avoiding repeated network requests
  • Consolidate duplicated fetch_geojson() implementations into shared functions in common.py
  • Add helper functions for common catalog lookups

Problem

The GEE catalog was being fetched from GitHub multiple times during normal usage:

  1. On class instantiation (to populate dropdown)
  2. On dropdown value change
  3. On button click
  4. Inside load_gee_as_image() to determine dataset type

This resulted in 3-4 network requests per extraction operation, causing:

  • Slow widget initialization
  • Unnecessary network overhead
  • Potential GitHub rate limiting on heavy use

Solution

Added cached functions to common.py:

from functools import lru_cache

@lru_cache(maxsize=1)
def fetch_gee_catalog():
    """Fetch and cache the GEE catalog for the session lifetime."""
    ...

def get_gee_catalog_as_dict(key_field="title", value_field="id"):
    """Get catalog as dict for dropdown menus."""
    ...

def get_dataset_type(dataset_id):
    """Get dataset type (image, image_collection, etc)."""
    ...

def get_dataset_url(dataset_id):
    """Get documentation URL for a dataset."""
    ...
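
The function bodies above are elided in this description. As a rough illustration of how a cached fetch and the lookup helpers could fit together, here is a minimal sketch. It is not the code from this PR: `_download_catalog()` is a hypothetical stand-in that returns two made-up entries instead of contacting GitHub, and the `type` and `url` field names are assumptions (only `title` and `id` appear in the signatures above).

```python
from functools import lru_cache


def _download_catalog():
    """Hypothetical stand-in for the real network request (no I/O here)."""
    return [
        {"title": "Example DEM", "id": "EXAMPLE/DEM",
         "type": "image", "url": "https://example.com/dem"},
        {"title": "Example NDVI", "id": "EXAMPLE/NDVI",
         "type": "image_collection", "url": "https://example.com/ndvi"},
    ]


@lru_cache(maxsize=1)
def fetch_gee_catalog():
    """Return the catalog, downloading it only on the first call."""
    # A tuple prevents callers from adding or removing cached entries.
    return tuple(_download_catalog())


def get_gee_catalog_as_dict(key_field="title", value_field="id"):
    """Map one catalog field to another, e.g. title -> id for a dropdown."""
    return {entry[key_field]: entry[value_field] for entry in fetch_gee_catalog()}


def _find_entry(dataset_id):
    """Return the catalog entry with the given id, or None if absent."""
    for entry in fetch_gee_catalog():
        if entry["id"] == dataset_id:
            return entry
    return None


def get_dataset_type(dataset_id):
    """Assumed 'type' field; None when the dataset is unknown."""
    entry = _find_entry(dataset_id)
    return entry["type"] if entry else None


def get_dataset_url(dataset_id):
    """Assumed 'url' field; None when the dataset is unknown."""
    entry = _find_entry(dataset_id)
    return entry["url"] if entry else None
```

Every helper goes through `fetch_gee_catalog()`, so only the first call in a process reaches the download step, and `fetch_gee_catalog.cache_info()` reports the hit and miss counts. Two properties of `lru_cache` are worth keeping in mind: `fetch_gee_catalog.cache_clear()` forces a fresh download, and whatever the first call returns is cached, including an empty fallback, so a failed fetch is better signalled by raising than by returning a placeholder.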

Files Changed

File                                    Changes
skiba/common.py                         +92 lines, new cached catalog functions
skiba/point_extraction.py               Refactored to use common functions
skiba/aggregated_point_extraction.py    Refactored to use common functions
skiba/buffer_coordinates.py             Refactored to use common functions
skiba/buffer_and_sample.py              Refactored to use common functions
skiba/buffer_method.py                  Refactored to use common functions
skiba/geojson_buffering.py              Refactored to use common functions

Net change: -125 lines (removed duplication)

Additional Improvements

Fixes #76

Previously, the GEE catalog was fetched from GitHub multiple times
during normal usage:
- On class instantiation (to populate dropdown)
- On dropdown value change
- On button click / load_gee_as_image()

This resulted in 3-4 network requests per extraction operation.

Changes:
- Add cached catalog functions to common.py using @lru_cache
- Add helper functions: get_gee_catalog_as_dict(), get_dataset_type(),
  get_dataset_url()
- Update all 6 modules to use the cached functions
- Remove duplicated fetch_geojson() implementations
- Add timeout handling for network requests
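
The timeout handling mentioned in the last item is not shown in this description. One way it might look is sketched below, assuming the standard-library `urllib` is used (the PR may well use a different HTTP client). The `fetch_json` name and the `opener` parameter are hypothetical, and `opener` exists only so the function can be exercised without a network.

```python
import json
import logging
from urllib.request import urlopen

logger = logging.getLogger(__name__)


def fetch_json(url, timeout=10.0, opener=urlopen):
    """Return parsed JSON from url, or None if the request fails or times out."""
    try:
        # urlopen raises URLError or TimeoutError (both subclasses of OSError)
        # on connection problems and when the timeout elapses.
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # ValueError covers malformed JSON and undecodable bytes.
        logger.warning("Could not fetch %s: %s", url, exc)
        return None
```

Without an explicit `timeout`, `urlopen` falls back to the global socket default, which is normally no timeout at all, so a stalled connection could block widget initialization indefinitely.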

Benefits:
- Single network request per session (cached)
- Faster widget initialization
- Reduced code duplication (-125 lines net)
- Better error handling with timeout

Also fixes date filtering bug (same as taraskiba#75) in additional locations.

Fixes taraskiba#76
Development

Successfully merging this pull request may close these issues.

Performance: GEE catalog is fetched repeatedly on every operation

1 participant