everycoffee-ingest

Python CLI for the Every Coffee multi-source cafe ingestion pipeline.

Features

Ingest cafe records from OSM PBF, Overture GeoParquet, and Foursquare CSV
Normalize and deduplicate with geohash blocking + similarity scoring
Parse OSM opening_hours into structured per-day rows
Compute specialty_score from weighted specialty signals
Track ingestion runs in Supabase/Postgres
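
To illustrate what "geohash blocking + similarity scoring" means in practice, here is a minimal sketch. It is not the package's actual implementation: the record fields (`name`, `lat`, `lon`), the function names, the precision, and the use of `difflib` for name similarity are all assumptions made for illustration. The idea is to compare only records that share a geohash cell, which avoids scoring every record against every other record.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat: float, lon: float, precision: int = 7) -> str:
    """Standard geohash: interleave longitude and latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def candidate_pairs(records, precision: int = 7):
    """Yield record pairs that fall into the same geohash cell (the 'block')."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[geohash(rec["lat"], rec["lon"], precision)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for the real scorer."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


# Hypothetical records from two sources describing the same cafe.
records = [
    {"name": "Blue Bottle Coffee", "lat": 52.51700, "lon": 13.40500},
    {"name": "Blue Bottle Coffee Co.", "lat": 52.51701, "lon": 13.40501},
    {"name": "Harbour Espresso", "lat": -33.86000, "lon": 151.21000},
]
for left, right in candidate_pairs(records, precision=6):
    print(left["name"], "|", right["name"], "|",
          round(name_similarity(left["name"], right["name"]), 2))
```

One known limitation of this sketch: two records on opposite sides of a cell boundary land in different blocks and are never compared. Blocking schemes usually also check the neighbouring cells to cover that case.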

Quickstart

python -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env

Optional (only if you need libpostal-backed address parsing for dedupe normalization):

pip install -e ".[postal]"

CLI Commands

everycoffee-ingest osm-import --pbf ./data/planet.osm.pbf --region global
everycoffee-ingest overture-import --parquet ./data/overture.parquet --region global
everycoffee-ingest foursquare-import --csv ./data/foursquare.csv --region global
everycoffee-ingest stockist-import --roaster-id <uuid> --url https://example.com/find-us
everycoffee-ingest dedupe --source osm --region global --dry-run
everycoffee-ingest dedupe --source osm --region global --apply
everycoffee-ingest enrich-hours --source osm --region global
everycoffee-ingest enrich-osm --region global
everycoffee-ingest enrich-specialty --recompute-all
everycoffee-ingest status

All commands return structured JSON so automation can parse outputs reliably.
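
As a sketch of how automation might consume that output, the helper below runs a command and decodes its JSON. It assumes that the JSON is written to stdout as a single document and that failures produce a non-zero exit code; neither is confirmed by this README. `run_json` is a hypothetical helper name.

```python
import json
import subprocess


def run_json(argv: list[str]) -> dict:
    """Run a command and parse its stdout as one JSON document.

    Assumes JSON goes to stdout and errors produce a non-zero exit code,
    in which case subprocess raises CalledProcessError.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


# Example usage (requires the CLI to be installed and configured):
#   report = run_json(["everycoffee-ingest", "status"])
#   print(sorted(report))
```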

Environment

See .env.example for required variables.
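
A quick way to spot variables you forgot to fill in is to compare the keys of the two files. The sketch below assumes plain `KEY=value` lines with optional `#` comments and an optional `export ` prefix; the helper names are hypothetical.

```python
from pathlib import Path


def env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=value lines, ignoring comments and blanks."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def missing_keys(example_text: str, env_text: str) -> list[str]:
    """Names present in the example file but absent from the real one."""
    return sorted(env_keys(example_text) - env_keys(env_text))


# Example usage from the repository root:
#   print(missing_keys(Path(".env.example").read_text(), Path(".env").read_text()))
```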

Trial And Error Runbook

Use this staged path for first real-world validation. Start with small files, then expand.

1) Prepare Small Trial Inputs

OSM: city-level .osm.pbf extract
Overture: regional GeoParquet sample
Foursquare: CSV subset (~500 to 5,000 rows)

mkdir -p ./trial-data
# Place the input files at ./trial-data/osm.pbf, ./trial-data/overture.parquet, and ./trial-data/foursquare.csv
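
If you only have a full Foursquare export, one way to produce the CSV subset is to keep the header plus the first N rows. This is a minimal sketch; the function name and the source path in the usage comment are hypothetical, and taking the first rows rather than a random sample is a simplification.

```python
import csv
from itertools import islice


def write_csv_subset(src: str, dst: str, max_rows: int = 1000) -> int:
    """Copy the header and up to max_rows data rows from src to dst.

    Returns the number of data rows written.
    """
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)
        if header is None:  # empty input file
            return 0
        writer.writerow(header)
        count = 0
        for row in islice(reader, max_rows):
            writer.writerow(row)
            count += 1
        return count


# Example usage (source path is a placeholder):
#   write_csv_subset("./data/foursquare.csv", "./trial-data/foursquare.csv", 2000)
```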

2) Run Ingestion Sources

everycoffee-ingest osm-import --pbf ./trial-data/osm.pbf --region trial
everycoffee-ingest overture-import --parquet ./trial-data/overture.parquet --region trial
everycoffee-ingest foursquare-import --csv ./trial-data/foursquare.csv --region trial

3) Dedupe In Safe Mode First

everycoffee-ingest dedupe --region trial --dry-run

Confirm output fields look sane before applying:

compared_pairs
accepted_pairs
clusters
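
The "looks sane" check can be partly automated. The sketch below assumes the three fields are top-level integer counts in the command's JSON output, which this README does not spell out; `check_dry_run` is a hypothetical helper.

```python
REQUIRED_FIELDS = ("compared_pairs", "accepted_pairs", "clusters")


def check_dry_run(report: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no red flags."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = report.get(name)
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name}: expected an integer, got {value!r}")
        elif value < 0:
            problems.append(f"{name}: negative count {value}")
    # Accepted pairs are drawn from compared pairs, so they cannot outnumber them.
    if not problems and report["accepted_pairs"] > report["compared_pairs"]:
        problems.append("accepted_pairs exceeds compared_pairs")
    return problems
```

An empty result only rules out obvious breakage. Whether the accepted match volume is plausible for the dataset size still needs a human look.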

4) Apply Dedupe Merges

everycoffee-ingest dedupe --region trial --apply

After applying, review these output fields:

persisted_matches
merges_attempted
merges_applied

5) Run Enrichment

everycoffee-ingest enrich-hours --source osm --region trial
everycoffee-ingest enrich-osm --region trial
everycoffee-ingest enrich-specialty --recompute-all
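
For orientation, this is roughly what turning an OSM opening_hours value into per-day rows involves. The sketch handles only a small subset of the syntax (day lists and ranges followed by time spans, separated by `;`) and raises on anything else, such as `24/7`, `off`, or holiday rules. It is not the parser that enrich-hours uses, and the row shape (`day`, `opens`, `closes`) is an assumption.

```python
import re

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]
_RULE = re.compile(r"^(?P<days>[A-Za-z,\-]+)\s+(?P<spans>[\d:,\-\s]+)$")


def expand_days(spec: str) -> list[str]:
    """Expand 'Mo-Fr' or 'Mo,We' into individual day codes."""
    days = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            i, j = DAYS.index(start), DAYS.index(end)
            length = (j - i) % 7 + 1  # supports wrap-around ranges like Sa-Mo
            days.extend(DAYS[(i + k) % 7] for k in range(length))
        else:
            DAYS.index(part)  # raises ValueError for unknown day codes
            days.append(part)
    return days


def parse_opening_hours(value: str) -> list[dict]:
    """Parse a simple opening_hours string into one row per day and time span."""
    rows = []
    for rule in value.split(";"):
        rule = rule.strip()
        if not rule:
            continue
        match = _RULE.match(rule)
        if match is None:
            raise ValueError(f"unsupported rule: {rule!r}")
        for day in expand_days(match["days"]):
            for span in match["spans"].split(","):
                opens, closes = span.strip().split("-")
                rows.append({"day": day, "opens": opens, "closes": closes})
    return rows


for row in parse_opening_hours("Mo-Fr 08:00-18:00; Sa 09:00-14:00"):
    print(row)
```

In the full opening_hours specification a later rule can override an earlier one for the same day. This sketch simply appends rows and does not model that.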

6) Verify Run Health

everycoffee-ingest status --limit 20

7) Iterate Until Stable

Use trial metrics to guide fixes:

Retry recovery: transient errors should recover without aborting a full run
Bad-row isolation: malformed source rows should increment failure/skip counters, not kill the command
Merge behavior: repeated --apply runs should be idempotent
Data quality: accepted match volume should be plausible for dataset size
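
The first two properties above can be pictured with a small sketch. It is illustrative only: `TransientError`, `with_retry`, `ingest_rows`, and the stats keys are hypothetical names, not the package's API. Retries use exponential backoff, and a bad row is counted instead of aborting the loop.

```python
import time
from typing import Callable, Iterable


class TransientError(Exception):
    """Stand-in for a recoverable failure such as a dropped connection."""


def with_retry(fn: Callable, attempts: int = 3, base_delay: float = 0.5,
               sleep: Callable[[float], None] = time.sleep):
    """Call fn, retrying on TransientError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == attempts:
                raise  # out of retries: surface the error
            sleep(base_delay * 2 ** (attempt - 1))


def ingest_rows(rows: Iterable[dict], handle: Callable[[dict], None]) -> dict:
    """Process every row; malformed rows are counted, not fatal."""
    stats = {"processed": 0, "failed": 0}
    for row in rows:
        try:
            handle(row)
            stats["processed"] += 1
        except (KeyError, ValueError):
            stats["failed"] += 1
    return stats
```

If a trial run aborts on a single malformed row or a single dropped connection, that points to a code path where this kind of isolation is missing.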
