168 changes: 91 additions & 77 deletions README.md
# Internet Quality Barometer (IQB)

[![Build Status](https://github.com/m-lab/iqb/actions/workflows/ci.yml/badge.svg)](https://github.com/m-lab/iqb/actions)
[![codecov](https://codecov.io/gh/m-lab/iqb/branch/main/graph/badge.svg)](https://codecov.io/gh/m-lab/iqb)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/m-lab/iqb)

IQB is an open-source framework developed by
[Measurement Lab (M-Lab)](https://www.measurementlab.net/) that computes a
composite Internet quality score across real-world use cases: web browsing,
video streaming, video conferencing, audio streaming, online backup, and
gaming. Unlike single-metric speed tests, IQB evaluates whether a connection
meets the network requirements of each use case and aggregates the results
into a single score between 0 and 1.

Read the conceptual background:

- [M-Lab blog post](https://www.measurementlab.net/blog/iqb/)
- [Detailed framework report (PDF)](https://www.measurementlab.net/publications/IQB_report_2025.pdf)
- [Executive summary (PDF)](https://www.measurementlab.net/publications/IQB_executive_summary_2025.pdf)
- [ACM IMC 2025 poster](https://arxiv.org/pdf/2509.19034)

Live staging dashboard: [iqb.mlab-staging.measurementlab.net](https://iqb.mlab-staging.measurementlab.net/)

---

## Repository Structure

| Directory | Description |
|-----------|-------------|
| `library/` | `mlab-iqb` Python package — scoring logic, cache API, data pipeline, CLI |
| `prototype/` | Streamlit web dashboard (Phase 1 prototype) |
| `analysis/` | Jupyter notebooks for research and experimentation |
| `data/` | Pipeline configuration and local Parquet cache |
| `docs/` | Documentation, design decision records, internals guide |

---

## Quick Start

### Requirements

- Python 3.13 (see `.python-version`)
- [uv](https://astral.sh/uv) — install with `brew install uv` on macOS or
`sudo snap install astral-uv --classic` on Ubuntu

### Setup and Run

```bash
# Clone the repository
git clone git@github.com:m-lab/iqb.git
cd iqb

# Install all workspace dependencies
uv sync --dev

# Run the Streamlit prototype
cd prototype
uv run streamlit run Home.py
```

The dashboard will be available at `http://localhost:8501`.

### Using the Library

```python
from iqb import IQBCache, IQBCalculator, IQBDatasetGranularity, IQBRemoteCache

# Pull pre-computed data from GCS (requires gcloud auth)
cache = IQBCache(remote_cache=IQBRemoteCache())

# Load monthly country-level data
entry = cache.get_cache_entry(
    start_date="2025-10-01",
    end_date="2025-11-01",
    granularity=IQBDatasetGranularity.COUNTRY,
)

# Filter to a specific country and extract the median percentile
p50 = entry.mlab.read_data_frame_pair(country_code="US").to_iqb_data(percentile=50)

# Calculate the IQB score
score = IQBCalculator().calculate_iqb_score(data={"m-lab": p50.to_dict()})
print(f"IQB score: {score:.3f}")
```

See [`analysis/00-template.ipynb`](analysis/00-template.ipynb) for a complete
walkthrough.

### CLI

```bash
# Check which cache entries are available locally and remotely
uv run iqb cache status

# Pull pre-computed data from GCS to the local cache
uv run iqb cache pull -d data/

# Run the pipeline to generate new data from BigQuery
uv run iqb pipeline run -d data/
```

---

## Documentation

| Document | Description |
|----------|-------------|
| [docs/architecture.md](docs/architecture.md) | System overview, data flow, component responsibilities, extensibility |
| [docs/developer_guide.md](docs/developer_guide.md) | Local setup, adding metrics and pages, testing, contribution workflow |
| [docs/user_guide.md](docs/user_guide.md) | IQB for consumers, policymakers, researchers, and ISPs |
| [library/README.md](library/README.md) | Library API, testing, linting, type checking |
| [prototype/README.md](prototype/README.md) | Running locally, Docker, Cloud Run deployment |
| [data/README.md](data/README.md) | Pipeline commands, cache format, GCS configuration |
| [analysis/README.md](analysis/README.md) | Notebook usage and conventions |
| [docs/internals/](docs/internals/README.md) | Sequential guide to how the data pipeline works |
| [docs/design/](docs/design/README.md) | Architecture decision records |
| [CONTRIBUTING.md](CONTRIBUTING.md) | Development environment, VSCode setup, component workflows |

---

## Contributing

Contributions are welcome. Please read [CONTRIBUTING.md](CONTRIBUTING.md) for
development environment setup and [docs/developer_guide.md](docs/developer_guide.md)
for guidance on adding metrics, use cases, and dashboard pages. All changes
require passing tests, Ruff linting, and Pyright type checks before merge.

---

183 changes: 183 additions & 0 deletions docs/architecture.md
# IQB Architecture

## Overview

The Internet Quality Barometer (IQB) is a framework for computing a composite
score that reflects the quality of an Internet connection across a set of
real-world use cases. The system is structured as a monorepo containing four
distinct components: a scoring library, a Streamlit-based dashboard prototype,
Jupyter notebooks for exploratory analysis, and a data workspace managing the
pipeline that acquires and caches measurement data.

The design enforces a clear separation between:

- **Data acquisition** — querying BigQuery for M-Lab NDT measurements
- **Scoring logic** — computing IQB scores from aggregated measurement data
- **Visualization** — presenting scores and trends in an interactive dashboard

This separation allows each layer to evolve independently. Researchers can work
with the library API and notebooks without touching the dashboard; dashboard
developers can consume pre-computed cached data without running the pipeline.

---

## Repository Structure

```
iqb/
├── library/ # mlab-iqb Python package (scoring logic, cache API, pipeline)
│ └── src/iqb/
│ ├── calculator.py # IQBCalculator: weighted use-case scoring
│ ├── config.py # IQB_CONFIG: default thresholds and weights
│ ├── cache/ # IQBCache: read Parquet data from local cache
│ ├── pipeline/ # IQBPipeline: query BigQuery, write Parquet
│ ├── ghremote/ # IQBRemoteCache: sync with GCS remote cache
│ ├── queries/ # SQL query templates
│ └── cli/ # iqb command-line interface
├── prototype/ # Streamlit dashboard (Phase 1 prototype)
│ ├── Home.py # Application entry point
│ ├── pages/ # Additional Streamlit pages (e.g., IQB_Map.py)
│ ├── utils/ # Calculation helpers, constants, data loaders
│ ├── visualizations/ # Chart components (sunburst, maps)
│ ├── cache/ # Static JSON cache files per country/ASN
│ └── Dockerfile # Container image for Cloud Run deployment
├── analysis/ # Jupyter notebooks for research and experimentation
│ ├── 00-template.ipynb # Worked example using IQBCache + IQBCalculator
│ └── .iqb/ # Symlink to data workspace inside analysis env
├── data/ # Pipeline configuration and local Parquet cache
│ ├── pipeline.yaml # Query matrix (date ranges × granularities)
│ ├── cache/ # Parquet files written by iqb pipeline run
│ └── state/ # Remote cache manifest (ghremote)
└── docs/ # Documentation, design docs, internals guides
```

---

## Data Flow Pipeline

```mermaid
flowchart LR
BQ["BigQuery\n(M-Lab NDT measurements)"]
pipeline["iqb pipeline run\n(IQBPipeline)"]
parquet["Local Parquet Cache\ncache/v1/{start}/{end}/{granularity}/"]
gcs["GCS Remote Cache\n(mlab-sandbox-iqb-us-central1)"]
cache_api["IQBCache\n(read + filter data)"]
calc["IQBCalculator\n(binary scoring → weighted aggregation)"]
dashboard["Streamlit Prototype\n(prototype/Home.py)"]
notebooks["Jupyter Notebooks\n(analysis/)"]

BQ -->|"SQL queries via\npipeline.yaml matrix"| pipeline
pipeline -->|"Parquet + stats.json"| parquet
parquet <-->|"iqb cache push / pull"| gcs
parquet --> cache_api
cache_api -->|"IQBDataFramePair"| calc
calc -->|"IQB score (0–1)"| dashboard
calc -->|"IQB score (0–1)"| notebooks
gcs -->|"iqb cache pull"| parquet
```

### Pipeline Stages

1. **Query** — `IQBPipeline` reads `pipeline.yaml` to determine the date
ranges and granularities (country, country_asn, subdivision1,
subdivision1_asn, city, city_asn) to query. It runs parameterised SQL
against BigQuery and writes `data.parquet` and `stats.json` per entry.

2. **Cache** — Results are stored as Parquet files under
`cache/v1/{start_date}/{end_date}/{query_type}/`. The format supports
streaming reads and chunked row groups for memory efficiency.

3. **Remote sync** — `IQBRemoteCache` (and its GitHub-backed variant
`IQBGitHubRemoteCache`) can push local results to GCS and pull
pre-computed files to avoid re-running expensive BigQuery queries.

4. **Scoring** — `IQBCache` reads and filters Parquet data into an
`IQBDataFramePair`. The caller extracts a percentile (e.g., the 50th),
converts it to a flat measurement dict, and passes it to
`IQBCalculator.calculate_iqb_score()`.

5. **Presentation** — The Streamlit prototype loads pre-computed JSON files
from `prototype/cache/` and calls the library to compute scores
interactively.
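
A minimal sketch of exercising stages 2 and 4 by hand with plain pandas,
bypassing `IQBCache`; the dates and granularity are illustrative, and the
exact layout should be verified against `data/README.md`:

```python
# Read one cache entry directly from the local Parquet cache.
# Path components follow the cache/v1/{start_date}/{end_date}/{query_type}/
# convention described above; the concrete values are examples.
import json
from pathlib import Path

import pandas as pd

entry = Path("data/cache/v1/2025-10-01/2025-11-01/country")
df = pd.read_parquet(entry / "data.parquet")
stats = json.loads((entry / "stats.json").read_text())

print(df.head())
print(stats)  # per-entry pipeline metadata, e.g. bytes billed and duration
```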

---

## Scoring Logic and Visualization Separation

The scoring logic is entirely contained in `library/src/iqb/`:

- `IQB_CONFIG` in `config.py` encodes use cases, network requirements,
per-requirement weights, binary thresholds, and dataset weights.
- `IQBCalculator.calculate_iqb_score()` applies a three-stage weighted
  aggregation: first across datasets for each requirement (requirement
  agreement score), then across requirements for each use case (use case
  score), and finally across use cases (IQB score).
- The result is a single float in [0, 1].
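
As a toy illustration of this aggregation (not the library's implementation;
the config shape, key names, and weights below are assumptions), consider a
single use case with two requirements, each backed by one dataset:

```python
# Assumed, simplified config shape -- see library/src/iqb/config.py for the
# real IQB_CONFIG structure.
config = {
    "use cases": {
        "web browsing": {
            "w": 1.0,
            "requirements": {
                "latency": {"w": 0.5, "datasets": {"m-lab": {"w": 1.0}}},
                "download": {"w": 0.5, "datasets": {"m-lab": {"w": 1.0}}},
            },
        },
    },
}

# Binary per-dataset scores: 1 if the measurement met the threshold, else 0.
binary = {"web browsing": {"latency": {"m-lab": 1}, "download": {"m-lab": 0}}}

def weighted_mean(pairs):
    """Weighted mean over (score, weight) pairs."""
    return sum(s * w for s, w in pairs) / sum(w for _, w in pairs)

use_case_scores = []
for uc_name, uc in config["use cases"].items():
    req_scores = []
    for req_name, req in uc["requirements"].items():
        # Stage 1: agreement across datasets for this requirement.
        ds_scores = [(binary[uc_name][req_name][ds], spec["w"])
                     for ds, spec in req["datasets"].items()]
        req_scores.append((weighted_mean(ds_scores), req["w"]))
    # Stage 2: aggregate requirements into a use-case score.
    use_case_scores.append((weighted_mean(req_scores), uc["w"]))

# Stage 3: aggregate use cases into the overall IQB score.
print(f"IQB: {weighted_mean(use_case_scores):.2f}")  # 0.50 for this toy input
```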

The dashboard in `prototype/` depends on the library as a workspace member
(`mlab-iqb`) but contains no scoring logic itself. All calculation helpers
in `prototype/utils/calculation_utils.py` delegate to `IQBCalculator` and
manipulate `IQB_CONFIG` purely for session-specific overrides (custom
thresholds, user-adjusted weights).

This boundary ensures that changes to the scoring methodology require updates
only in the library and are automatically reflected in both the prototype and
analysis notebooks without any additional code changes.

---

## Extensibility

**Adding a new use case** — Extend `IQB_CONFIG` in `config.py` with a new
key under `"use cases"`, defining its weight, network requirements, thresholds,
and dataset weights. `IQBCalculator` picks up the new use case automatically
without code changes.
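
A hedged sketch of such an extension; the nested-dict shape, key names, and
import path below are assumptions to be checked against
`library/src/iqb/config.py`:

```python
from iqb.config import IQB_CONFIG  # assumed import path

# Hypothetical new use case; weights and thresholds are illustrative only.
IQB_CONFIG["use cases"]["cloud gaming"] = {
    "w": 0.1,  # weight of this use case in the overall IQB score
    "requirements": {
        "latency_ms": {
            "w": 0.6,
            "threshold": 40,  # binary threshold for this requirement
            "datasets": {"m-lab": {"w": 1.0}},
        },
        "download_mbps": {
            "w": 0.4,
            "threshold": 35,
            "datasets": {"m-lab": {"w": 1.0}},
        },
    },
}
```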

**Adding a new dataset** — Add the dataset name to the `"datasets"` sub-dict
for each network requirement that the dataset covers. Set the weight (`"w"`)
to 0 to include the dataset structurally while keeping it inactive; set it to
a positive value to activate it. Ensure data for the dataset is available in
`IQBCache` or supplied in the `data` dict passed to `calculate_iqb_score()`.
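
Continuing the assumed schema from the sketch above (the use-case and
requirement keys here are hypothetical):

```python
req = IQB_CONFIG["use cases"]["web browsing"]["requirements"]["latency_ms"]

req["datasets"]["new-dataset"] = {"w": 0}  # structurally present, inactive
req["datasets"]["new-dataset"]["w"] = 0.5  # activate once its data is wired up
```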

**Adding a new metric (network requirement)** — Add the requirement key to
each use case in `IQB_CONFIG` and implement the corresponding binary scoring
branch in `IQBCalculator.calculate_binary_requirement_score()`.
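
The real branch belongs in `calculate_binary_requirement_score()`; purely as
an illustration of binary scoring, a standalone check for a hypothetical
jitter requirement might look like:

```python
def binary_jitter_score(measured_ms: float, threshold_ms: float) -> float:
    # Lower jitter is better, so the requirement is met at or below threshold.
    return 1.0 if measured_ms <= threshold_ms else 0.0

assert binary_jitter_score(12.0, threshold_ms=30.0) == 1.0
assert binary_jitter_score(45.0, threshold_ms=30.0) == 0.0
```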

**Adding a new dashboard page** — Create a new `.py` file in
`prototype/pages/`. Streamlit automatically discovers and adds it to the
sidebar. Consume library APIs and `prototype/utils/` helpers rather than
reimplementing scoring logic.
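
A minimal sketch of such a page; the file name and widget are hypothetical,
and a real page would load measurements through the `prototype/utils/`
helpers rather than hard-coding values:

```python
# prototype/pages/Score_Explorer.py (hypothetical page name)
import streamlit as st

from iqb import IQBCalculator  # reuse library scoring; do not reimplement it

st.title("Score Explorer")

latency = st.slider("Measured latency (ms)", 0, 200, 50)
st.write(f"Selected latency: {latency} ms")
# A real page would assemble a measurement dict and call, e.g.:
# score = IQBCalculator().calculate_iqb_score(data={"m-lab": {...}})
```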

---

## Scalability Considerations

- **Granularity matrix** — The `pipeline.yaml` matrix defines the full
Cartesian product of date ranges and granularities. Adding new time
periods or geographic granularities requires only a YAML change; no
code changes are needed.
- **Parquet format** — Parquet supports columnar reads and predicate pushdown.
  Filtering by `country_code` or `asn` before loading data avoids reading
  full datasets into memory (see the sketch after this list).
- **GCS remote cache** — Pre-computed results can be shared across teams via
GCS without requiring anyone to run BigQuery queries. The manifest tracks
available entries, allowing incremental pulls.
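
A minimal predicate-pushdown sketch using pandas with the pyarrow engine; the
path and column name are assumptions based on the cache layout described
earlier:

```python
import pandas as pd

# Only row groups matching the filter are read from disk.
df = pd.read_parquet(
    "data/cache/v1/2025-10-01/2025-11-01/country/data.parquet",
    filters=[("country_code", "==", "US")],
)
```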

---

## Performance Considerations

- **Caching at the pipeline level** — `IQBPipeline` checks for existing
Parquet files before executing BigQuery queries to avoid redundant cloud
spend. `stats.json` records bytes billed and duration for auditability.
- **Static JSON cache in prototype** — The dashboard reads pre-computed
JSON files from `prototype/cache/` rather than querying BigQuery at
runtime. This eliminates network latency and authentication requirements
for end users.
- **Percentile aggregation** — Raw measurement data is pre-aggregated to
percentile summaries (e.g., p50, p90) before being stored. This reduces
data volume and allows the dashboard to operate without streaming large
raw datasets.
- **Streamlit session state** — `session_state.py` caches computation
results within a user session to avoid recalculating scores on every
widget interaction.
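
A minimal sketch of that memoization pattern; the session key and compute
function are hypothetical:

```python
import streamlit as st

def compute_score(country: str) -> float:
    return 0.5  # placeholder for a real IQBCalculator-based computation

def get_score(country: str) -> float:
    # st.session_state persists across widget interactions within a session,
    # so each country's score is computed at most once per session.
    scores = st.session_state.setdefault("iqb_scores", {})
    if country not in scores:
        scores[country] = compute_score(country)
    return scores[country]
```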