Project Comet is a reproducible analysis pipeline for studying the cosmic microwave background (CMB) and CMB lensing using Planck satellite data. The aim is to provide a modern, open, and automatable framework for exploring cosmological signals, validating theoretical models, and benchmarking analysis workflows.
This project integrates:
- High-resolution Planck component maps (SMICA CMB map and lensing convergence map).
- NaMaster for pseudo-$C_\ell$ estimation and cross-spectrum analysis.
- A modular Python CLI (`comet`) for configuration, running, and summarizing results.
- Continuous integration with scientific software dependencies pinned for reproducibility.
The CMB provides a snapshot of the universe at recombination ($z \approx 1100$). Key concepts in this project include:

- CMB lensing: deflections of CMB photons by intervening large-scale structure. This remaps CMB anisotropies and encodes information about the matter distribution at $z \sim 2$.
- Cross-correlations: combining CMB lensing with galaxy surveys or internal Planck products constrains cosmological parameters and tests $\Lambda$CDM.
- Pseudo-$C_\ell$ techniques: estimation of angular power spectra in the presence of masks, implemented here via NaMaster.
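NaMaster handles masking by inverting the mask's mode-coupling matrix. The basic intuition can be seen in a toy one-dimensional analog (pure NumPy, not part of the pipeline): masking suppresses raw spectral power by roughly the observed fraction of the domain, and the crudest correction divides that fraction back out (the "$f_\mathrm{sky}$ approximation"; NaMaster performs the full per-bandpower inversion instead).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4096
# White-noise field on a periodic 1D "sky": flat true power spectrum.
field = rng.standard_normal(n)

# Binary mask covering 60% of the domain (analog of a galactic mask).
mask = np.zeros(n)
mask[: int(0.6 * n)] = 1.0
fsky = mask.mean()

def mean_power(x):
    """Mean |FFT|^2 per mode; for white noise this estimates the flat spectrum."""
    return float(np.mean(np.abs(np.fft.rfft(x)) ** 2) / x.size)

true_p = mean_power(field)
masked_p = mean_power(field * mask)

ratio = masked_p / true_p      # suppressed by roughly fsky
corrected = masked_p / fsky    # simplest "pseudo-spectrum" correction
```

On the sphere the suppression also couples neighboring multipoles, which is why the full mode-coupling treatment is needed for publication-grade spectra.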
For background, see the references in the docs directory:
- Planck Collaboration (2018): Planck 2018 results. VIII. Gravitational lensing
- Alonso et al. (2019): NaMaster: Master of the Mask
- Other project-specific notes in `docs/*.pdf`.
- Config-driven runs: input data and pipeline steps specified via YAML (`config/prereg.yaml`, `config/paths.example.yaml`).
- Automated data checks: verifies presence and integrity of large Planck maps before processing.
- Stable CLI interface: `./bin/comet-run` produces a JSON summary of run metadata and results.
- Local + CI reproducibility: identical environments with `micromamba`, verified via `./bin/ci`.
- Extensible analysis: current pipeline stubs compute metadata; next stage integrates NaMaster for $C_\ell$ estimation.
```bash
micromamba create -f environment.yml
micromamba run -n comet pip install -e ".[dev]"
./bin/ci
```

Download the Planck SMICA temperature and lensing convergence maps and place
them in the repository's `data/` directory:
```text
project-comet/
└── data/
    ├── COM_CompMap_CMB-smica_2048_R1.20.fits
    └── COM_CompMap_Lensing_2048_R1.10.fits
```
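The pipeline's automated data checks run inside the `comet` package; as a sketch of the kind of presence/integrity test involved (a hypothetical helper, not the project's actual code, and the minimum sizes below are rough guesses), one might verify each map exists and is not truncated before processing:

```python
from pathlib import Path

# Hypothetical minimum sizes; the real maps are hundreds of MB, so a tiny
# file almost certainly indicates a failed or partial download.
EXPECTED_MAPS = {
    "COM_CompMap_CMB-smica_2048_R1.20.fits": 1024,
    "COM_CompMap_Lensing_2048_R1.10.fits": 1024,
}

def check_data_dir(data_dir):
    """Return a list of human-readable problems; empty means the maps look staged."""
    problems = []
    root = Path(data_dir)
    for name, min_bytes in EXPECTED_MAPS.items():
        path = root / name
        if not path.is_file():
            problems.append(f"missing: {path}")
        elif path.stat().st_size < min_bytes:
            problems.append(f"suspiciously small: {path}")
    return problems
```

A failed check should abort the run early rather than let a half-downloaded FITS file propagate into the spectra.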
If you keep the maps somewhere else, point the pipeline at that directory by
setting `COMET_DATA_DIR` before running the CLI:

```bash
export COMET_DATA_DIR=/path/to/planck/maps
```

You can confirm that the data are discoverable with the helper command:

```bash
micromamba run -n comet python -m comet.cli data --list
```

Run the default analysis (the helper script forwards any extra arguments to
the CLI, so you can tweak options such as `--ordering` if desired):

```bash
./bin/comet-run
```

The run writes its output to `artifacts/summary.json`. Inspect it with your
preferred JSON viewer (for example, `jq`):
```bash
jq . artifacts/summary.json
```

This will produce JSON output like:

```json
{
  "ordering": "both",
  "results": {
    "nbins": 0,
    "z": 0.0,
    "notes": "stub"
  }
}
```

The quick stub above is useful for smoke tests. To reproduce the
science-grade null test and cross-spectrum that the collaboration uses
for publication, follow the staged steps below. All commands assume you
are inside the repository root, have activated the environment with
micromamba run -n comet, and have staged the Planck maps as described
earlier.
- Confirm data discovery and record configuration hashes.

  ```bash
  micromamba run -n comet python -m comet.cli data --list
  git status --short
  git rev-parse HEAD
  ```

  Capture the git commit ID and any environment hashes in your run log.
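A minimal way to capture the configuration hashes mentioned above (a sketch, not a project utility; the file list and run-log layout are illustrative):

```python
import hashlib
from pathlib import Path

def hash_files(paths):
    """Map each existing path to the SHA-256 digest of its contents."""
    digests = {}
    for p in map(Path, paths):
        if p.is_file():
            digests[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest()
    return digests

# Example run-log entry (fill "commit" with the output of `git rev-parse HEAD`):
# record = {"commit": "<commit-id>", "configs": hash_files(["config/prereg.yaml"])}
```

Storing these digests next to each run makes it trivial to prove later which configuration produced which artifacts.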
- Prepare a theory spectrum file. The repository does not ship a fiducial lensing theory. If you only have the two Planck maps staged above, you can build a self-consistent theory table by computing the full-sky auto and cross spectra from those maps. Run the provided helper script, which reads the FITS files, evaluates the temperature auto-spectrum $C_\ell^{TT}$, the lensing convergence auto-spectrum $C_\ell^{\kappa\kappa}$, and their cross-spectrum $C_\ell^{T\kappa}$ with `healpy.anafast`, then writes the four-column ASCII file expected by the CLI. Feel free to pass `--lmax`, `--cmb-map`, or `--kappa-map` if your analysis setup differs.

  ```bash
  micromamba run -n comet python scripts/derive_theory_from_maps.py \
    --output-npz data/theory/tk_planck2018.npz
  ```

  By default the script saves `data/theory/tk_planck2018.txt`. The CLI utilities accept this plain-text file directly: the first column must be the multipole $\ell$, followed by $C_\ell^{TT}$, $C_\ell^{\kappa\kappa}$, and $C_\ell^{T\kappa}$. Supplying `--output-npz` also writes the NumPy archive used by the tests and scripts in this repository. Afterwards, inspect the theory coverage to confirm it matches your analysis range:

  ```bash
  micromamba run -n comet python scripts/theory.py data/theory/tk_planck2018.npz \
    --summary artifacts/theory_summary.json
  ```
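For quick checks outside the CLI, the four-column ASCII table can be loaded with plain NumPy (a sketch assuming exactly the column order described above):

```python
import numpy as np

def load_theory_table(path):
    """Read (ell, C_TT, C_kk, C_Tk) columns from the plain-text theory file."""
    ell, cl_tt, cl_kk, cl_tk = np.loadtxt(path, unpack=True)
    return ell.astype(int), cl_tt, cl_kk, cl_tk

# e.g. confirm the multipole coverage before binning:
# ell, *_ = load_theory_table("data/theory/tk_planck2018.txt")
# print(ell.min(), ell.max())
```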
- Generate both commutator orderings at full resolution. Use the shared mask and preregistered binning (if available) when running the two orderings. Adjust `--quick-nside`, `--nlb`, `--lmin`, and related arguments to your publication settings (the example below runs at NSIDE 2048 with 30-wide bins):

  ```bash
  micromamba run -n comet python scripts/run_order_A_to_B.py \
    --data-dir "${COMET_DATA_DIR:-data}" \
    --quick-nside 2048 --nlb 30 --lmin 30 --lmax 2048 \
    --threshold-sigma 4.0 --apod-arcmin 60.0 \
    --out artifacts/order_A_to_B_full.npz

  micromamba run -n comet python scripts/run_order_B_to_A.py \
    --data-dir "${COMET_DATA_DIR:-data}" \
    --quick-nside 2048 --nlb 30 --lmin 30 --lmax 2048 \
    --threshold-sigma 4.0 --apod-arcmin 60.0 \
    --out artifacts/order_B_to_A_full.npz
  ```

  Each script emits a JSON sidecar summarizing the binning and mask choices. Archive both `.npz` payloads and their `.json` companions.
- Build the null covariance from simulations. Supply the same geometry choices (NSIDE, binning, mask) and the theory spectrum from step 2. Increase `--nsims` until the minimum eigenvalue is stable; for publication we typically use ≥1000 realizations.

  ```bash
  micromamba run -n comet python scripts/run_null_sims.py \
    --data-dir "${COMET_DATA_DIR:-data}" \
    --quick-nside 2048 --nlb 30 --lmax 2048 \
    --theory data/theory/tk_planck2018.npz \
    --nsims 1000 --seed 2025 \
    --out-cov artifacts/cov_delta_full.npy
  ```

  Inspect the terminal summary for the covariance size and record the random seed alongside the command in your lab notebook.

  Reusing legacy binning: if you already have a long-running covariance generated with the pre-preregistration CLI defaults (for example, a 69×69 matrix from `--nlb 50`), rerun both ordering scripts with `--disable-prereg` and matching `--nlb`/`--lmax` settings so the Δ bandpowers align with that covariance.
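The "minimum eigenvalue is stable" criterion can be checked directly on the saved covariance with plain NumPy (a diagnostic sketch; the file path matches the command above):

```python
import numpy as np

def covariance_health(cov):
    """Return (min_eigenvalue, condition_number) of a symmetric covariance."""
    eigvals = np.linalg.eigvalsh(cov)   # eigenvalues sorted ascending
    return float(eigvals[0]), float(eigvals[-1] / eigvals[0])

# cov = np.load("artifacts/cov_delta_full.npy")
# min_eig, cond = covariance_health(cov)
# A negative or near-zero min_eig suggests too few simulations for the
# number of bandpower bins; increase --nsims and regenerate.
```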
- Form the commutator residual and null statistic.

  ```bash
  micromamba run -n comet python scripts/compute_commutator.py \
    --order-a artifacts/order_A_to_B_full.npz \
    --order-b artifacts/order_B_to_A_full.npz \
    --cov artifacts/cov_delta_full.npy \
    --out-delta artifacts/delta_ell_full.npy \
    --out-summary artifacts/summary_full.json
  ```

  The resulting JSON contains the Δ vector length and the stabilized χ ("z") statistic for the null test. When reusing an older covariance that has one or two extra high-ℓ bins, add `--trim-covariance` to drop those trailing rows/columns so the matrix matches the Δ bandpowers.
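The statistic itself is produced by `scripts/compute_commutator.py`; as a transparent sketch of a standard χ²-based null test (the exact stabilization the script applies may differ from the simple normalization used here), with Δ the commutator residual and C its covariance:

```python
import numpy as np

def null_statistic(delta, cov):
    """chi^2 = Delta^T C^{-1} Delta, plus a Gaussianized 'z' score.

    The z definition here is the common (chi2 - dof) / sqrt(2 * dof)
    normalization, which is approximately standard-normal under the null
    for many bins; the pipeline's stabilized statistic may differ.
    """
    delta = np.asarray(delta, dtype=float)
    chi2 = float(delta @ np.linalg.solve(cov, delta))
    dof = delta.size
    z = (chi2 - dof) / np.sqrt(2.0 * dof)
    return chi2, z

# delta = np.load("artifacts/delta_ell_full.npy")
# cov = np.load("artifacts/cov_delta_full.npy")
# chi2, z = null_statistic(delta, cov)
```

Using `np.linalg.solve` rather than explicitly inverting the covariance keeps the quadratic form numerically well-behaved for nearly singular matrices.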
- Assemble the science cross-spectrum. Average the two orderings, compare to theory, and compute per-bin significances:

  ```bash
  micromamba run -n comet python scripts/compute_cross_spectrum.py \
    --order-a artifacts/order_A_to_B_full.npz \
    --order-b artifacts/order_B_to_A_full.npz \
    --theory data/theory/tk_planck2018.npz \
    --lmin 30 \
    --nlb 30 \
    --cov artifacts/cov_delta_full.npy \
    --out artifacts/cross_tk_full.npz \
    --summary artifacts/cross_summary_full.json
  ```

  The explicit `--lmin`/`--nlb` values ensure the theory is binned with the same geometry as the commutator runs when the preregistration metadata diverges. Check `artifacts/cross_summary_full.json` to confirm the mean and maximum |z| are consistent with a null detection.

  The ordering scripts record the binning limits alongside the spectra, so the cross-spectrum CLI can infer `lmin`/`nlb` directly from those artifacts. Passing the explicit values remains recommended for a reproducible command log, especially when exchanging files generated by older commits that predate the embedded metadata.
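The per-bin significances reported in the summary can be reproduced by hand (a NumPy sketch; the array names are illustrative, and the keys inside the `.npz` artifacts are not specified here):

```python
import numpy as np

def per_bin_z(cl_a, cl_b, cl_theory, cov):
    """Average the two orderings and score each bandpower bin against theory.

    sigma_i is taken from the diagonal of the bandpower covariance; this
    mirrors a per-bin |z| check, not a full correlated fit.
    """
    cl_mean = 0.5 * (np.asarray(cl_a) + np.asarray(cl_b))
    sigma = np.sqrt(np.diag(cov))
    return (cl_mean - np.asarray(cl_theory)) / sigma

# z = per_bin_z(cl_a_to_b, cl_b_to_a, binned_theory, cov)   # names illustrative
# print(np.mean(np.abs(z)), np.max(np.abs(z)))
```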
- Generate publication figures and a textual digest.

  ```bash
  micromamba run -n comet python scripts/summarize_results.py \
    --delta artifacts/delta_ell_full.npy \
    --summary docs/summaries/full_run.json \
    --cov artifacts/cov_delta_full.npy \
    --cross artifacts/cross_tk_full.npz \
    --outdir docs/figures/full_run
  ```

  This produces plots of Δ bandpowers, the null histogram, the T×κ spectrum with uncertainties, and per-bin z-scores. Include these figures and the JSON summaries in your archival package.
- Archive provenance. Save the executed command list, git commit hash, configuration files (`config/*.yaml`), the artifacts under `artifacts/`, and generated figures under `docs/figures/full_run/` in a versioned, timestamped directory for future audits and publication supplements.
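A minimal archiving helper for the provenance step might look like this (a sketch, not a shipped utility; the destination layout is illustrative):

```python
import shutil
import time
from pathlib import Path

def archive_run(dest_root="archive", items=("artifacts", "config")):
    """Copy run inputs/outputs into a timestamped directory for auditing."""
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = Path(dest_root) / f"run-{stamp}"
    dest.mkdir(parents=True)
    for item in items:
        src = Path(item)
        if src.is_dir():
            shutil.copytree(src, dest / src.name)
        elif src.is_file():
            shutil.copy2(src, dest / src.name)
    return dest
```

Pairing the copied tree with the command list and commit hash from step 1 gives a self-contained package that can be audited without the original checkout.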
Following these steps yields a repeatable end-to-end analysis: staging data, constructing both commutator orderings, calibrating the covariance from simulations, computing the null statistic, and delivering the science cross-spectrum together with diagnostic plots ready for publication.
```text
project-comet/
├── bin/              # CLI wrappers (comet-run, ci)
├── config/           # Example preregistration + paths configs
├── data/             # Large Planck FITS maps (ignored by Git)
├── docs/             # Scientific documentation, PDFs, figures
├── src/comet/        # Python package (cli, run, io_maps, etc.)
├── tests/            # Unit and smoke tests
├── artifacts/        # Generated outputs (ignored by Git)
├── environment.yml   # Micromamba environment definition
├── Makefile          # Common commands (make ci, make run, make lint)
└── README.md         # You are here
```
- Planck 2018 lensing paper (A&A, 641, A8)
- NaMaster: Master of the Mask
- Additional project design notes and figures: see `docs/`.
