Python data-cleaning pipeline code for CTA research (feature prep, cross-sectional standardization, and leak-safe preprocessing), plus a few small helpers used by that pipeline.
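As a rough illustration of what cross-sectional standardization means here, the sketch below z-scores a feature within each date, so each row is scaled using only same-day observations and no future data enters the statistics. This is not the repository's actual code: the function name `cs_zscore`, the long-format layout, and the column names `date`, `symbol`, and `factor` are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def cs_zscore(df, value_col, date_col="date"):
    """Z-score value_col within each date (hypothetical helper, not from the repo)."""
    grouped = df.groupby(date_col)[value_col]
    mean = grouped.transform("mean")
    std = grouped.transform("std")  # sample standard deviation (ddof=1)
    # A constant cross-section has zero std; return NaN there instead of inf.
    return (df[value_col] - mean) / std.replace(0.0, np.nan)


# Toy long-format panel: one row per (date, symbol). All values are made up.
panel = pd.DataFrame(
    {
        "date": ["2024-01-02"] * 3 + ["2024-01-03"] * 3,
        "symbol": ["A", "B", "C"] * 2,
        "factor": [1.0, 2.0, 3.0, 10.0, 20.0, 60.0],
    }
)
panel["factor_z"] = cs_zscore(panel, "factor")
print(panel)
```

Because the mean and standard deviation are computed per date, the result for one day does not change when later days are appended to the panel.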
- get_corr_new.py: fast daily cross-sectional correlation (rank + Pearson)
- functions.py: utilities (e.g., rolling window node generation)
- strategy_backtest_metrics.py: lightweight backtest metrics/plots
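To make "daily cross-sectional correlation (rank + Pearson)" concrete, here is a minimal sketch that computes both correlations between two columns separately for each date. It is a plain, unoptimized version and does not reproduce the actual interface or speed of get_corr_new.py. The function name `daily_cs_corr` and the column names `date`, `signal`, and `ret` are assumptions. The rank correlation is computed as the Pearson correlation of within-day ranks, which is the Spearman coefficient.

```python
import pandas as pd


def daily_cs_corr(df, x_col, y_col, date_col="date"):
    """Per-date Pearson and rank correlation of x_col vs y_col (hypothetical helper)."""
    rows = {}
    for date, group in df.groupby(date_col):
        x, y = group[x_col], group[y_col]
        rows[date] = {
            "pearson": x.corr(y),
            # Rank correlation: Pearson correlation of the within-day ranks.
            "rank": x.rank().corr(y.rank()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy data: a monotone but non-linear relation on day one, a perfectly
# inverse linear relation on day two. All values are made up.
data = pd.DataFrame(
    {
        "date": ["2024-01-02"] * 4 + ["2024-01-03"] * 4,
        "signal": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
        "ret": [1.0, 2.0, 3.0, 10.0, 4.0, 3.0, 2.0, 1.0],
    }
)
corr = daily_cs_corr(data, "signal", "ret")
print(corr)
```

On the first day the rank correlation is 1 while the Pearson value is lower, which shows why both measures are worth reporting: the rank version is insensitive to the outlier in `ret`.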
Large data/artifacts are intentionally excluded from git via .gitignore (e.g., *.pq, dd_pre/, dd_3_por/).