Out-of-Sample Evaluation of Portfolio Optimization Methods Under Estimation Uncertainty

1. Business Problem

Asset managers and quantitative analysts face a recurring challenge: selecting an allocation strategy that is robust out-of-sample, not just historically attractive. Mean-variance optimisation has been the industry standard since the work of Harry Markowitz (1952), yet in practice it is well documented to be highly sensitive to estimation error: small changes in expected return inputs can produce dramatically different, often extreme, portfolios.

This project addresses two operational questions directly relevant to portfolio management:

Which allocation strategy delivers the best risk-adjusted return when deployed forward in time — not fitted to history?
How accurately does each strategy predict its own performance — and which strategies should be trusted when their assumptions are violated by regime change?

The predicted vs actual framework makes estimation error visible and measurable, rather than burying it in aggregate backtest statistics. This is directly applicable to strategy selection, risk budgeting, and model validation workflows at asset management firms.

2. Solution Overview & Technology Stack

Strategies

Because return estimates are noisy, this project focuses on risk-based and regularized portfolio optimization. All methods are evaluated using rolling out-of-sample backtests.

Strategy	Estimation inputs	Purpose	Strength	Weakness
Equal Weight	None — 1/N baseline	Baseline	Very robust baseline	Ignores risk structure
Minimum Variance	Covariance only	Risk control	Good downside protection	Sensitive to covariance estimation
Risk Parity	Covariance only	Robust allocation	Diversified risk exposure	Requires stable covariance
Regularised Max Sharpe	Mean + Covariance, L2 penalty	Active tilt	Strong theoretical foundation	Unstable with noisy returns
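
As a rough illustration of how the four strategies in the table differ in their inputs, the sketch below computes weights for each with `scipy.optimize.minimize`. It is not the repository's `optimization.py`: the long-only, fully invested constraints, the function names, the default `l2` value, and the exact form of the L2 penalty (added to the negative Sharpe ratio) are all assumptions made for the example. Inputs are assumed to be annualised.

```python
import numpy as np
from scipy.optimize import minimize


def _solve(objective, n):
    """Minimise `objective` over long-only weights that sum to one (assumed constraints)."""
    x0 = np.full(n, 1.0 / n)  # start from the 1/N portfolio
    result = minimize(
        objective,
        x0,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x


def equal_weight(n):
    """1/N baseline: needs no estimates at all."""
    return np.full(n, 1.0 / n)


def min_variance(cov):
    """Minimise portfolio variance w' C w; uses the covariance matrix only."""
    cov = np.asarray(cov, dtype=float)
    return _solve(lambda w: w @ cov @ w, len(cov))


def risk_parity(cov):
    """Equalise each asset's share of total portfolio variance."""
    cov = np.asarray(cov, dtype=float)
    n = len(cov)

    def objective(w):
        contrib = w * (cov @ w)            # variance contribution per asset
        share = contrib / contrib.sum()    # normalised to sum to one
        return np.sum((share - 1.0 / n) ** 2)

    return _solve(objective, n)


def reg_max_sharpe(mu, cov, l2=0.1):
    """Maximise Sharpe minus an L2 penalty on weights (hypothetical penalty form)."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)

    def objective(w):
        sharpe = (w @ mu) / np.sqrt(w @ cov @ w)
        return -sharpe + l2 * np.sum(w ** 2)

    return _solve(objective, len(mu))
```

In this formulation a larger `l2` pulls the solution toward equal weights, which is one way a penalty can limit weight concentration when return estimates are noisy.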

Metrics

Each strategy is evaluated across return, risk, and efficiency dimensions:

Return Quality	Risk
Annualised Return (CAGR)	Annualised Volatility
Sharpe Ratio	Maximum Drawdown
Sortino Ratio	CVaR (95%)
Calmar Ratio	—
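
The metrics in the table can be sketched from a series of periodic portfolio returns as follows. This is an illustrative version, not the code in `metrics.py`: it assumes simple returns, a zero risk-free rate, 252 periods per year for daily data, a downside deviation measured against zero, and CVaR reported as the mean return in the worst 5% tail (a negative number).

```python
import numpy as np


def performance_metrics(returns, periods_per_year=252, alpha=0.95):
    """Summary statistics for a series of simple periodic returns."""
    r = np.asarray(returns, dtype=float)
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))  # wealth path starting at 1.0
    years = len(r) / periods_per_year
    ann = np.sqrt(periods_per_year)

    cagr = wealth[-1] ** (1.0 / years) - 1.0
    vol = r.std(ddof=1) * ann
    sharpe = r.mean() / r.std(ddof=1) * ann

    # Downside deviation: root mean square of returns below zero.
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))
    sortino = r.mean() / downside * ann

    # Drawdown relative to the running peak of the wealth path.
    drawdown = wealth / np.maximum.accumulate(wealth) - 1.0
    max_dd = drawdown.min()

    # CVaR: average return at or below the (1 - alpha) quantile.
    var = np.quantile(r, 1.0 - alpha)
    cvar = r[r <= var].mean()

    return {
        "cagr": cagr,
        "volatility": vol,
        "sharpe": sharpe,
        "sortino": sortino,
        "max_drawdown": max_dd,
        "cvar": cvar,
        "calmar": cagr / abs(max_dd),  # undefined if there is no drawdown
    }
```

For example, `performance_metrics([0.10, -0.50, 0.20, 0.10], periods_per_year=4)` treats the four observations as one year of quarterly returns.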

Turnover is tracked separately per rebalance — higher turnover means higher transaction costs in production.
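
One common way to measure turnover at a rebalance is half the sum of absolute weight changes, taken after letting the previous weights drift with the period's returns. The sketch below follows that convention; the function names and the one-way (halved) definition are assumptions, and the repository may define turnover differently.

```python
import numpy as np


def drift_weights(weights, period_returns):
    """Weights at the end of a holding period, before rebalancing."""
    grown = np.asarray(weights, dtype=float) * (1.0 + np.asarray(period_returns, dtype=float))
    return grown / grown.sum()


def turnover(prev_weights, new_weights):
    """One-way turnover: half the total absolute change in weights."""
    diff = np.asarray(new_weights, dtype=float) - np.asarray(prev_weights, dtype=float)
    return 0.5 * np.abs(diff).sum()
```

Multiplying this figure by an assumed per-unit trading cost gives a first-order estimate of the cost drag at each rebalance.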

To mitigate estimation error in portfolio inputs, the framework optionally applies shrinkage to both expected returns and the covariance matrix. This makes it possible to explore how different levels of regularization affect portfolio stability and out-of-sample performance.
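
To make the idea of a shrinkage level concrete, here is a minimal sketch of linear shrinkage with a user-chosen intensity between 0 and 1. The targets (a scaled identity matrix for the covariance, the cross-sectional grand mean for expected returns) and the function names are assumptions for illustration. Data-driven estimators such as Ledoit–Wolf choose the intensity automatically, which this sketch does not attempt.

```python
import numpy as np


def shrink_covariance(cov, intensity):
    """Blend the sample covariance with a scaled identity target."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    target = np.eye(n) * np.trace(cov) / n  # average variance on the diagonal
    return (1.0 - intensity) * cov + intensity * target


def shrink_means(mu, intensity):
    """Pull each expected return toward the cross-sectional grand mean."""
    mu = np.asarray(mu, dtype=float)
    return (1.0 - intensity) * mu + intensity * mu.mean()
```

With `intensity=0` both functions return the raw estimates; with `intensity=1` all assets share the same expected return and the covariance matrix becomes diagonal with equal variances.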

Additionally, each rebalance records predicted vs realised Sharpe, Sortino, and Volatility — measuring how accurately each strategy anticipated its own risk-adjusted performance.
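
A simple way to form such predicted and realised pairs is to compute the same statistics twice with the same weights: once on the training window and once on the following holding period. The sketch below does that. The function names are hypothetical, and treating the in-sample statistic as the "prediction" is an assumption about how the comparison is set up.

```python
import numpy as np


def portfolio_stats(asset_returns, weights, periods_per_year=252):
    """Annualised volatility, Sharpe, and Sortino of a fixed-weight portfolio."""
    port = np.asarray(asset_returns, dtype=float) @ np.asarray(weights, dtype=float)
    ann = np.sqrt(periods_per_year)
    downside = np.sqrt(np.mean(np.minimum(port, 0.0) ** 2))  # deviation below zero
    return {
        "volatility": port.std(ddof=1) * ann,
        "sharpe": port.mean() / port.std(ddof=1) * ann,
        "sortino": port.mean() / downside * ann,
    }


def predicted_vs_realised(train_returns, test_returns, weights):
    """Pair each in-sample statistic with its out-of-sample counterpart."""
    predicted = portfolio_stats(train_returns, weights)
    realised = portfolio_stats(test_returns, weights)
    return {name: (predicted[name], realised[name]) for name in predicted}
```

Each value in the returned dictionary is a `(predicted, realised)` tuple for one rebalance, which can then be collected across rebalance dates.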

Methodology

The framework employs a rolling walk-forward backtesting procedure: portfolio parameters are estimated on a fixed-length historical window, optimized weights are applied to the next rebalancing period, and the window is then advanced to simulate sequential real-time portfolio management.

flowchart LR
    A["Select rolling training window<br>(1–10 years)"] --> B["Estimate expected returns<br>and covariance matrix"]
    B --> C["Compute portfolio weights<br>for each strategy"]
    C --> D["Apply weights to<br>next rebalance period"]
    D --> E["Record realized returns<br>and risk metrics"]
    E --> F{Next rebalance<br>date?}
    F -->|Yes| G["Slide training window forward"]
    G --> B
    F -->|No| H["Aggregate results<br>and compare strategies"]

Training — estimate expected returns and the covariance matrix from the rolling window.

Weights — each strategy computes portfolio allocations independently using the same inputs.

Test — the weights are held fixed until the next rebalance. No adjustments, no look-ahead.

Window Update — the training window slides forward by one month, and the process repeats through the entire out-of-sample period.
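
The four steps above can be sketched as a short loop over array indices. This is a simplified illustration rather than the code in `backtest.py`: `walk_forward`, `train_len`, `test_len`, and `weight_fn` are hypothetical names, window lengths are counted in rows instead of calendar months, and the weights are treated as constant within each test period (intra-period drift is ignored).

```python
import numpy as np


def walk_forward(returns, train_len, test_len, weight_fn):
    """Rolling walk-forward backtest on a (periods x assets) return matrix.

    weight_fn receives only the training rows, so no test data can leak in.
    """
    r = np.asarray(returns, dtype=float)
    realised = []
    start = 0
    while start + train_len + test_len <= len(r):
        train = r[start:start + train_len]                          # estimation window
        test = r[start + train_len:start + train_len + test_len]    # next holding period
        weights = weight_fn(train)                                  # fit on the past only
        realised.append(test @ weights)                             # out-of-sample returns
        start += test_len                                           # slide the window forward
    return np.concatenate(realised)


def equal_weight_fn(train):
    """Example strategy: 1/N weights regardless of the training data."""
    n_assets = train.shape[1]
    return np.full(n_assets, 1.0 / n_assets)
```

A call such as `walk_forward(returns, train_len=4 * 252, test_len=21, weight_fn=equal_weight_fn)` would approximate a 4-year window with roughly monthly rebalancing on daily data.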

Data

Daily adjusted price data is downloaded from Yahoo Finance using the Python library yfinance.

The pipeline performs:

price alignment across assets
return calculation
missing value handling
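
The three pipeline steps can be illustrated on a small price table with pandas. The sketch assumes one plausible treatment of missing values (forward-fill gaps, then drop dates where any asset still has no price); the actual handling in `data.py` may differ. The function name is hypothetical, and the download step is omitted so the example runs offline.

```python
import pandas as pd


def prices_to_returns(prices):
    """Align adjusted prices across assets and convert them to simple returns."""
    # Align on a sorted date index and carry the last known price over gaps.
    aligned = prices.sort_index().ffill()
    # Drop leading dates on which some asset has not started trading yet.
    aligned = aligned.dropna(how="any")
    # Simple return: P_t / P_{t-1} - 1; the first row has no predecessor.
    returns = aligned / aligned.shift(1) - 1.0
    return returns.iloc[1:]


prices = pd.DataFrame(
    {"A": [100.0, 110.0, None, 121.0], "B": [None, 50.0, 55.0, 55.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)
returns = prices_to_returns(prices)
```

Forward-filling a gap produces a zero return on the filled date, so long gaps understate volatility; that trade-off is worth checking for thinly traded assets.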

Reproducibility

All results in this project are fully reproducible. The workflow follows a deterministic pipeline:

flowchart LR
    A[Download Prices<br>yfinance] --> B[Compute Returns<br>Align Series]
    B --> C[Walk-Forward<br>Backtest]
    C --> D[Evaluation<br>Metrics]
    D --> E[Streamlit<br>Dashboard]

The repository includes:

Automated tests - pytest
Continuous integration - GitHub Actions
Deterministic backtesting - fixed pipeline

This ensures the results can be independently verified and extended.

Stack

Python 3.11 · scipy · scikit-learn · pandas · numpy · yfinance · Streamlit · Plotly · pytest · GitHub Actions

Project structure

└── 📁portfolio_optimization
    └── 📁.github
        └── 📁workflows
            ├── ci.yaml
    └── 📁.streamlit
        ├── config.toml
    └── 📁src
        └── 📁portfolio_optimization
            ├── __init__.py
            ├── backtest.py
            ├── config.py
            ├── data.py
            ├── main.py
            ├── metrics.py
            ├── optimization.py
    └── 📁tests
        ├── __init__.py
        ├── test_backtest.py
        ├── test_data.py
        ├── test_metrics.py
        ├── test_optimization.py
    ├── .gitignore
    ├── app.py
    ├── pyproject.toml
    ├── README.md
    └── requirements.txt

Quickstart

git clone https://github.com/marieltv/portfolio_optimization.git
cd portfolio_optimization
pip install -e ".[dev]"
streamlit run app.py

Live demo →

Tests

pytest -v

Empirical Results

Evaluated on US defence equities — Boeing (BA), Northrop Grumman (NOC), Lockheed Martin (LMT), RTX Corporation (RTX), Axon Enterprise (AXON), and General Dynamics (GD) — over 2018–2026 using a 4-year rolling training window with monthly rebalancing.

Strategy	CAGR	Sharpe	Sortino	Max Drawdown
Equal Weight	20.3%	1.05	1.08	-17.5%
Min Variance	19.8%	1.05	1.10	-16.8%
Risk Parity	20.0%	1.07	1.11	-15.8%
Reg Max Sharpe	24.9%	1.16	1.23	-17.4%

Regularised Max Sharpe achieves the highest CAGR, Sharpe, and Sortino of the four strategies, while Risk Parity has the shallowest maximum drawdown (-15.8% against -17.4%). Although Regularised Max Sharpe is theoretically the most sensitive to estimation error, the L2 penalty appears sufficient to prevent extreme weight concentration in this asset universe.

Estimation accuracy.
Volatility forecasts exhibit relatively low error (MAE ≈ 0.07) across strategies, confirming that covariance estimates are substantially more stable than expected return estimates. However, the Spearman correlations between predicted and realised Sharpe and Sortino ratios are close to zero. This indicates that month-to-month risk-adjusted performance cannot be reliably inferred from historical estimates alone — consistent with findings such as those of Robert C. Merton.
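
Both accuracy measures mentioned above can be computed from two aligned series, one of predicted values and one of realised values per rebalance. The sketch below uses `scipy.stats.spearmanr` for the rank correlation; the function name and the sample numbers in the usage note are illustrative only and are not results from this project.

```python
import numpy as np
from scipy.stats import spearmanr


def forecast_accuracy(predicted, realised):
    """Mean absolute error and Spearman rank correlation of a forecast series."""
    p = np.asarray(predicted, dtype=float)
    a = np.asarray(realised, dtype=float)
    mae = np.mean(np.abs(p - a))   # average size of the forecast miss
    rho, _ = spearmanr(p, a)       # does the forecast rank periods correctly?
    return {"mae": float(mae), "spearman": float(rho)}
```

For instance, `forecast_accuracy([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])` has a small MAE and a perfect rank correlation. A forecast can also have a low MAE with a rank correlation near zero when it is roughly constant over time, which is the pattern described for the Sharpe and Sortino predictions.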

Applying shrinkage techniques such as Ledoit–Wolf covariance shrinkage and the James–Stein estimator resulted in only marginal improvements (≈0.01 or less across metrics), suggesting that estimation error is not the dominant source of out-of-sample degradation in this particular universe.

Predicted Sharpe and Sortino values remain relatively flat across time. This behaviour arises from estimating expected returns over a multi-year rolling window. Rather than indicating model failure, it reflects a fundamental empirical property of equity markets: covariance is moderately forecastable, whereas mean returns are not. In practice, these allocation methods therefore provide the greatest value through risk budgeting, diversification, and drawdown control, rather than through short-horizon return prediction.

Interactive Dashboard

An interactive dashboard built with Streamlit allows exploration of the backtest results and strategy behaviour.

The dashboard enables users to:

select the training window length used for walk-forward optimisation
compare portfolio strategies across multiple performance metrics
analyse cumulative returns and drawdowns
inspect predicted vs realised Sharpe, Sortino, and volatility
control mean return shrinkage and covariance matrix shrinkage
evaluate portfolio turnover and allocation dynamics

Interactive charts are rendered using Plotly.
