Remove min cov det #294

Closed
robertmartin8 opened this issue Feb 18, 2021 · 1 comment · Fixed by #299
Labels
bug Something isn't working

Comments

@robertmartin8
Owner

Describe the bug
The minimum covariance determinant method gives nonsensical covariance matrices.

df = get_data()
S = risk_models.min_cov_determinant(df, random_state=8)
np.diag(S)
array([2.41028948e-03, 2.03474589e-01, 0.00000000e+00, 0.00000000e+00,
       1.57413320e-01, 5.30639582e-02, 2.86881436e-01, 7.30126563e-02,
       7.33460893e-02, 0.00000000e+00, 6.48495304e-02, 6.14756936e-04,
       2.01800632e-02, 4.18976215e-02, 1.60823736e-01, 2.47871090e-01,
       5.52724202e-05, 7.78379082e-02, 9.82877705e-02, 1.31182474e-01])

Compare with sample covariance:

array([0.09321114, 0.20753691, 0.13720115, 0.10095819, 0.37539405,
       0.08310901, 0.39058571, 0.06909179, 0.18019232, 0.08011557,
       0.06555993, 0.24552417, 0.32116207, 0.05440377, 0.27222973,
       0.26954874, 0.12030304, 0.08550166, 0.14689262, 0.15258853])

min_cov_det returns variances far below those of the other methods, in some cases by several orders of magnitude or exactly zero. The code contains a bug where NaNs are set to 0, but even after replacing that with dropna, the variances are still far too low:

array([0.01921634, 0.02855301, 0.02299209, 0.05725663, 0.02720913,
       0.01819129, 0.16490903, 0.01775783, 0.03262536, 0.02917394,
       0.01434993, 0.06429909, 0.32185691, 0.01727326, 0.13953319,
       0.04986281, 0.01974825, 0.01409136, 0.01995168, 0.0199959 ])
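
To illustrate why setting NaNs to 0 matters, here is a minimal, dependency-free sketch. It assumes the zero-fill is applied to the returns series; the numbers are made up for illustration and are not taken from the dataset above. A zero sits close to the mean return, so every filled-in value pulls the variance down, and a series with no observations at all collapses to a variance of exactly zero.

```python
import math
import statistics

# Hypothetical daily returns with two missing observations (illustrative only).
returns = [0.021, -0.018, math.nan, 0.030, -0.025, math.nan, 0.017, -0.022]

# Zero-filling: each NaN becomes 0.0, which lies near the mean return.
zero_filled = [0.0 if math.isnan(r) else r for r in returns]

# Dropping: only the observed returns contribute to the estimate.
dropped = [r for r in returns if not math.isnan(r)]

var_zero_filled = statistics.variance(zero_filled)
var_dropped = statistics.variance(dropped)

# A hypothetical asset with no observations at all: zero-filling yields a
# constant series, so its sample variance is exactly zero.
all_missing = [math.nan] * 8
var_all_missing = statistics.variance(
    [0.0 if math.isnan(r) else r for r in all_missing]
)

print(f"zero-filled: {var_zero_filled:.6f}")
print(f"dropped:     {var_dropped:.6f}")
print(f"all missing: {var_all_missing:.6f}")
```

This only shows the direction of the bias from zero-filling. It does not explain why the variances remain too low after switching to dropna.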

I think the implementation of MCD is misguided – I only included it (many years ago) because I wanted as many risk models as possible, without really thinking too much about their applicability.

I will be deprecating it for v1.4.1, and removing it in v1.5.

@robertmartin8 robertmartin8 added the bug Something isn't working label Feb 18, 2021
robertmartin8 added a commit that referenced this issue Feb 18, 2021
@phschiele
Contributor

@robertmartin8 I tested min_cov_det on a few different samples: sometimes the covariances were in a similar range, sometimes much smaller. I also compared it against sklearn's MinCovDet(random_state=8).fit(returns_df).covariance_ * 252, which again gave quite different results.

I am not very familiar with this method but at first glance, there seem to be some issues. I would also be in favor of deprecating it 👍
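
For anyone who wants to repeat the sklearn comparison, a rough sketch follows. It uses synthetic Gaussian returns rather than the data from this issue, and the `returns` array is a stand-in for `returns_df`. The 252 annualisation factor is the one used in the comment above.

```python
import numpy as np
from sklearn.covariance import MinCovDet

# Synthetic daily returns for three hypothetical assets (not the issue's data).
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0005, scale=0.01, size=(500, 3))

# Annualised MCD estimate from scikit-learn, as in the comment above.
mcd_cov = MinCovDet(random_state=8).fit(returns).covariance_ * 252

# Annualised sample covariance for reference (columns are assets).
sample_cov = np.cov(returns, rowvar=False) * 252

# Ratio of annualised variances; values near 1 mean the estimators agree.
ratio = np.diag(mcd_cov) / np.diag(sample_cov)
print(ratio)
```

On clean, outlier-free data like this, the two estimates should be of the same order of magnitude, so a large gap on real data would point at either heavy contamination or an implementation problem.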

This was linked to pull requests Feb 19, 2021