Disambiguate the sign (phase) of the mSSA PCs #110

Merged: 5 commits, Feb 14, 2025

The9Cat commented Feb 13, 2025

Problem

The left and right singular vectors of an SVD are determined only up to sign. As a result, subsequent SVD invocations in expMSSA can produce PCs with different signs (i.e. pi radians out of phase).
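
To make the ambiguity concrete, here is a minimal NumPy sketch. It is not expMSSA code; the matrix `X` and all variable names are illustrative assumptions. Flipping the sign of a matched left/right singular-vector pair leaves the reconstruction unchanged but negates the corresponding projection, which plays the role of a PC here.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))           # stand-in for a data matrix
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Flip the sign of the first left/right singular-vector pair.
U_flip, Vt_flip = U.copy(), Vt.copy()
U_flip[:, 0] *= -1.0
Vt_flip[0, :] *= -1.0

# Both sets of factors reconstruct the same matrix ...
recon = (U * S) @ Vt
recon_flip = (U_flip * S) @ Vt_flip

# ... but the projection onto the first right vector changes sign.
pc = X @ Vt[0]
pc_flip = X @ Vt_flip[0]
```

Nothing in the factorization itself prefers `pc` over `pc_flip`, so a convention has to be imposed from outside.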

Fix

Choose the signs of the vectors in each eigentriple based on their squared-norm-weighted projection onto the original data matrix (either the trajectory or covariance matrix, in our case).

Comments

The algorithm is implemented in a helper function called SvdSignChoice. This routine has been checked by decomposing a run of random matrices and confirming that the algorithm reconstructs the input data to machine precision after the sign choice is applied.
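
For illustration only, here is a sketch of one common form of this kind of sign choice, written in NumPy rather than the Eigen3 C++ of the actual helper. The function name `svd_sign_choice`, its signature, and the tie-breaking rule are assumptions, not the SvdSignChoice implementation. Each vector's sign is taken from the signed, squared projections of the data onto it; when the left and right votes disagree, the larger one wins. The loop at the end mirrors the random-matrix reconstruction check described above.

```python
import numpy as np

def svd_sign_choice(X, U, Vt):
    """Return sign-fixed copies of U (m x k) and Vt (k x n) for data X (m x n).

    Hypothetical sketch: not the SvdSignChoice routine from the PR.
    """
    left = U.T @ X        # projections of data columns on left vectors, (k, n)
    right = Vt @ X.T      # projections of data rows on right vectors, (k, m)

    # Signed sums of squared projections: one "vote" per side and component.
    s_left = np.sum(np.sign(left) * left**2, axis=1)
    s_right = np.sum(np.sign(right) * right**2, axis=1)

    # Use the vote with the larger magnitude; if both agree this changes nothing.
    vote = np.where(np.abs(s_left) >= np.abs(s_right), s_left, s_right)
    signs = np.where(vote < 0, -1.0, 1.0)

    # Apply the same sign to both vectors of a pair so U S V^T is unchanged.
    return U * signs, signs[:, None] * Vt

# Check on a run of random matrices: reconstruction survives the sign choice.
rng = np.random.default_rng(0)
for _ in range(20):
    m, n = rng.integers(3, 9, size=2)
    X = rng.standard_normal((m, n))
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    U1, Vt1 = svd_sign_choice(X, U, Vt)
    assert np.allclose((U1 * S) @ Vt1, X)
```

Because both vectors of a pair receive the same sign, the product is preserved by construction; the point of the vote is only to make the choice reproducible from one decomposition to the next.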

There is one big downside: this literal algorithm is slow because it passes through the entire trajectory matrix. I tried to speed it up by striding through the trajectory matrix so that the number of samples equals the rank rather than doing the full computation, but the strided version was itself very inefficient, so it wasn't helpful.

The addition to expMSSA is then trivial. I have included a config flag, Sign, to turn this algorithm on and off if need be. It is on by default. The code has been tested on the pyEXP-examples/Tutorials/mSSA notebooks.

michael-petersen left a comment

Nice changes! I've also put these changes through a standard set of tests, including a run on a 'real research' project, and they produced all expected outputs.

Because the flag is on by default, users might notice the time hit if they are running production values, but we can address this in documentation. To provide some context: on a research problem, I saw approximately a doubling in run time with Sign : true vs Sign : false. This is of course problem dependent, but it's probably roughly true in general.

The only reason I can see for not merging this immediately is if we think there will be any more algorithmic changes?

The9Cat commented Feb 14, 2025

> The only reason I can see for not merging this immediately is if we think there will be any more algorithmic changes?

Agreed.

I tried subsampling to reduce the rank, but doing the strided computation row-by-column rather than the full matrix multiply was dreadfully slow. I think it breaks the optimization heuristics built into Eigen3.

However, there are ways of speeding it up. For example, making two new data arrays with every nth row and every nth column and then doing the matrix multiplies should get around the matrix arithmetic bottleneck that is slowing it down. Not sure whether that's a good or bad idea, however. It would still be 'deterministic' so I wouldn't be terribly worried. But I'm not sure how to best test that either.
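
One possible reading of that idea, as a NumPy sketch rather than Eigen3 code: copy every nth column and every nth row into contiguous arrays, then form the projections with ordinary dense matrix products. The function name, the `stride` parameter, and the voting rule are all assumptions made for illustration, and this variant has not been validated against the full computation.

```python
import numpy as np

def svd_sign_choice_strided(X, U, Vt, stride=4):
    """Sign choice from a subsample of X (hypothetical sketch, not PR code)."""
    # Contiguous copies, so the products below are plain dense multiplies
    # rather than element-by-element strided access.
    X_cols = np.ascontiguousarray(X[:, ::stride])   # every stride-th column
    X_rows = np.ascontiguousarray(X[::stride, :])   # every stride-th row

    left = U.T @ X_cols       # (k, about n/stride)
    right = Vt @ X_rows.T     # (k, about m/stride)

    # Signed sums of squared projections over the subsample only.
    s_left = np.sum(np.sign(left) * left**2, axis=1)
    s_right = np.sum(np.sign(right) * right**2, axis=1)

    # Larger-magnitude vote decides the sign of each pair.
    vote = np.where(np.abs(s_left) >= np.abs(s_right), s_left, s_right)
    signs = np.where(vote < 0, -1.0, 1.0)
    return U * signs, signs[:, None] * Vt
```

The result is still deterministic in the sense that flipping a singular-vector pair on input flips both votes and therefore yields the same output. Whether the subsampled vote agrees with the full one on real data is the part that would need testing.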

The9Cat commented Feb 14, 2025

I am not planning any more work on this in the near future, unless we come up with an entirely new idea for a heuristic sign-choice algorithm.

michael-petersen merged commit dccf0b8 into main on Feb 14, 2025
8 checks passed
michael-petersen deleted the UniquePCsigns branch on February 14, 2025 at 20:34