A java.lang.RuntimeException error occurred #40


Closed
EsqYu opened this issue Dec 20, 2024 · 13 comments

Comments

@EsqYu

EsqYu commented Dec 20, 2024

When executing the PC algorithm on mixed data, the following error occurs. How can I fix it?

File "AbstractBootstrapAlgorithm.java", line71, in edu.cmu.tetrad.algcomparison.algorithm.AbstractBootstrapAlgorithm.search

Exception: Java Exception

The above exception was the direct cause of the following exception:

self.java=alg.search(self.data, self.params)
java.lang.java.lang.RuntimeException: java.lang.RuntimeException: Undefined likelihood encountered for test: variable1 || variable2

@jdramsey
Collaborator

Oh gosh, I didn't see this issue! Are you still having trouble?

@EsqYu
Author

EsqYu commented Feb 28, 2025

Yes. Instead of the PC algorithm, I'm now using FGES. However, the problem still occurs and I'm not sure what the cause is.

@jdramsey
Collaborator

Generally when you see undefined likelihoods, it's because of a singularity in the data, or because an undefined value is getting into the covariance matrix. Let me try playing with that. There are four things you can do if there's a singularity:

  1. Find the cause of the singularity and remove it. This could be a constant column, or a column that is defined in terms of other columns without any noise; in that case, remove one of the columns. If it's not obvious which columns are causing the problem, this can be difficult. But here, the PC implementation tells you which variables are problematic: variable1 and variable2. Is one of those constant or defined in terms of the other? (A quick check along these lines is sketched after this list.)

  2. Check to see if there are any undefined values in the data. The algorithm is supposed to be doing testwise deletion on these, but perhaps there's an issue there. If a column is all or mostly missing, that could be a problem, especially if you have a mixture of discrete and continuous variables.

  3. You could use pseudoinverses. I had just removed the code for that from the tests, but I could put it back. However, it won't be effective if you have NaNs in your data, and it won't work if you use testwise deletion to handle missing values but have too many of them.

  4. You could use regularization. I just added this to the most recent version of Tetrad in the repository; I could publish this so you can use it. Again, this won't work if there are missing values or you use testwise deletion and there are too many missing values.
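
For items 1 and 2, a rough check like the following (a minimal sketch, assuming your data is in a pandas DataFrame; the file name is a placeholder) will flag the usual culprits: constant columns, missing values, and exact linear dependence among the continuous columns.

```python
import numpy as np
import pandas as pd

def diagnose_singularity(df: pd.DataFrame) -> None:
    """Rough checks for the usual causes of a singular covariance matrix."""
    # 1. Constant (zero-variance) columns make the covariance matrix singular.
    constant = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]
    print("Constant columns:", constant)

    # 2. Undefined values that could leak into the covariance matrix.
    print("Missing values per column:")
    print(df.isna().sum())

    # 3. Exact linear dependence among the continuous columns: if the rank is
    #    less than the number of columns, some column is a noiseless
    #    combination of the others.
    cont = df.select_dtypes(include=[np.number]).dropna()
    if cont.shape[1] > 1:
        rank = np.linalg.matrix_rank(cont.to_numpy())
        print(f"Rank {rank} for {cont.shape[1]} continuous columns"
              + (" -- collinear!" if rank < cont.shape[1] else ""))

df = pd.read_csv("mydata.csv")   # placeholder file name
diagnose_singularity(df)
```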

Again, I'm sorry for missing this issue; I usually get a notification in Slack, but I didn't get one this time for some reason. It may help if I could look at your data, though I've come to realize that's not always an option and I need to settle for just asking questions about it.

@jdramsey
Collaborator

I don't suppose I can just try it for myself on your data? Also, have you looked at variable1 and variable2 to see if you can spot an issue?

@jdramsey
Collaborator

Let me know if you'd like to try regularization in py-tetrad. I believe the relevant version of Tetrad is already in py-tetrad; all you need to do is pip uninstall py-tetrad and pip install it again. Then add a 'lambda' parameter to your method call and set it to something small like 1e-6.

@jdramsey
Collaborator

jdramsey commented Feb 28, 2025

Actually, I can do you one better. I replaced pseudoinverses with regularization in the tests and scores, but I should let the user decide which one to use. In the code that's easy; I was just thinking of a good way to expose it in the interface, and I have an idea: overload the regularization lambda parameter. If you set lambda to 0, it does no regularization and uses a standard matrix inversion for the tests and scores. Setting it to a small positive number like 1e-6 or 1e-10 does the regularization. And I can set it up so that setting lambda to -1 uses a pseudoinverse instead. I'll try this and update py-tetrad if it works. I should be able to finish it later this afternoon, and then you'll have two methods to try.

It will take me a little while because I also need to update all of the documentation.

Here's the issue in a nutshell. You may understand all of this already, but a little explanation may help anyone else reading this. When calculating a test or score result, it is usually necessary to invert a matrix. If your data has a singularity problem, which I suspect it does if you don't see any missing values, then you may be asking Tetrad to invert singular matrices, in which case it will throw singularity exceptions and refuse to do as you ask.

Two solutions suggest themselves. The first is to use pseudoinverses, also called generalized or Moore-Penrose inverses. As long as you don't have missing values, the pseudoinverse of a matrix is always defined, even when the data has singularities, and it reduces to the ordinary inverse of a (square) matrix when it doesn't. The other is regularization: you add a small number to every diagonal entry of the matrix you want to invert, which guarantees (again, as long as there are no missing entries in the covariance matrix) that the inversion will work. There isn't a settled answer as to which is better; it depends on your data and what you're trying to accomplish.
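
To make the two options and the overloaded lambda concrete, here is a small numpy illustration. The dispatch logic is only my reading of the convention described above, not Tetrad's actual code:

```python
import numpy as np

def invert_covariance(S: np.ndarray, singularity_lambda: float = 0.0) -> np.ndarray:
    """Invert a covariance matrix S under the lambda convention described above.

    -1  -> Moore-Penrose pseudoinverse (defined even when S is singular)
     0  -> plain inverse (raises LinAlgError if S is singular)
    >0  -> ridge-style regularization: invert S + lambda * I instead
    """
    if singularity_lambda == -1:
        return np.linalg.pinv(S)
    if singularity_lambda == 0:
        return np.linalg.inv(S)
    return np.linalg.inv(S + singularity_lambda * np.eye(S.shape[0]))

# A singular covariance: the second variable is an exact copy of the first.
x = np.random.default_rng(0).normal(size=1000)
y = np.random.default_rng(1).normal(size=1000)
S = np.cov(np.vstack([x, x, y]))

invert_covariance(S, -1)      # works (pseudoinverse)
invert_covariance(S, 1e-6)    # works (regularized)
# invert_covariance(S, 0)     # would raise numpy.linalg.LinAlgError
```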

My suggestion, which I will implement today, is that you should be allowed to do either and try them out. As long as you don't have missing values, either should solve the singularity problem, and it's just an interface issue to allow you to choose.

@jdramsey
Collaborator

OK, I've done it; you can try it out. Here's what you need to do in py-tetrad. (I assume you've been using py-tetrad, since you posted on this issues list.)

  • pip uninstall py-tetrad and pip install git+https://github.com/cmu-phil/py-tetrad.
  • When calling the use_fisher_z(.) command in TetradSearch, add a singularity_lambda parameter, thus: use_fisher_z(singularity_lambda=-1) or use_fisher_z(singularity_lambda=1e-6) (a full call sequence is sketched after this list).
    • Setting singularity_lambda to -1 tells Tetrad to take the pseudoinverse of every matrix to avoid singularity exceptions.
    • Setting singularity_lambda to 1e-6 (or some similarly small number) tells Tetrad to regularize every covariance matrix by adding this lambda to each diagonal element in the matrix.
  • Similarly for use_sem_bic(.).
  • Run your searches as usual:
    • e.g., call run_pc(.) or run_fges(.) or run_boss(.)
    • Call get_java() to get the Tetrad graph.
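
Putting the steps above together, a minimal sketch looks like this. The data loading and file name are placeholders, and I'm assuming TetradSearch is constructed from a pandas DataFrame as in the usual py-tetrad examples (depending on your py-tetrad version you may also need to start the JVM first, as the examples do):

```python
import pandas as pd
import pytetrad.tools.TetradSearch as ts

df = pd.read_csv("mydata.csv")    # placeholder: your dataset

search = ts.TetradSearch(df)

# Test for constraint-based searches (e.g., PC):
search.use_fisher_z(singularity_lambda=1e-6)   # or singularity_lambda=-1 for pseudoinverses
# Score for score-based searches (e.g., FGES):
search.use_sem_bic(singularity_lambda=1e-6)

search.run_pc()                   # or search.run_fges(), search.run_boss(), ...
graph = search.get_java()         # the Tetrad graph object
print(graph)
```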

That should be it. Let me know how it works for you and whether it solves your problem. In any case, I don't mind having taken the time to do it; it's been an outstanding issue.

By the way, if anyone reading this is using the Java Tetrad interface, these parameters have also been added there.

@jdramsey
Collaborator

Oh hold on, I did not actually fix it for your case. You were doing mixed data. I'll look at that later this afternoon. I have several meetings starting now.

@jdramsey
Collaborator

OK, I've fixed all of the mixed scores and tests. Give them a shot.

I found a bug in the Conditional Gaussian BIC score, which I've fixed. I got an exception there; perhaps that's the exception you were running into.

Conditional Gaussian doesn't do any matrix inversion, so there is no lambda parameter for it, but the tests (including the Degenerate Gaussian score and test) have a singularity_lambda parameter you can set. Again, if you set it to -1, you get pseudoinverses; if you set it to zero, a normal inverse; and if you set it to a small positive number, regularization.
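
For mixed data that looks roughly like the following (a sketch; the file name is a placeholder, and the use_degenerate_gaussian_test call follows the TetradSearch naming used elsewhere in this thread):

```python
import pandas as pd
import pytetrad.tools.TetradSearch as ts

df = pd.read_csv("mixed_data.csv")   # placeholder: mixed discrete/continuous dataset

search = ts.TetradSearch(df)
search.use_degenerate_gaussian_test(singularity_lambda=1e-6)  # -1 for pseudoinverse, 0 for plain inverse
search.run_pc()
print(search.get_java())
```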

Let me know if this solves your problem. If you uninstall and reinstall py-tetrad using pip, it should be in py-tetrad.

@EsqYu
Author

EsqYu commented Mar 4, 2025

Thank you so much for the fix.
I tried regularization, setting singularity_lambda to 1e-6, but it didn't work and raised the following error.
Traceback (most recent call last):
search.use_degenerate_gaussian_test(singularity_lambda=1e-6)
File "/home/usr/.local/lib/python3.8/site-packages/pytetrad/tools/TetradSearch.py", line 239, in use_degenerate_gaussian_test
self.TEST = ind_.DegenerateGaussianLRT()
AttributeError: Java package 'edu.cmu.tetrad.algcomparison.independence' has no attribute 'DegenerateGaussianLRT'

@jdramsey
Collaborator

jdramsey commented Mar 4, 2025

Oh I'll bet it's Lrt instead of LRT. I'll fix that.

@EsqYu
Author

EsqYu commented Mar 5, 2025

Thank you! I'll wait for it.

@jdramsey
Collaborator

jdramsey commented Mar 7, 2025

Oh, sorry, I forgot to tell you: I fixed that typo. You'll get the fix if you uninstall and reinstall py-tetrad with pip.

Joe

@EsqYu EsqYu closed this as completed Apr 1, 2025