Error in nj(DM_copy) : missing values are not allowed in the distance matrix Consider using njs() #2

Open
anpefi opened this issue Dec 9, 2015 · 2 comments


anpefi commented Dec 9, 2015

This is issue #6262 in the R-Forge support tracker.

Original comment in R-Forge:

Anonymous message posted by evaughn@email.arizona.edu

I had been running msap successfully with my dataset. I then reduced the number of loci in the dataset (after some data clean up) and now I am getting the following error:

Error in nj(DM_copy) :
missing values are not allowed in the distance matrix
Consider using njs()

I edited the source code to use njs() but I'm now left with an error that causes my PCoA plotting to fail:

Error in cmdscale(DM, k = length(inds) - 1, eig = T) :
NA values not allowed in 'd'

I am assuming that there are values in my distance matrix that don't exist possibly due to the reduction in the size of my data set? Do you have some advice that might help me remedy this problem? I've attached the data file I am running.

Thanks!

@anpefi anpefi added the bug label Dec 9, 2015
@anpefi anpefi self-assigned this Dec 9, 2015
@anpefi anpefi added this to the For release 1.1.10 milestone Dec 9, 2015

anpefi commented Jul 5, 2016

This bug has been reported again by mail.
It happens when the dataset yields a very low number of MSL with a high proportion of NAs. In those cases, when the distance matrix is built, some pairs of individuals cannot be compared because, at every locus, at least one of the two individuals has an NA. This yields an NA in the matrix.
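To make the mechanism concrete, here is a minimal sketch in base R. It does not reproduce msap's actual distance code: `pair_dist` is a hypothetical helper that uses a simple mismatch proportion, and the data are made up. It only shows how a pair of individuals with no jointly scored locus ends up as an NA in the matrix.

```r
# Hypothetical helper (an assumption, not msap's code): proportion of
# mismatching loci among those scored in BOTH individuals.
pair_dist <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)      # loci scored in both individuals
  if (!any(ok)) return(NA_real_)   # no comparable locus: distance undefined
  mean(a[ok] != b[ok])
}

# Made-up binary matrix: rows are individuals, columns are loci.
x <- rbind(
  ind1 = c(1, NA, 0, NA),
  ind2 = c(NA, 1, NA, 0),
  ind3 = c(1, 1, 0, 0)
)

# Fill the full pairwise distance matrix.
n <- nrow(x)
DM <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    DM[i, j] <- pair_dist(x[i, ], x[j, ])
  }
}

print(DM)
anyNA(DM)  # ind1 and ind2 share no scored locus, so their distance is NA
```

With few loci and many NAs, pairs like ind1/ind2 become more likely, which matches the reports above.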

By using njs() instead of nj() you can do the clustering, because njs() is an algorithm designed for incomplete matrices. However, the PCoA cannot be done because, as far as I know, there is no algorithm that works with missing data.
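If you want to check whether your own matrix is affected, something like the following base R sketch may help. The matrix is made up, and the variable names are only illustrative. It shows cmdscale() refusing a matrix with an NA, and then lists which pairs of individuals are responsible.

```r
# Made-up 4 x 4 distance matrix with one undefined pair (ind1 vs ind2).
ids <- paste0("ind", 1:4)
DM <- matrix(c(0,   NA,  0.2, 0.5,
               NA,  0,   0.4, 0.3,
               0.2, 0.4, 0,   0.6,
               0.5, 0.3, 0.6, 0),
             nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# cmdscale() stops as soon as it finds an NA; capture the message
# instead of aborting the script.
pcoa <- tryCatch(
  cmdscale(DM, k = nrow(DM) - 1, eig = TRUE),
  error = function(e) conditionMessage(e)
)
print(pcoa)

# Locate the offending pairs (upper triangle only, to list each pair once).
na_pairs <- which(is.na(DM) & upper.tri(DM), arr.ind = TRUE)
print(na_pairs)

# Number of undefined distances per individual.
na_per_ind <- rowSums(is.na(DM))
print(na_per_ind)
```

Knowing which individuals carry the NAs at least tells you whether the problem is confined to a few samples or spread across the dataset.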

I need to think about the best way to address this issue and then implement it, which will take some time. It will probably involve using a different distance or some heuristic to assign a distance to uninformative states. Suggestions are welcome.

An alternative workaround: if you can assume that there are no (large) genetic differences between individuals across the whole dataset, then 0/0 patterns are much more likely to be caused by hypermethylation of the target than by a mutation that removes the target. You can then treat them as methylated states (1) instead of missing (NA). In this case (no.bands="h") you can run the full analysis.
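As a rough illustration of what that choice changes, here is a base R sketch. `score_state` is a hypothetical helper, not msap's internal code, and the scoring rule (1/1 is unmethylated, anything else is methylated) is my simplifying assumption. The only point is that the 0/0 pattern becomes either NA or 1 depending on the option.

```r
# Hypothetical scoring helper (an assumption, NOT msap's internal code).
# hpa / msp: band presence (1) or absence (0) for the two isoschizomers.
score_state <- function(hpa, msp, no.bands = c("u", "h")) {
  no.bands <- match.arg(no.bands)
  # Assumed rule: 1/1 is unmethylated (0), every other pattern methylated (1).
  state <- ifelse(hpa == 1 & msp == 1, 0, 1)
  # "u": treat the 0/0 pattern as uninformative, i.e. missing.
  if (no.bands == "u") state[hpa == 0 & msp == 0] <- NA
  state
}

# The four possible patterns: 1/1, 1/0, 0/1, 0/0.
hpa <- c(1, 1, 0, 0)
msp <- c(1, 0, 1, 0)

as_missing    <- score_state(hpa, msp, "u")
as_methylated <- score_state(hpa, msp, "h")

print(as_missing)     # the 0/0 pattern is NA
print(as_methylated)  # the 0/0 pattern is scored as 1
```

With the "h" treatment no state is ever NA, so every pair of individuals can be compared and the distance matrix is complete.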


anpefi commented Apr 5, 2017

I've been reminded that another workaround that can work for some datasets is to reduce the probability of NAs in the distances by lowering the threshold used to classify a locus as MSL or NML when it shows discordant patterns (option: error.rate.primer=0). By default, error.rate.primer is set to 0.05 (the typical error rate in AFLPs), but it can be set to any other value, including 0. Loci with very few discordant patterns are then considered MSL, which increases the number of MSL. In some datasets, setting the threshold to 0 (which assumes that there is no error in the banding) works!
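The effect of the threshold can be sketched in base R as follows. This is a simplification under my own assumptions, not msap's classification code: `classify_loci` is a hypothetical helper, "discordant" is taken to mean that the two isoschizomers disagree, and a locus is called MSL when its fraction of discordant patterns is strictly above the threshold. The data are made up.

```r
# Hypothetical classifier (an assumption, NOT msap's internal code).
# hpa / msp: individuals x loci matrices of band presence (1) / absence (0).
classify_loci <- function(hpa, msp, error.rate = 0.05) {
  discordant <- hpa != msp          # patterns where the two enzymes disagree
  frac <- colMeans(discordant)      # fraction of discordant patterns per locus
  ifelse(frac > error.rate, "MSL", "NML")
}

# Made-up data: 25 individuals, 3 loci.
hpa <- matrix(1, nrow = 25, ncol = 3, dimnames = list(NULL, c("A", "B", "C")))
msp <- hpa
msp[1, "B"]    <- 0   # locus B: 1 discordant pattern out of 25 (0.04)
msp[1:10, "C"] <- 0   # locus C: 10 discordant patterns out of 25 (0.40)

default_call <- classify_loci(hpa, msp, error.rate = 0.05)
strict_call  <- classify_loci(hpa, msp, error.rate = 0)

print(default_call)
print(strict_call)
```

In this toy case locus B sits below the default threshold and is only counted as MSL once the threshold is set to 0, which is the kind of gain in MSL described above.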
