vcfR2tidy can't handle files with empty INFO #200
Comments
Hi @alkaZeltser, thanks for posting this, and thank you for creating a minimal reproducible example. That greatly facilitates this conversation! I have reproduced your example and agree that the behavior you report is accurate. I have some questions that I hope you can help address. You report VCF data where the INFO column is populated by NA (in R we use NA; the VCF specification uses '.'). Because this column has no data, it is not documented in the meta region. If there were any data in the INFO column, I feel it should be described in the meta region; if it were not, I would interpret that as deviating from the VCF specification, and I would not feel obligated to address it. But you're reporting no records in the meta region AND missing data in the INFO column. Because I have never encountered this situation, I checked the VCF specification to make sure it is a reasonable one, and I do believe it is. So I'm curious: how was this file created? The issue appears to be that we did not anticipate a situation where there are zero INFO records in the meta region. Thank you for proposing a solution! If we move forward on this, it seems to me we would be better off engineering a solution that recognizes there are no INFO records in the meta region, as opposed to adding dummy data. I think I can handle this, but I thought I should address your proposal and see if you agree. Thanks!
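The approach suggested above (recognizing that the meta region has zero INFO records rather than adding dummy data) could look roughly like the base-R sketch below. `parse_info_meta()` is a hypothetical helper, not vcfR's actual code, and its comma splitting is deliberately naive: it would mis-split a Description that contains commas.

```r
# Hypothetical sketch: parse ##INFO header lines into a table, returning an
# empty table (instead of erroring) when the header has no ##INFO lines.
# parse_info_meta() is NOT part of vcfR; it only illustrates the idea.
parse_info_meta <- function(meta) {
  cols <- c("ID", "Number", "Type", "Description")
  info <- meta[grepl("^##INFO=<", meta)]
  # Zero INFO records in the meta region: return an empty, well-formed table
  if (length(info) == 0) {
    return(data.frame(ID = character(0), Number = character(0),
                      Type = character(0), Description = character(0),
                      stringsAsFactors = FALSE))
  }
  # Strip the "##INFO=<" prefix and the trailing ">"
  info <- sub(">$", "", sub("^##INFO=<", "", info))
  # Naive split on commas; a real parser must respect quoted Description text
  fields <- strsplit(info, split = ",")
  rows <- lapply(fields, function(f) {
    kv <- strsplit(f, split = "=")
    # Value is everything after the first "="; drop surrounding double quotes
    vals <- vapply(kv, function(p) paste(p[-1], collapse = "="), character(1))
    vals <- gsub('^"|"$', "", vals)
    names(vals) <- vapply(kv, function(p) p[1], character(1))
    vals[cols]
  })
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}

# Header without any ##INFO lines: no error, just zero rows
no_info <- parse_info_meta(c("##fileformat=VCFv4.2",
                             "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"))
nrow(no_info)  # 0

# Header with one ##INFO line parses as usual
with_info <- parse_info_meta(c("##fileformat=VCFv4.2",
                               "##INFO=<ID=DP,Number=1,Type=Integer,Description=\"Total depth\">"))
with_info$ID   # "DP"
```

Returning an empty table with the expected column names lets downstream code (joins, reshaping) run unchanged, which is the practical advantage over injecting a dummy INFO record.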
Hi Brian, thank you for your reply!
In my experience it is indeed an unusual file; very minimalist.
I completely agree; my solution was simply a quick hack and sanity check. Accounting for the case of an empty INFO column with no INFO records in the VCF header would be more appropriate.
Hi @alkaZeltser, I'm not familiar with … I think I have a solution for your issue. It's in the master branch now. If you'd like to give it a try, the following should install it.
Note that you will need a compiler to install from GitHub. If you do try it and notice anything, please let me know and we can try again. If you're interested in the changes I made, you should be able to see the diff below. Thanks, and please let me know if you feel this does not address your issue!
Thank you for responding so quickly! I will try to test out the new code.
Firstly, thank you for a wonderful package!
Unfortunately, I have discovered a bug with an interesting edge case.
I have a VCF that contains no information in the INFO column (`.` in each row) and, accordingly, no header lines describing any INFO field (no `##INFO=<>` lines). This becomes a problem when calling the `vcfR2tidy` function, which generates the following error:
`Error in strsplit(unlist(x), split = "=") : non-character argument`
The error goes away when a dummy INFO line is added to the `@meta` slot. I have traced the source to the following function, which parses the INFO and FORMAT header information and assumes that at least one line starting with `##INFO` exists:
vcfR/R/vcfR_to_tidy_functions.R, line 632 (commit 9ef985b)
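To make the failure mode concrete, here is a base-R sketch that mimics the failing call from the error message. `parse_info_meta_naive()` and the header lines are hypothetical: this is not the code at line 632, only an assumption about its shape based on the `strsplit(unlist(x), split = "=")` call in the error. When no `##INFO` lines exist, `strsplit()` on an empty character vector returns an empty list, `unlist()` of that is `NULL`, and `strsplit(NULL, ...)` raises "non-character argument".

```r
# Hypothetical, simplified stand-in for the header-parsing step in vcfR2tidy().
# NOT vcfR's actual code; it only mimics the call shown in the error message.
parse_info_meta_naive <- function(meta) {
  # Keep only the ##INFO header lines
  info <- meta[grepl("^##INFO=<", meta)]
  # Strip the "##INFO=<" prefix and the trailing ">"
  info <- sub(">$", "", sub("^##INFO=<", "", info))
  # Split each line into key=value pairs (naive: ignores commas inside quotes)
  x <- strsplit(info, split = ",")
  # With no ##INFO lines, x is list(), unlist(x) is NULL, and strsplit() fails
  kv <- strsplit(unlist(x), split = "=")
  # Return the keys (ID, Number, Type, Description, ...)
  vapply(kv, function(p) p[1], character(1))
}

# Hypothetical header with a FORMAT line but no ##INFO lines
meta <- c("##fileformat=VCFv4.2",
          "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">")
err <- tryCatch(parse_info_meta_naive(meta), error = conditionMessage)
print(err)   # [1] "non-character argument"

# The workaround from the report: append a dummy ##INFO line to the header
meta_dummy <- c(meta, "##INFO=<ID=DUMMY,Number=1,Type=String,Description=\"Dummy\">")
keys <- parse_info_meta_naive(meta_dummy)
print(keys)  # the four keys of the dummy line
```

The dummy line only works because it makes `unlist(x)` a character vector again; it hides the empty-header case rather than handling it.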
Reproducible example: