fix reading VCF with no header #5206
Conversation
Instead of starting with the first line that contains number - slash - number, shouldn't the parsing simply start at the line after the one starting with #CHROM?
The error occurs when reading the transposed VCF, so there is no #CHROM line. The problem might be with creating the transposed file rather than with reading it.
In the code it actually skips 8 entries, which is correct for the untransposed file, but probably incorrect for the transposed file?
I have it working now. The problem was that the parse_with_plugin() function was reading a fixed number of comment lines, an assumption that never holds. I changed it to read all lines starting with "##", the comment lines. The next_genotype() function then skips the chrom, pos, id, ref, alt entries correctly.
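The idea behind that fix can be illustrated with a minimal sketch. This is Python written only for illustration, not the project's actual code; the function name `split_meta_lines` and the sample data are hypothetical. It consumes however many leading "##" lines are present, including none, instead of assuming a fixed count:

```python
from typing import Iterable, List, Tuple


def split_meta_lines(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate the leading '##' comment lines from the rest of a VCF-like file.

    Hypothetical helper: it makes no assumption about how many comment
    lines exist, so a file with no header at all is handled the same way.
    """
    meta: List[str] = []
    body: List[str] = []
    in_meta = True
    for line in lines:
        line = line.rstrip("\n")
        if in_meta and line.startswith("##"):
            meta.append(line)
        else:
            # The first line that is not a '##' line ends the comment block.
            in_meta = False
            body.append(line)
    return meta, body


# Made-up sample data: the same body with and without comment lines.
no_header = ["#CHROM\tPOS\tID\tREF\tALT", "1\t100\tm1\tA\tG"]
with_header = ["##fileformat=VCFv4.2", "##source=example"] + no_header
```

With this approach, no data lines are consumed by mistake when the header is shorter than expected, which is the symptom described for header-less files.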
When I try to upload a test file, I see an error in the log and the following error on the user interface: "The organism species you provided is not in the database! Please contact us." NOTE: It worked after I made a genotyping project, so the genotyping project cannot be made on the fly.
I haven't seen that problem, but for me the web VCF upload is still renaming my stock names and causing the upload to fail. The $include_lab_numbers setting is enabled when you choose "accessions", and that removes "." from stock names. I'll have to remove or fix that code.
@lukasmueller I was getting the same error until I made a new Genotyping Protocol instead of using the one already available on the fixture. It looks like the Protocol available on the fixture (GBS ApeKI genotyping v4) could be giving that error because its information format is out of date (see below).
We should update the fixture to have the correct format for the protocol. Could that be done in the context of this PR?
From the CLI: loading works if there is at least one header line in the VCF file. If there are no headers, the data about the protocol and markers is loaded, but the allele data does not seem to be loaded. If you expand the 'Genotype Data' section, it hangs while retrieving the data. No obvious error is thrown.
If you are using cassava_test.vcf, the file itself may be causing the loading problems: the last line is truncated. I cleaned up the file to make it valid.
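A truncated line like the one described can be caught before loading with a simple column-count check. The following is a minimal Python sketch for illustration only, not part of the project; the function name `find_short_rows` and the sample lines are hypothetical, and it assumes tab-separated rows whose first non-comment line has the expected number of columns:

```python
from typing import Iterable, List


def find_short_rows(lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers whose column count differs from the
    first non-comment line (assumed here to be the reference row)."""
    expected = None
    bad: List[int] = []
    for number, line in enumerate(lines, start=1):
        # Skip '##' comment lines and blank lines.
        if line.startswith("##") or not line.strip():
            continue
        columns = len(line.rstrip("\n").split("\t"))
        if expected is None:
            expected = columns
        elif columns != expected:
            bad.append(number)
    return bad


# Made-up sample: the last row is missing its final genotype column.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\ts1\ts2",
    "1\t100\tm1\t0/0\t0/1",
    "1\t200\tm2\t0/1",
]
```

Running `find_short_rows(sample)` flags the fourth line, the kind of row that would need to be repaired or removed before the file is uploaded.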
When the VCF has no header, the first few accessions are skipped while loading.
This modification works on the transposed file.
#5203