fix reading VCF with no header #5206

ClayBirkett · 2024-11-14T21:52:31Z

When the VCF has no header, the first few accessions are skipped while loading.
This modification works on the transposed file.

#5203

lukasmueller · 2024-11-19T14:28:46Z

Instead of starting with the first line that contains number - slash - number, shouldn't the parsing simply start at the line after the one starting with #CHROM?

ClayBirkett · 2024-11-19T17:32:18Z

The error occurs when reading in the transposed VCF, so there is no #CHROM line. It might be that the problem is with the creation of the transposed file rather than with reading it.

lukasmueller · 2024-11-20T14:17:44Z

In the code it actually skips 8 entries, which is correct for the untransposed file, but probably incorrect for the transposed file?

ClayBirkett · 2024-11-20T16:03:11Z

I have it working now. The problem was that the parse_with_plugin() function was reading a fixed number of comment lines, an assumption that never holds. I changed this to read all lines starting with "##", the comment lines. The next_genotype() function then correctly skips the chrom, pos, id, ref, alt columns.
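
The approach described in this comment can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code from this PR; the function name `skip_meta_lines` is hypothetical. It consumes every leading line that starts with "##", however many there are (including none), instead of assuming a fixed count.

```python
def skip_meta_lines(lines):
    """Split lines into (leading '##' comment lines, everything after them).

    Hypothetical helper for illustration only: it does not assume any
    particular number of comment lines, so a file with zero '##' lines
    keeps all of its remaining lines.
    """
    rest = list(lines)
    idx = 0
    # Advance past every leading comment line, stopping at the first other line.
    while idx < len(rest) and rest[idx].startswith("##"):
        idx += 1
    return rest[:idx], rest[idx:]


# A body without any '##' lines, and the same body with two comment lines.
body = ["#CHROM\tPOS\tID", "1\t100\trs1"]
with_meta = ["##fileformat=VCFv4.2", "##source=example"] + body

meta_a, rest_a = skip_meta_lines(body)
meta_b, rest_b = skip_meta_lines(with_meta)
```

In both cases the remaining lines are identical, which is the property a fixed skip count cannot provide.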

lukasmueller · 2024-11-29T06:03:52Z

When I try to upload a test file, I see the following error in the log:
[error] Attribute (tempfile) does not pass the type constraint because: Validation failed for 'Str' with value undef at /home/production/cxgn/local-lib/lib/perl5/x86_64-linux-gnu-thread-multi/Moose/Object.pm line 24
Moose::Object::new('CXGN::UploadFile', 'HASH(0x5adc3ba7e560)') called at /home/production/cxgn/sgn/bin/../lib/SGN/Controller/AJAX/GenotypesVCFUpload.pm line 380
SGN::Controller::AJAX::GenotypesVCFUpload::upload_genotype_verify_POST('SGN::Controller::AJAX::GenotypesVCFUpload=HASH(0x5adc350a9e10)', 'SGN=HASH(0x5adc3ac95260)') called at /home/production/cxgn/local-lib/lib/perl5/Catalyst/Action.pm line 358
Catalyst::Action::execute('Catalyst::Action=HASH(0x5adc354e9dd8)', 'SGN::Controller::AJAX::GenotypesVCFUpload=HASH(0x5adc350a9e10)', 'SGN=HASH(0x5adc3ac95260)') called at /home/production/cxgn/local-lib/lib/perl5/Catalyst.pm line 2060
eval {...} at /home/production/cxgn/local-lib/lib/perl5/Catalyst.pm line 2060
......

and the following error on the user interface:

The organism species you provided is not in the database! Please contact us.

NOTE: It worked after I made a genotyping project... So the genotyping project cannot be made on the fly.

ClayBirkett · 2024-11-30T13:12:54Z

I haven't seen that problem, but for me the web VCF upload is still renaming my stock names and causing the upload to fail. The $include_lab_numbers setting is enabled when you choose "accessions", and that removes "." from stock names. I'll have to remove or fix that code.

alockrow · 2024-12-02T22:27:37Z

@lukasmueller I was getting the same error until I made a new Genotyping Protocol instead of using the one already available on the fixture. It looks like the Protocol available on the fixture (GBS ApeKI genotyping v4) could be giving that error because its information format is out of date (see below).

lukasmueller · 2024-12-04T01:24:53Z

We should update the fixture to have the correct format for the protocol... could that be done in the context of this PR?

isaak · 2024-12-05T12:54:31Z

From the CLI:

Loading works if there is at least one header line in the VCF file.

If there are no headers, the data about the protocol and markers is loaded, but the allele data does not seem to be loaded. If you expand the 'Genotype Data' section, it hangs while retrieving the data. No obvious error is thrown.

ClayBirkett · 2024-12-05T14:24:01Z

If you are using the cassava_test.vcf, the file itself may be causing the loading problems: its last line is truncated. I cleaned up the file to make it valid.
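
A truncated line like the one mentioned here can be spotted with a simple column-count check. The sketch below is in Python and is only an assumption about how one might check a file; it is not part of this PR, and the function name `find_short_lines` is hypothetical. It assumes tab-separated fields and uses the #CHROM line (or, if absent, the first data line) as the reference width.

```python
def find_short_lines(lines):
    """Return 1-based numbers of data lines whose column count differs
    from the reference line (the #CHROM header, or the first data line
    when the file has no header)."""
    expected = None
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and '##' comment lines.
        if not line or line.startswith("##"):
            continue
        width = len(line.split("\t"))
        if expected is None:
            expected = width  # first non-comment line sets the reference
        elif width != expected:
            bad.append(number)
    return bad


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1",
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/",  # last sample column missing
]
bad_lines = find_short_lines(example)
```

Here `bad_lines` contains only the number of the final, truncated line. A check like this catches missing columns but not a final field that is cut off mid-value, so it is a first pass rather than full validation.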

fix reading VCF with no header

37ab97a

ClayBirkett requested a review from lukasmueller November 15, 2024 15:05

lukasmueller requested a review from alockrow November 16, 2024 01:06

fix code that removes header

3c9c0ab

clean up test vcf

07380a6

fix test to work with clean VCF file

a383ec6

lukasmueller requested a review from isaak December 5, 2024 16:24

fix test to work with clean VCF file

2441cf9

lukasmueller approved these changes Dec 6, 2024

alockrow approved these changes Dec 6, 2024

lukasmueller merged commit 51b6389 into master Dec 7, 2024
4 checks passed

lukasmueller deleted the fix-vcf-upload branch December 7, 2024 20:23
