Add query list samples command #58

Will-Tyler · 2024-08-17T01:42:37Z

Overview

This pull request implements the equivalent of bcftools query --list-samples in vcztools. The vcztools command loads the sample IDs from the VCF Zarr group and prints them, one per line.

This pull request closes #51.

Example usage

vcztools query -l vcz_test_cache/sample.vcf.vcz
NA00001
NA00002
NA00003
vcztools query --list-samples vcz_test_cache/sample.vcf.vcz
NA00001
NA00002
NA00003
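
For readers unfamiliar with the VCF Zarr layout, the behaviour above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the name print_sample_ids is hypothetical, and it assumes the group exposes the sample IDs as an array named sample_id. In practice the group would come from something like zarr.open(vcz_path, mode="r"); a plain dict stands in for it here so the sketch runs without a store on disk.

```python
import io


def print_sample_ids(root, output=None):
    """Print each sample ID on its own line.

    `root` is any mapping with a "sample_id" entry that supports slicing,
    for example an opened Zarr group (assumption) or, as here, a dict.
    `output=None` lets print() fall back to the current sys.stdout.
    """
    for sample_id in root["sample_id"][:]:
        print(sample_id, file=output)


# Stand-in for the Zarr group, using the sample IDs from the example above.
fake_root = {"sample_id": ["NA00001", "NA00002", "NA00003"]}

buffer = io.StringIO()
print_sample_ids(fake_root, output=buffer)
listing = buffer.getvalue()
```

Writing to a file-like object rather than calling print() unconditionally keeps the function easy to exercise from a unit test.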

Testing

I added some tests that compare bcftools' output with vcztools' output along with some unit tests. The code introduced in this pull request has good coverage.

The coverage tool shows no coverage on the code added in this PR because the tests run the code in a different process. Most of the VCF writer code is also not covered because of this testing approach.

References

bcftools query

tomwhite · 2024-08-19T11:21:17Z

> The coverage tool shows no coverage on the code added in this PR because the tests run the code in a different process.

I've been wondering what our testing strategy should be, and this is one reason to add unit tests that call functions directly (i.e. not via the CLI). In this case it should just be a matter of adding a unit test for list_samples.

I think we need validation tests too of course, but these are less focused on exercising test coverage and more about checking that we match bcftools across a wide range of CLI parameters. So in this case I think it's fine to keep the tests you've already added as well.
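
To make the distinction between the two kinds of test concrete, here is a small sketch. The list_samples below is a hypothetical stand-in that takes a list of IDs rather than a path, so it is not the function in this PR. The first helper calls the function directly, so the code runs inside the test process where a coverage tracer can see it. The second runs equivalent logic in a child interpreter, which by default is not traced by the parent's coverage run.

```python
import io
import subprocess
import sys


def list_samples(sample_ids, output=None):
    # Hypothetical stand-in: the real function reads IDs from a VCF Zarr path.
    for sample_id in sample_ids:
        print(sample_id, file=output)


def run_in_process(sample_ids):
    # Direct call: executes in this process and is visible to coverage.
    buffer = io.StringIO()
    list_samples(sample_ids, output=buffer)
    return buffer.getvalue()


def run_in_subprocess(sample_ids):
    # CLI-style call: the printing happens in a separate Python process.
    code = "import sys\nfor s in sys.argv[1:]:\n    print(s)"
    result = subprocess.run(
        [sys.executable, "-c", code, *sample_ids],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


ids = ["NA00001", "NA00002", "NA00003"]
direct_output = run_in_process(ids)
child_output = run_in_subprocess(ids)
```

Both helpers return the same text, which is what a validation test checks; only the direct call contributes to measured coverage in the default setup.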

> Most of the VCF writer code is also not covered because of this testing approach.

I think a lot of the missing VCF writer coverage is because we are not exercising VCF header generation yet, but that's a bigger topic - see #47 and related issues.

tomwhite

LGTM. Probably best to wait for #62 before we merge this.

tomwhite · 2024-08-20T09:58:42Z

tests/test_query.py

+    list_samples(vcz_path, output=tmp_path / "sample_ids.txt")
+
+    with open(tmp_path / "sample_ids.txt") as file:
+        assert file.read() == expected_output


With #62 we can avoid testing every command twice (once with StringIO, and once with a file path). So I think it's OK to remove the file path test here.
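
As a rough idea of how a shared helper can remove the need to test both output kinds for every command, here is a sketch of a context manager that accepts None, a path, or an already-open file-like object. The name open_file_like is taken from the title of #62, but the signature and body below are assumptions for illustration, not the actual implementation.

```python
import contextlib
import io
import os
import sys
import tempfile


@contextlib.contextmanager
def open_file_like(file, mode="w"):
    """Yield a writable file object for None, a path, or a file-like object.

    Assumed behaviour: None maps to the current sys.stdout, paths are
    opened and closed here, and file-like objects are passed through
    without being closed.
    """
    if file is None:
        yield sys.stdout
    elif isinstance(file, (str, os.PathLike)):
        with open(file, mode) as handle:
            yield handle
    else:
        yield file


# File-like object: written to directly and left open for the caller.
buffer = io.StringIO()
with open_file_like(buffer) as f:
    print("NA00001", file=f)
buffer_text = buffer.getvalue()

# Path: opened on entry and closed on exit.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_ids.txt")
    with open_file_like(path) as f:
        print("NA00001", file=f)
    with open(path) as handle:
        path_text = handle.read()
```

With the path handling tested once in the helper, each command only needs a test against one kind of output.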

tomwhite · 2024-08-20T09:59:34Z

vcztools/query.py

+import zarr
+
+
+def list_samples(vcz_path, output=None):


Can output be None?

print will default to sys.stdout if the file argument is None. I could make output default to sys.stdout to make this clearer.

Actually, making output default to sys.stdout breaks the tests. I think this is because Python evaluates the sys.stdout default once, when the function is defined, but stdout is then replaced during the validation tests.

Anyway, I think it is okay because print's file argument can be None.

Ah OK. I noticed it because view explicitly specifies sys.stdout when it calls write_vcf. Not a big deal, but it may be worth making them consistent (although that might fall out of #54 anyway).
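
The default-argument behaviour discussed in this thread can be reproduced in isolation. The sketch below uses two hypothetical functions, early_bound and late_bound, and uses contextlib.redirect_stdout as a stand-in for whatever the test harness does to replace stdout. A default of sys.stdout is captured when the function is defined, whereas a default of None makes print look up sys.stdout at call time.

```python
import contextlib
import io
import sys


def early_bound(message, output=sys.stdout):
    # The default was evaluated once, at definition time, so later
    # replacements of sys.stdout are not picked up.
    print(message, file=output)


def late_bound(message, output=None):
    # print() treats file=None as "use the current sys.stdout".
    print(message, file=output)


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    early_bound("early")  # goes to the stdout that existed at definition
    late_bound("late")    # goes to the redirected stdout (the buffer)

captured = buffer.getvalue()
```

Only the output of late_bound ends up in the buffer, which matches the observation that the None default works with tests that capture stdout.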

tomwhite mentioned this pull request Aug 19, 2024

Add support for index -n/--nrecords #59

Merged

Will-Tyler force-pushed the list-samples branch 2 times, most recently from 66a0b71 to ac397a9 Compare August 19, 2024 19:18

tomwhite mentioned this pull request Aug 20, 2024

Factor out open_file_like utility context manager #62

Merged

tomwhite approved these changes Aug 20, 2024


Add query list samples command

612602e

Will-Tyler force-pushed the list-samples branch from ac397a9 to 612602e Compare August 20, 2024 16:33

Will-Tyler requested a review from tomwhite August 20, 2024 16:40

tomwhite approved these changes Aug 21, 2024


tomwhite merged commit eb04b7f into sgkit-dev:main Aug 21, 2024
10 checks passed
