-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add query list samples command #58
Conversation
I've been wondering what our testing strategy should be, and this is one reason to add unit tests that call functions directly (i.e. not via the CLI). In this case it should just be a matter of adding a unit test for I think we need validation tests too of course, but these are less focused on exercising test coverage and more about checking that we match bcftools across a wide range of CLI parameters. So in this case I think it's fine to keep the tests you've already added as well.
I think a lot of the missing VCF writer coverage is because we are not exercising VCF header generation yet, but that's a bigger topic - see #47 and related issues. |
66a0b71
to
ac397a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Probably best to wait for #62 before we merge this.
tests/test_query.py
Outdated
list_samples(vcz_path, output=tmp_path / "sample_ids.txt") | ||
|
||
with open(tmp_path / "sample_ids.txt") as file: | ||
assert file.read() == expected_output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With #62 we can avoid testing every command twice (once with StringIO
, and once with a file path). So I think it's OK to remove the file path test here.
import zarr | ||
|
||
|
||
def list_samples(vcz_path, output=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can output be None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print
will default to sys.stdout
if the file
argument is None
. I could make the output
default sys.stdout
to make this more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, making the output
default to sys.stdout
breaks the tests. I think because Python evaluates sys.stdout
when it reads the function definition, but then stdout is changed during the validation tests.
Anyway, I think it is okay because print
's file
argument can be None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK. I noticed it because view
explicitly specifies sys.stdout
when it calls write_vcf
. Not a big deal, but it may be worth making them consistent (although that might fall out of #54 anyway).
ac397a9
to
612602e
Compare
Overview
This pull request implements the
bcftools query --list-samples
command from bcftools in vcztools. The vcztools command loads the sample IDs from the VCF Zarr group and prints them.This pull request closes #51.
Example usage
Testing
I added some tests that compare bcftools' output with vcztools' output along with some unit tests. The code introduced in this pull request has good coverage.
The coverage tool shows no coverage on the code added in this PR because the tests run the code in a different process. Most of the VCF writer code is also not covered because of this testing approach.References