Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run tests against numpy 2 #256

Closed
jeromekelleher opened this issue Jun 19, 2024 · 7 comments
Closed

Run tests against numpy 2 #256

jeromekelleher opened this issue Jun 19, 2024 · 7 comments

Comments

@jeromekelleher
Copy link
Contributor

Some errors:

tests/test_vcf_examples.py:846:                                                                                       
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
bio2zarr/vcf2zarr/vcz.py:1040: in convert                                                                             
    icf.explode(                                                                                                      
bio2zarr/vcf2zarr/icf.py:1188: in explode                                                                             
    writer.explode(worker_processes=worker_processes, show_progress=show_progress)                                    
bio2zarr/vcf2zarr/icf.py:1130: in explode
    pwm.submit(self.process_partition, j)                                                                             
bio2zarr/core.py:291: in submit                                                                                       
    future = self.executor.submit(*args, **kwargs)                                                                    
bio2zarr/core.py:90: in submit                                                                                        
    future.set_result(fn(*args, **kwargs))             
bio2zarr/vcf2zarr/icf.py:1071: in process_partition
    tcw.append("POS", variant.POS)                
bio2zarr/vcf2zarr/icf.py:806: in append
    self.field_writers[name].append(value)
bio2zarr/vcf2zarr/icf.py:724: in append            
    val = self.transformer.transform_and_update_bounds(val)                                                            
bio2zarr/vcf2zarr/icf.py:503: in transform_and_update_bounds                                                           
    value = self.transform(vcf_value)     
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  
                                                                                                                      
self = <bio2zarr.vcf2zarr.icf.IntegerValueTransformer object at 0x7f1d07053160>, vcf_value = 111
                                                           
    def transform(self, vcf_value):                                                                                   
        if isinstance(vcf_value, tuple):
            vcf_value = [self.missing if v is None else v for v in vcf_value]                   
>       value = np.array(vcf_value, ndmin=self.dimension, copy=False)    
@jeromekelleher
Copy link
Contributor Author

Fixing that problem is easy enough with setting copy=None. However, we get something more sinister later in the process:

(numpy-2-venv) jk@empire$ python3 -m bio2zarr vcf2zarr explode tests/data/vcf/sample.vcf.gz tmp/x.icf -p0
    Scan: 100%|█████████████████████████████████████████████████████████████████| 1.00/1.00 [00:00<00:00, 52.7files/s]
 Explode: 100%|███████████████████████████████████████████████████████████████████| 9.00/9.00 [00:00<00:00, 429vars/s]
(numpy-2-venv) jk@empire$ python3 -m bio2zarr vcf2zarr encode tmp/x.icf tmp/x.vcz -f -p0
  Encode:  85%|███████████████████████████████████████████████████████████▊          | 792/927 [00:00<00:00, 7.81kB/s]Segmentation fault (core dumped)
(numpy-2-venv) jk@empire$ /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

So, during the encode step we've got a segfault. Digging in to this now.

@jeromekelleher
Copy link
Contributor Author

Hmm, so setting copy=True above resolves this segfault (which appears to happen when accessing the genotype data). So, setting copy=True seems fine and dandy to me - I'm sure the perf difference is negligible.

@jeromekelleher
Copy link
Contributor Author

With #257 we should be basically set for numpy 2.0 and numpy 1.x compatibility. To close this issue we should add a CI job that explicitly installs numpy 2.x and runs the tests. Later, when numpy 2.0 becomes the default thing we install (due to dependencies) we can switch this to 1.x.

We're waiting on numpy 2.0 compatible wheels from msprime, so no point in making this CI job until they arrive.

@jeromekelleher
Copy link
Contributor Author

Just waiting on numpy 2.0 wheels for msprime which should arrive in a few days, and we can then ship a numpy 2.0 compatible version.

@tomwhite
Copy link
Contributor

tomwhite commented Oct 2, 2024

This is blocking removal of VCF from sgkit (sgkit-dev/sgkit#1264), since we are using vcztools there for some compatibility tests and one of the test environments runs on NumPy 2 (https://github.com/sgkit-dev/sgkit/actions/runs/11145218525).

It looks like msprime now has NumPy 2 wheels, so it should be enough to do a bio2zarr release.

@jeromekelleher
Copy link
Contributor Author

I'm slightly reluctant to do a bio2zarr release before having a good look at the local alleles stuff. Can we point sgkit at the development version of bio2zarr for a while, keeping an issue tracking the fact we need to switch before next release?

@tomwhite
Copy link
Contributor

tomwhite commented Oct 8, 2024

Can we point sgkit at the development version of bio2zarr for a while

I managed to do this just for the NumPy GitHub action workflow, so I think we're good here.

jeromekelleher added a commit to jeromekelleher/bio2zarr that referenced this issue Oct 23, 2024
jeromekelleher added a commit to jeromekelleher/bio2zarr that referenced this issue Oct 23, 2024
jeromekelleher added a commit to jeromekelleher/bio2zarr that referenced this issue Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants