Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is EventManagerCoherent.write_to_hdf() doing what we want? #4632

Open
titodalcanton opened this issue Feb 13, 2024 · 3 comments
Open

Is EventManagerCoherent.write_to_hdf() doing what we want? #4632

titodalcanton opened this issue Feb 13, 2024 · 3 comments
Assignees
Labels
PyGRB PyGRB development

Comments

@titodalcanton
Copy link
Contributor

titodalcanton commented Feb 13, 2024

In #4550 we noticed that the write_to_hdf() method of EventManagerCoherent does something unexpected with some of the properties of the templates. Specifically, it writes some of the template properties into each detector's dataset, instead of the network dataset. Here it does so for the template duration. Here it does it again for the template hash.

Now that I look closer at that method, I think there is an indentation bug throughout it, combined with an incorrect use of the f.prefix attribute. This would cause the HDF file to contain datasets like <detector>/search/time_slides while (I think) what we want is just search/time_slides. The lines that create the search/* and gating/* datasets should be outside the "Individual ifo stuff" loop, and there should be a f.prefix = '' before them.

Apparently a similar issue was noted in #2812, though not really fixed?!

More broadly, I think that method is long and complex and would benefit from being split into calls to shorter sub-methods.

@titodalcanton
Copy link
Contributor Author

titodalcanton commented Feb 14, 2024

Using the multi_inspiral example and looking at the output file with h5ls, I see the following groups and datasets:

H1/
    bank_chisq (336,)
    bank_chisq_dof (336,)
    chisq (336,)
    chisq_dof (336,)
    coa_phase (336,)
    cont_chisq (336,)
    end_time (336,)
    event_id (336,)
    search/
        end_time (1,)
        start_time (1,)
        time_slides (2,)
    sigmasq (336,)
    slide_id (336,)
    snr_h1 (336,)
    template_duration (336,)
    template_hash (336,)
    time_index (336,)
L1/
    bank_chisq (336,)
    bank_chisq_dof (336,)
    chisq (336,)
    chisq_dof (336,)
    coa_phase (336,)
    cont_chisq (336,)
    end_time (336,)
    event_id (336,)
    search/
        end_time (1,)
        start_time (1,)
        time_slides (2,)
    sigmasq (336,)
    slide_id (336,)
    snr_l (336,)
    template_duration (336,)
    template_hash (336,)
    time_index (336,)
V1/
    bank_chisq (336,)
    bank_chisq_dof (336,)
    chisq (336,)
    chisq_dof (336,)
    coa_phase (336,)
    cont_chisq (336,)
    end_time (336,)
    event_id (336,)
    search/
        end_time (1,)
        start_time (1,)
        time_slides (2,)
    sigmasq (336,)
    slide_id (336,)
    snr_v (336,)
    template_duration (336,)
    template_hash (336,)
    time_index (336,)
network/
    H1_event_id (336,)
    L1_event_id (336,)
    V1_event_id (336,)
    coherent_snr (336,)
    dec (336,)
    end_time_gc (336,)
    event_id (336,)
    my_network_chisq (336,)
    nifo (336,)
    null_snr (336,)
    ra (336,)
    reweighted_snr (336,)
    search/
        segments/
            end_times (3,)
            start_times (3,)
    slide_id (336,)
    template_id (336,)

There are a few weird things here:

  • Why do we have H1/snr_h1 but L1/snr_l and V1/snr_v, i.e. why does Hanford have the 1 suffix but not the other detectors? Surely this if statement must be a mistake. Why not just call the dataset snr?
  • As noted above, there should probably not be any <ifo>/template_* datasets. Every template property should be under the network group, or maybe even better under a separate template dataset. And then again, does this file really need to store any other template property apart from the hash?
  • Should the event_id and slide_id dataset only be in the network group? Or do the ones under <ifo> actually mean something else?
  • I think the search start/end times are assumed to be detector-independent (though I guess in principle this assumption might be avoided). If this is correct, then search/segments should probably be at the top level and there should be no search segments within any of the other groups.

@pannarale
Copy link
Contributor

This would also take care of: #4416

@titodalcanton
Copy link
Contributor Author

The gating and snr_* names have been fixed in #4639, but the other comments stand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PyGRB PyGRB development
Projects
Status: Todo
Development

No branches or pull requests

2 participants