Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyGRB: Update postprocessing functions #4550

Merged
merged 40 commits into from
Feb 15, 2024
Merged

Conversation

jakeb245
Copy link
Contributor

This PR is another step in resolving #4419. It should finish off converting the backend parts of the old XML functionality into HDF. The functionality is intended to be the same, but there could be some cases in here where it should be changed.

@jakeb245 jakeb245 self-assigned this Oct 31, 2023
@jakeb245 jakeb245 added the PyGRB PyGRB development label Oct 31, 2023
bin/pycbc_multi_inspiral Outdated Show resolved Hide resolved
pycbc/results/pygrb_postprocessing_utils.py Outdated Show resolved Hide resolved
@jakeb245
Copy link
Contributor Author

jakeb245 commented Jan 3, 2024

I think the issue keeping this PR open is what information about templates should be written into the output files. I'm told the main PyCBC searches only write the template ID, and then go back to the template file to get that information. I've started to break this convention in the current state of these code changes by writing the template masses to the trigger output files.

The template ID that is currently written in the WIP PyGRB code is effectively an index of the split template file. Using this would require us to keep track of the thousands of template files which does not seems like a great idea. I'm thinking the best way to go about this is to write the template hash in place of the template ID, which would allow us to find the templates in the full template bank in postprocessing.

Pinging @pannarale and @titodalcanton for discussion. I'd like to have a bit of input before trying to code this.

@titodalcanton
Copy link
Contributor

@jakeb245 yes, I would use a similar approach used by pycbc_inspiral in the all-sky all-time search, which indeed consists in storing just the template hash for each trigger after matched filtering. To be precise, the following happens there:

  1. pycbc_inspiral does matched filtering for each bank split separately, and the corresponding triggers store the template hashes only.
  2. pycbc_coinc_mergetrigs then takes all the triggers from all the bank splits from step 1, merges them together, and uses the template hashes to recover the template IDs into the full bank.

Have a look at this documentation page for more info: https://pycbc.org/pycbc/latest/html/formats/hdf_format.html. You can also see an example of this by looking at the various files from one of the all-sky all-time O4 chunks, in particular you want to look at one of the full_data/*-INSPIRAL_BANK*.hdf files for step 1, and full_data/*-HDF_TRIGGER_MERGE_FULL_DATA*.hdf for step 2.

Ping me on Slack if more is needed.

@titodalcanton
Copy link
Contributor

@jakeb245 I had a close look at this PR. As you know I am a bit removed from the development of this code, so I left a bunch of questions above to fill some gaps in my understanding, but I do not see anything too concerning here. I am happy to approve this once you have answered the comments.

@titodalcanton
Copy link
Contributor

I think there are a couple things to fix (noted above), then I think this can go in.

jakeb245 and others added 2 commits February 14, 2024 19:25
Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
@titodalcanton
Copy link
Contributor

I am still not super convinced that I understand the logic of the slide_id/event_id thing here, so whoever is developing that functionality should keep it in mind. But let's merge this and continue developing.

@titodalcanton titodalcanton dismissed pannarale’s stale review February 15, 2024 10:11

I think this has been addressed

@titodalcanton titodalcanton merged commit df3ba44 into gwastro:master Feb 15, 2024
33 checks passed
bhooshan-gadre pushed a commit to bhooshan-gadre/pycbc that referenced this pull request Mar 4, 2024
* Add mapping function

* Implement mapping in trig combiner

* Fix "template_id" dataset being written too early

* Force template_id to be integer

* Try adding bank files to workflow

* Remove fixme

* Only use one bank_file

* Add template mapping to inj finder

* Small change

* mapping function works with trig file object

* Small change

* Remove typo

* Add bank file opt to inj_finder

* Add template masses to multi_inspiral output

* sort_trigs updates

* Extract trig properties

* Add old imports back for CodeClimate

* Remove unused bestnr opts

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Francesco Pannarale <pannarale@users.noreply.github.com>

* Codeclimate

* Remove masses from multi inspiral output

* Correct segment loading names

* Add NoneType handling to _slide_vetoes function

* Indentation fix

* Add 's' back in

* Fix docstring(?)

* Codeclimate

* Codeclimate

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

* Uses event_ids in sort_trigs to avoid confusion

* Add possibility of multiple banks (and NotImplementedError)

* Remove enumerate and fix indexing issue

* Check for single bank earlier

* Simplify column name check

* Use zip()

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

---------

Co-authored-by: Francesco Pannarale <pannarale@users.noreply.github.com>
Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
lpathak97 pushed a commit to lpathak97/pycbc that referenced this pull request Mar 13, 2024
* Add mapping function

* Implement mapping in trig combiner

* Fix "template_id" dataset being written too early

* Force template_id to be integer

* Try adding bank files to workflow

* Remove fixme

* Only use one bank_file

* Add template mapping to inj finder

* Small change

* mapping function works with trig file object

* Small change

* Remove typo

* Add bank file opt to inj_finder

* Add template masses to multi_inspiral output

* sort_trigs updates

* Extract trig properties

* Add old imports back for CodeClimate

* Remove unused bestnr opts

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Francesco Pannarale <pannarale@users.noreply.github.com>

* Codeclimate

* Remove masses from multi inspiral output

* Correct segment loading names

* Add NoneType handling to _slide_vetoes function

* Indentation fix

* Add 's' back in

* Fix docstring(?)

* Codeclimate

* Codeclimate

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

* Uses event_ids in sort_trigs to avoid confusion

* Add possibility of multiple banks (and NotImplementedError)

* Remove enumerate and fix indexing issue

* Check for single bank earlier

* Simplify column name check

* Use zip()

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

---------

Co-authored-by: Francesco Pannarale <pannarale@users.noreply.github.com>
Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
acorreia61201 pushed a commit to acorreia61201/pycbc that referenced this pull request Apr 4, 2024
* Add mapping function

* Implement mapping in trig combiner

* Fix "template_id" dataset being written too early

* Force template_id to be integer

* Try adding bank files to workflow

* Remove fixme

* Only use one bank_file

* Add template mapping to inj finder

* Small change

* mapping function works with trig file object

* Small change

* Remove typo

* Add bank file opt to inj_finder

* Add template masses to multi_inspiral output

* sort_trigs updates

* Extract trig properties

* Add old imports back for CodeClimate

* Remove unused bestnr opts

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Francesco Pannarale <pannarale@users.noreply.github.com>

* Codeclimate

* Remove masses from multi inspiral output

* Correct segment loading names

* Add NoneType handling to _slide_vetoes function

* Indentation fix

* Add 's' back in

* Fix docstring(?)

* Codeclimate

* Codeclimate

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

* Uses event_ids in sort_trigs to avoid confusion

* Add possibility of multiple banks (and NotImplementedError)

* Remove enumerate and fix indexing issue

* Check for single bank earlier

* Simplify column name check

* Use zip()

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

* Update pycbc/results/pygrb_postprocessing_utils.py

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

---------

Co-authored-by: Francesco Pannarale <pannarale@users.noreply.github.com>
Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PyGRB PyGRB development
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants