Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unknown injections handling in offline #4360

Conversation

GarethCabournDavies
Copy link
Contributor

Allow the offline search to use injections which do not have an associated injection file.

For the use-case I am thinking of, we would be provided frame files with an unknown set of injections by an external team.

We would then use the alternative datafind settings on the config for the injected data, but use the original uninjected data for the background in the same way we do for known injections.

These would then show up as an additional minifollowup in the open box results, and a .hdf results file is made in the open box results directory which can be used for reporting results

Draft status is due to ongoing testing, this will be removed once the testing has completed and passed sensible checks

@GarethCabournDavies GarethCabournDavies self-assigned this May 12, 2023
@GarethCabournDavies GarethCabournDavies changed the title Draft: unknown injections handling in offline Unknown injections handling in offline May 26, 2023
@GarethCabournDavies
Copy link
Contributor Author

GarethCabournDavies commented May 26, 2023

I had been wanting to allow in-frame injections which have a file as well, in case of supplied known injections, but I realised that can be done using #4363 and overriding the --injection-scale-factor option to be (close to) infinite for those injections, similar to single_template-inj_params_noinj plotting option, so no development is needed for that

@GarethCabournDavies GarethCabournDavies changed the title Unknown injections handling in offline Draft: Unknown injections handling in offline May 26, 2023
Copy link
Contributor

@spxiwh spxiwh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm happy with this. Is there an example output page or test?

@@ -467,6 +467,9 @@ fore_xmlall = wf.make_foreground_table(workflow, combined_bg_file,
fore_xmlloudest = wf.make_foreground_table(workflow, combined_bg_file,
hdfbank, rdir['open_box_result'], singles=insps,
extension='.xml', tags=["xmlloudest"])
fore_hdfall = wf.make_foreground_table(workflow, combined_bg_file,
hdfbank, rdir['open_box_result'], singles=insps,
extension='.hdf', tags=["hdfall"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would appear to require changes to config files? Not sure I can see those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer, missed that!
Added now, and running test workflow to generate some results pages. I'll add some injection generation / searching into the CI tests as well (similar to the live CI)

@GarethCabournDavies
Copy link
Contributor Author

GarethCabournDavies commented Jun 1, 2023

Note that changes since the approval will show up on GarethCabournDavies/pycbc@1f4991f...unknown_injections

The URL has updated now that I've rebased to master

@GarethCabournDavies GarethCabournDavies changed the title Draft: Unknown injections handling in offline Unknown injections handling in offline Jun 1, 2023
@GarethCabournDavies
Copy link
Contributor Author

Note: Current failure in the CI will be fixed by #4404

@spxiwh
Copy link
Contributor

spxiwh commented Jul 2, 2023

Poke: Still failing search test.

@GarethCabournDavies
Copy link
Contributor Author

Searched through the logs and can't find any reasons why the coinc jobs for unknown injections are failing, just the transfers of the output file

@GarethCabournDavies
Copy link
Contributor Author

Okay, the CI failures are due to huge injinj / fullinj / injfull jobs which are being booted I think - at least they are taking a lot of time and memory on my local runs using the examples config

I think that merging #4420 will fix the issues, so unfortunately, that becomes another dependency!

@GarethCabournDavies
Copy link
Contributor Author

Results are currently showing that the injection run is using the full_data data. I think I have a solution, and am testing at the moment

@spxiwh spxiwh self-requested a review July 10, 2023 08:55
@spxiwh
Copy link
Contributor

spxiwh commented Jul 10, 2023

Looks like there's now conflicts here. Poke me when this is ready!

@GarethCabournDavies
Copy link
Contributor Author

Okay @spxiwh, this looks sensible on the results pages now - I thought the wrong data was being passed to the minifollowups, but it was me trying to overwrite rather than use a fresh directory

I'm slightly worried that the config file syntax becomes very complicated at this point, but I suppose this is a non-standard use-case

Copy link
Contributor

@spxiwh spxiwh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think I'm largely okay with this, do see some of the comments though.

bin/plotting/pycbc_page_foreground Show resolved Hide resolved
bin/workflows/pycbc_make_offline_search_workflow Outdated Show resolved Hide resolved
bin/workflows/pycbc_make_offline_search_workflow Outdated Show resolved Hide resolved
bin/workflows/pycbc_make_offline_search_workflow Outdated Show resolved Hide resolved
examples/search/analysis.ini Outdated Show resolved Hide resolved
plotting.ini \
executables.ini \
injections_minimal.ini \
injections_unknown.ini \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By how much does this slow down the example search workflow. While we want to test everything, the CI is becoming overstressed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can revert the changes to the CI after final approval, if that is best?

Handwaving discussion over time though:

  • the workflow additions are all parallel to already-running jobs
  • the generation of data with injections does take a few minutes of added time before the workflow starts, and is not parallel

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI only has a single machine, so cannot parallelize. It's going to have to run all the jobs here one by one (or perhaps two-by-two). Therefore we do need to know how much extra time this is adding, and if this feature is "niche" enough to warrant this, given that there's probably many possibilities in workflow that we are not testing.

examples/search/plotting.ini Outdated Show resolved Hide resolved
pycbc/io/hdf.py Outdated Show resolved Hide resolved
elif injection_method == "IN_FRAME":
# IN_FRAME means that we will supply frames with the appropriate
# injections unknown to the analysis
inj_files.append(None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only substantial comment is that having a File be None is a hack. If anyone accidentally sent this file to a job, pegasus is going to [rightly] break. There doesn't seem to be a good way to handle this though ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, though people shouldn't do that, there wont be a nice error message to say what they have done which is wrong either

@spxiwh
Copy link
Contributor

spxiwh commented Jul 17, 2023

Okay, I'm going to remove this from my "to-do" list now. Please do consider CI time, but merge this once you are happy and the CI tests (including the search workflow!) have completed. If you make any more substantive changes, explicitly poke me again.

@tdent
Copy link
Contributor

tdent commented Jul 26, 2023

@spxiwh is the use case for this going to come up in the near future to your knowledge? If so, we will want this to be on master

@GarethCabournDavies
Copy link
Contributor Author

This will need a substantial re-think, as the use-case I've implemented here is unlikely to be used in the near future

What we are going to need is the case where injection frame files are supplied, and an injection definition file is also given, but we are not injecting them ourselves.

As a result, I will close, and think about how to adapt this for that use-case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants