Unknown injections handling in offline #4360

Closed
Changes from 16 commits
4 changes: 4 additions & 0 deletions .github/workflows/search-workflow.yml
@@ -44,6 +44,10 @@ jobs:
run: bash -e examples/search/bank.sh
- name: generating statistic files
run: bash -e examples/search/stats.sh
- name: generating unknown injection files
run: |
cp examples/search/generate_injections.py ./
bash -e examples/search/generate_unknown_injections.sh
- name: running workflow
run: |
cp examples/search/*.ini ./
28 changes: 22 additions & 6 deletions bin/plotting/pycbc_page_foreground
@@ -22,6 +22,12 @@ parser.add_argument('--single-detector-triggers', nargs='+', default=None)
parser.add_argument('--verbose', action='count')
parser.add_argument('--output-file')
parser.add_argument('--num-to-write', type=int)
parser.add_argument('--table-title', default='Loudest Event Table',
help="Title to use for the table.")
parser.add_argument('--use-exclusive-ifar', action='store_true',
help='Indicate to use exclusive IFAR rather than '
'inclusive. Incompatible with '
'--use-hierarchical-level.')
parser.add_argument('--use-hierarchical-level', type=int, default=None,
help='Indicate which FARs to write to the table '
'based on the number of hierarchical removals done. '
@@ -35,6 +41,10 @@ parser.add_argument('--use-hierarchical-level', type=int, default=None,
'of hierarchical removals done. [default=None]')
args = parser.parse_args()

if args.use_exclusive_ifar and args.use_hierarchical_level:
parser.error("Exclusive IFAR and hierarchical removal "
"level given, this makes no sense.")
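The explicit compatibility check above works; for reference, a minimal sketch of the same constraint using argparse's built-in mutually exclusive groups (the option names mirror the PR; everything else is illustrative, not the PR's actual code):

```python
import argparse

# Sketch: let argparse itself reject the incompatible combination,
# instead of a manual parser.error() call after parsing.
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument('--use-exclusive-ifar', action='store_true',
                   help='Use exclusive IFAR rather than inclusive.')
group.add_argument('--use-hierarchical-level', type=int, default=None,
                   help='Which hierarchical-removal level to report.')

args = parser.parse_args(['--use-exclusive-ifar'])
print(args.use_exclusive_ifar)   # True
# Passing both flags makes argparse exit with an error like:
#   argument --use-hierarchical-level: not allowed with argument --use-exclusive-ifar
```

A side benefit of the group approach: a manual `if args.use_exclusive_ifar and args.use_hierarchical_level` test would not fire for `--use-hierarchical-level 0` (0 is falsy), whereas the group rejects any explicit use of both options.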

f = h5py.File(args.trigger_file, 'r')

# Parse which inclusive background to use for the plotting
@@ -92,8 +102,12 @@ else :
n_loudest=args.num_to_write)

if args.output_file.endswith('.html'):
ifar = fortrigs.get_coincfile_array('ifar')
fap = fortrigs.get_coincfile_array('fap')
if args.use_exclusive_ifar:
ifar = fortrigs.get_coincfile_array('ifar_exc')
fap = fortrigs.get_coincfile_array('fap_exc')
else:
ifar = fortrigs.get_coincfile_array('ifar')
fap = fortrigs.get_coincfile_array('fap')
stat = fortrigs.get_coincfile_array('stat')
mass1 = fortrigs.get_bankfile_array('mass1')
mass2 = fortrigs.get_bankfile_array('mass2')
@@ -175,15 +189,17 @@ if args.output_file.endswith('.html'):
html_table = pycbc.results.html_table(columns, names,
format_strings=format_strings, page_size=10)

kwds = { 'title' : 'Loudest Event Table',
'cmd' :' '.join(sys.argv), }
kwds = {'title' : args.table_title,
'cmd' :' '.join(sys.argv), }
save_fig_with_metadata(str(html_table), args.output_file, **kwds)

elif args.output_file.endswith('.xml') or args.output_file.endswith('.xml.gz'):
fortrigs.to_coinc_xml_object(args.output_file)
fortrigs.to_coinc_xml_object(args.output_file,
exclusive=args.use_exclusive_ifar)

elif args.output_file.endswith('.hdf'):
fortrigs.to_coinc_hdf_object(args.output_file)
fortrigs.to_coinc_hdf_object(args.output_file,
exclusive=args.use_exclusive_ifar)

else:
raise NotImplementedError("Output file format not recognised")
141 changes: 105 additions & 36 deletions bin/workflows/pycbc_make_offline_search_workflow
@@ -140,8 +140,9 @@ if 'hoft' in workflow.cp.get_subsections('workflow-datafind'):

datafind_files, analyzable_file, analyzable_segs, analyzable_name = \
wf.setup_datafind_workflow(workflow,
ssegs, "datafind",
seg_file=science_seg_file, tags=hoft_tags)
ssegs, "datafind",
seg_file=science_seg_file,
tags=hoft_tags)

final_veto_name = 'vetoes'
final_veto_file = wf.get_segments_file(workflow, final_veto_name,
@@ -458,7 +459,7 @@ ifar_ob = wf.make_ifar_plot(workflow, combined_bg_file,
rdir['open_box_result'],
tags=combined_bg_file.tags + ['open_box'],
executable='page_ifar_catalog')
table = wf.make_foreground_table(workflow, combined_bg_file,
fore_table = wf.make_foreground_table(workflow, combined_bg_file,
hdfbank, rdir['open_box_result'],
singles=insps, extension='.html',
tags=combined_bg_file.tags)
@@ -469,13 +470,12 @@ fore_xmlall = wf.make_foreground_table(workflow, combined_bg_file,
fore_xmlloudest = wf.make_foreground_table(workflow, combined_bg_file,
hdfbank, rdir['open_box_result'], singles=insps,
extension='.xml', tags=["xmlloudest"])
fore_hdfall = wf.make_foreground_table(workflow, combined_bg_file,
hdfbank, rdir['open_box_result'], singles=insps,
extension='.hdf', tags=["hdfall"])

symlink_result(table, 'open_box_result/significance')

# Set html pages
main_page = [(ifar_ob,), (table, )]
layout.two_column_layout(rdir['open_box_result'], main_page)

main_page = [(ifar_ob,), (fore_table, )]
#symlink_result(table, 'open_box_result/significance')
#detailed_page = [(snrifar, ratehist), (snrifar_ifar, ifar_ob), (table,)]
#layout.two_column_layout(rdir['open_box_result/significance'], detailed_page)

@@ -563,37 +563,61 @@ splitbank_files_inj = wf.setup_splittable_workflow(workflow, [hdfbank],

# setup the injection files
inj_files_base, inj_tags = wf.setup_injection_workflow(workflow,
output_dir="inj_files")

output_dir="inj_files")
inj_files = []
for inj_file, tag in zip(inj_files_base, inj_tags):
inj_files.append(wf.inj_to_hdf(workflow, inj_file, 'inj_files', [tag]))
if inj_file is None:
# No injection file to convert
inj_files.append(inj_file)
else:
inj_files.append(wf.inj_to_hdf(workflow, inj_file, 'inj_files', [tag]))

inj_coincs = wf.FileList()

found_inj_dict = {}
insps_dict = {}
combined_inj_bg_files = {}

files_for_combined_injfind = []
for inj_file, tag in zip(inj_files, inj_tags):
ctags = [tag, 'injections']
ctags = [tag.lower(), 'injections']
output_dir = '%s_coinc' % tag

if workflow.cp.has_option_tags('workflow-injections',
'compute-optimal-snr', tags=ctags):
optimal_snr_file = wf.compute_inj_optimal_snr(
workflow, inj_file, psd_files, 'inj_files', tags=ctags)
file_for_injfind = optimal_snr_file
else:
file_for_injfind = inj_file
if inj_file is not None:
if workflow.cp.has_option_tags('workflow-injections',
'compute-optimal-snr', tags=ctags):
optimal_snr_file = wf.compute_inj_optimal_snr(
workflow, inj_file, psd_files, 'inj_files', tags=ctags)
file_for_injfind = optimal_snr_file
else:
file_for_injfind = inj_file

files_for_combined_injfind.append(file_for_injfind)
files_for_combined_injfind.append(file_for_injfind)

# setup the matchedfilter jobs
insps = wf.setup_matchedfltr_workflow(workflow, analyzable_segs,
datafind_files, splitbank_files_inj,
output_dir, tags=ctags,
injection_file=inj_file)
if inj_file is None:
# datafind may be different for unknown injections
datafind_files_inj, _, analyzable_segs_inj, _ = \
wf.setup_datafind_workflow(
workflow,
ssegs,
"datafind",
seg_file=science_seg_file,
tags=hoft_tags + ctags
)
else:
datafind_files_inj = datafind_files
analyzable_segs_inj = analyzable_segs

insps = wf.setup_matchedfltr_workflow(
workflow,
analyzable_segs_inj,
datafind_files_inj,
splitbank_files_inj,
output_dir,
tags=ctags,
injection_file=inj_file
)

insps = wf.merge_single_detector_hdf_files(workflow, hdfbank,
insps, output_dir, tags=ctags)
@@ -664,20 +688,22 @@ for inj_file, tag in zip(inj_files, inj_tags):
output_dir,
tags=combctags
)
combined_inj_bg_files[tag] = combined_inj_bg_file

found_inj = wf.find_injections_in_hdf_coinc(
workflow,
[combined_inj_bg_file],
[file_for_injfind],
censored_veto,
censored_veto_name,
output_dir,
tags=combctags)
if inj_file is not None:
found_inj = wf.find_injections_in_hdf_coinc(
workflow,
[combined_inj_bg_file],
[file_for_injfind],
censored_veto,
censored_veto_name,
output_dir,
tags=combctags)
found_inj_dict[tag] = found_inj

inj_coincs += [combined_inj_bg_file]

# Set files for plots
found_inj_dict[tag] = found_inj
insps_dict[tag] = insps

# And the combined INJFIND file
@@ -695,6 +721,9 @@ if len(files_for_combined_injfind) > 0:
############################ Injection plots #################################
# Per injection run plots
for inj_file, tag in zip(inj_files, inj_tags):
if inj_file is None:
# This is an unknown injection set
continue
found_inj = found_inj_dict[tag]
insps = insps_dict[tag]
injdir = rdir['injections/%s' % tag]
@@ -751,7 +780,7 @@ for inj_file, tag in zip(inj_files, inj_tags):
wf.make_throughput_plot(workflow, insps, rdir['workflow/throughput'],
tags=[tag])

######################## Make combined injection plots ##########################
####################### Make combined injection plots #########################
if len(files_for_combined_injfind) > 0:
sen_all = wf.make_sensitivity_plot(workflow, found_inj_comb,
rdir['search_sensitivity'],
@@ -771,13 +800,53 @@ if len(files_for_combined_injfind) > 0:
require='summ')
inj_summ = list(layout.grouper(inj_s + sen_s, 2))

###################### Results page for unknown injections ####################
for inj_file, tag in zip(inj_files, inj_tags):
if inj_file is not None:
# This is a known injection set and we will plot in the usual way
continue
# run minifollowups on the output of the loudest events
mfup_dir_inj = rdir[f'open_box_result/{tag}_injection_followup']
wf.setup_foreground_minifollowups(workflow, combined_inj_bg_files[tag],
insps_dict[tag],
hdfbank,
insp_files_seg_file,
data_analysed_name,
trig_generated_name,
'daxes',
mfup_dir_inj,
tags=[tag])
uk_table = wf.make_foreground_table(
workflow,
combined_inj_bg_files[tag],
hdfbank,
rdir['open_box_result'],
singles=insps_dict[tag],
extension='.html',
tags=[tag]
)
main_page.append((uk_table,))
# Make a summary hdf file
wf.make_foreground_table(
workflow,
combined_inj_bg_files[tag],
hdfbank,
rdir['open_box_result'],
singles=insps_dict[tag],
extension='.hdf',
tags=[tag]
)

# Set html pages
layout.two_column_layout(rdir['open_box_result'], main_page)

# Make analysis time summary
analysis_time_summ = [time_file, seg_summ_plot]
for f in analysis_time_summ:
symlink_result(f, 'analysis_time')
layout.single_layout(rdir['analysis_time'], (analysis_time_summ))

########################## Make full summary ####################################
######################### Make full summary ###################################
if len(files_for_combined_injfind) > 0:
summ = ([(time_file,)] + [(seg_summ_plot,)] +
[(seg_summ_table, veto_summ_table)] + det_summ + hist_summ +
21 changes: 12 additions & 9 deletions examples/search/analysis.ini
@@ -2,6 +2,8 @@
file-retention-level = merged_triggers
start-time = 1186740100
end-time = 1186743500

[workflow-defaultvalues]
h1-channel-name = H1:LOSC-STRAIN
l1-channel-name = L1:LOSC-STRAIN
v1-channel-name = V1:LOSC-STRAIN
@@ -112,17 +114,17 @@ finalize-events-template-rate = 500
injection-window = 4.5
processing-scheme = mkl

[single_template-h1&plot_singles_timefreq-h1&plot_qscan-h1&inspiral-h1&calculate_psd-h1]
frame-files = ${workflow|h1-frame-file}
channel-name = ${workflow|h1-channel-name}
[single_template-full_data-h1&plot_singles_timefreq-full_data-h1&plot_qscan-full_data-h1&inspiral-full_data-h1&calculate_psd-h1]
frame-files = ${workflow-defaultvalues|h1-frame-file}
channel-name = ${workflow-defaultvalues|h1-channel-name}

[single_template-l1&plot_singles_timefreq-l1&plot_qscan-l1&inspiral-l1&calculate_psd-l1]
frame-files = ${workflow|l1-frame-file}
channel-name = ${workflow|l1-channel-name}
[single_template-full_data-l1&plot_singles_timefreq-full_data-l1&plot_qscan-full_data-l1&inspiral-full_data-l1&calculate_psd-l1]
frame-files = ${workflow-defaultvalues|l1-frame-file}
channel-name = ${workflow-defaultvalues|l1-channel-name}

[single_template-v1&plot_singles_timefreq-v1&plot_qscan-v1&inspiral-v1&calculate_psd-v1]
frame-files = ${workflow|v1-frame-file}
channel-name = ${workflow|v1-channel-name}
[single_template-full_data-v1&plot_singles_timefreq-full_data-v1&plot_qscan-full_data-v1&inspiral-full_data-v1&calculate_psd-v1]
frame-files = ${workflow-defaultvalues|v1-frame-file}
channel-name = ${workflow-defaultvalues|v1-channel-name}
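For context on the `${section|option}` references above: PyCBC's workflow configuration resolves cross-section interpolation, so each `channel-name` points back at the value defined once in `[workflow-defaultvalues]`. A rough stand-alone sketch of the same idea with Python's stock `configparser` (which uses `${section:option}` with a colon rather than PyCBC's `|`; the section and option names here are illustrative):

```python
import configparser

ini_text = """
[workflow-defaultvalues]
h1-channel-name = H1:LOSC-STRAIN

[inspiral-full_data-h1]
channel-name = ${workflow-defaultvalues:h1-channel-name}
"""

# ExtendedInterpolation resolves ${section:option} references on access,
# so the channel name is defined in exactly one place.
cp = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
cp.read_string(ini_text)
print(cp.get('inspiral-full_data-h1', 'channel-name'))  # H1:LOSC-STRAIN
```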

[calculate_psd]
cores = 1
@@ -191,6 +193,7 @@ cluster-window = ${statmap|cluster-window}
loudest-keep-values = 15.0:9999999999999

[coinc-injinj]
coinc-threshold = 0.0025

[sngls]
trigger-cuts = newsnr:5.5:lower traditional_chisq:12:upper sigma_multiple:10:upper
2 changes: 2 additions & 0 deletions examples/search/bank.sh
@@ -1,6 +1,7 @@
#!/bin/bash
set -e

if [ ! -f bank.hdf ] ; then
pycbc_brute_bank \
--verbose \
--output-file bank.hdf \
@@ -19,3 +20,4 @@ pycbc_brute_bank \
--params mass1 mass2 spin1z spin2z \
--seed 1 \
--low-frequency-cutoff 20.0
fi
7 changes: 6 additions & 1 deletion examples/search/gen.sh
@@ -4,5 +4,10 @@ set -e
pycbc_make_offline_search_workflow \
--workflow-name gw \
--output-dir output \
--config-files analysis.ini plotting.ini executables.ini injections_minimal.ini \
--config-files \
analysis.ini \
plotting.ini \
executables.ini \
injections_minimal.ini \
injections_unknown.ini \
Contributor:

By how much does this slow down the example search workflow? While we want to test everything, the CI is becoming overstressed.

Contributor Author:

I can revert the changes to the CI after final approval, if that is best?

Some hand-waving discussion of the added time, though:

  • the workflow additions are all parallel to already-running jobs
  • the generation of data with injections does take a few minutes of added time before the workflow starts, and is not parallel

Contributor:

The CI only has a single machine, so cannot parallelize. It's going to have to run all the jobs here one by one (or perhaps two-by-two). Therefore we do need to know how much extra time this is adding, and if this feature is "niche" enough to warrant this, given that there's probably many possibilities in workflow that we are not testing.

--config-overrides results_page:output-path:$(pwd)/html