
PyGRB trigger clustering with short timeslides #4820

Merged
merged 2 commits into gwastro:master on Jul 21, 2024

Conversation

pannarale
Contributor

@pannarale pannarale commented Jul 18, 2024

This PR takes short timeslides into account when clustering triggers in PyGRB. Previously, the clustering did not fail, but it ignored which slide each trigger belongs to.

Standard information about the request

This is a: bug fix (in the sense that the code ran without errors but produced incorrect results; we were aware this enhancement was needed, so it was not a surprise).

This change affects: PyGRB

This change changes: scientific output

Contents

A loop over timeslides was added, ensuring that triggers from different slides are kept separate.
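The per-slide separation described above can be sketched in isolation as follows. This is a minimal illustration with hypothetical trigger data, not the actual PyGRB code:

```python
import numpy as np

# Hypothetical trigger data: each trigger carries the id of the
# timeslide it belongs to; clustering must never mix slides.
slide_ids = np.array([0, 0, 1, 1, 0])
times = np.array([10.0, 10.2, 10.1, 55.0, 80.0])

per_slide = {}
for slide_id in np.unique(slide_ids):
    # absolute indices of the triggers in this slide
    pos = np.where(slide_ids == slide_id)[0]
    # slice out this slide's triggers so they are clustered separately
    per_slide[slide_id] = times[pos]
```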

Links to any issues or associated PRs

The plotting scripts also need to plot information from a single slide (typically the zero-lag, i.e., unslid data): this is handled by #4809.

The number of slides performed, ensuring independent background events, was set in #4754.

  • The author of this pull request confirms they will adhere to the code of conduct

@pannarale pannarale self-assigned this Jul 18, 2024
@pannarale pannarale added the PyGRB PyGRB development label Jul 18, 2024

# -- cluster ------------------------------------
Contributor

Do we still need this comment?

Contributor Author

Yes, because this is where we cluster. I will add a comment to flag where the output is written.

Contributor

Sorry, I commented on lines you removed. Now I see the comment on line 189 of your new code.

time = h5f["network"]["end_time_gc"][()]
snr = h5f["network"][args.rank_column][()]
with h5py.File(args.trig_file, "r") as h5f:
    all_times = h5f["network"]["end_time_gc"][()]
Contributor

@sebastiangomezlopez sebastiangomezlopez Jul 19, 2024
This is more a matter of taste. Would you consider rewriting it as h5f["network/end_time_gc"][()]? The same goes for the other lines that load HDF5 data.
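For reference, the two indexing styles are equivalent in h5py. A minimal sketch with a throwaway file, where the "network/end_time_gc" layout simply mirrors the trigger file discussed in this thread:

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small hypothetical file with the same group/dataset layout.
path = os.path.join(tempfile.mkdtemp(), "triggers.h5")
with h5py.File(path, "w") as h5f:
    h5f.create_dataset("network/end_time_gc", data=np.arange(5.0))

with h5py.File(path, "r") as h5f:
    # Nested indexing and a single slash-separated path are equivalent:
    a = h5f["network"]["end_time_gc"][()]
    b = h5f["network/end_time_gc"][()]
    # An f-string builds the path when the dataset name is a variable:
    column = "end_time_gc"
    c = h5f[f"network/{column}"][()]

assert (a == b).all() and (a == c).all()
```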

Contributor Author

I did the same with the slide_id.

snr = h5f["network"][args.rank_column][()]
with h5py.File(args.trig_file, "r") as h5f:
    all_times = h5f["network"]["end_time_gc"][()]
    all_snrs = h5f["network"][args.rank_column][()]
Contributor

@sebastiangomezlopez sebastiangomezlopez Jul 19, 2024

In case you agree, an f-string would do the job here: h5f[f"network/{args.rank_column}"][()]

Contributor Author

h5f[f"network/{args.rank_column}"][()]

Comment on lines +189 to 244
# -- cluster ------------------------------------

unique_slide_ids = numpy.unique(slide_ids)
max_slide_id = max(unique_slide_ids)
msg = 'Clustering '+str(len(slide_ids))+' triggers from '
msg += str(len(unique_slide_ids))+' slides'
logging.info(msg)

for slide_id in unique_slide_ids:
    # indices to slice current slide
    slide_id_pos = numpy.where(slide_ids == slide_id)[0]
    # all time and snr values for the current slide
    time = all_times[slide_id_pos]
    snr = all_snrs[slide_id_pos]

    # generate clustering bins
    nbins = int((end - start) // win + 1)
    bins = [[] for i in range(nbins)]
    loudsnr = numpy.zeros(nbins)
    loudtime = numpy.zeros(nbins)
    # list to index clusters for current slide
    clusters = []

    # find loudest trigger in each bin, for the current slide
    for i in tqdm.tqdm(range(time.size),
                       desc="Initialising bins",
                       disable=not args.verbose,
                       total=time.size,
                       unit='triggers',
                       **TQDM_KW):
        t, s = time[i], snr[i]
        idx = int(float(t - start) // win)
        bins[idx].append(i)
        if s > loudsnr[idx]:
            loudsnr[idx] = s
            loudtime[idx] = t

    prev = -1
    nxt_ = 1
    first = True
    last = False
    add_cluster = clusters.append
    nclusters = 0

    # cluster
    bar = tqdm.tqdm(bins,
                    desc="Clustering bins",
                    disable=not args.verbose,
                    total=nbins,
                    unit='bins',
                    postfix=dict(nclusters=0),
                    **TQDM_KW)
    for i, bin_ in enumerate(bar):
        if not bin_:  # empty
            continue

Contributor

This whole block of code looks good to me

Comment on lines +245 to +292
        for idx in bin_:
            t, s = time[idx], snr[idx]

            # check loudest event in previous bin
            if not first:
                prevt = loudtime[prev]
                if prevt and abs(prevt - t) < win and s < loudsnr[prev]:
                    continue

            # check loudest event in next bin
            if not last:
                nextt = loudtime[nxt_]
                if nextt and abs(nextt - t) < win and s < loudsnr[nxt_]:
                    continue

            loudest = True

            # check all events in previous bin
            if not first and prevt and abs(prevt - t) < win:
                for id2 in bins[prev]:
                    if abs(time[id2] - t) < win and s < snr[id2]:
                        loudest = False
                        break

            # check all events in next bin
            if loudest and not last and nextt and abs(nextt - t) < win:
                for id2 in bins[nxt_]:
                    if abs(time[id2] - t) < win and s < snr[id2]:
                        loudest = False
                        break

            # this is loudest in its vicinity, keep it
            if loudest:
                add_cluster(idx)
                nclusters += 1
                bar.set_postfix(nclusters=nclusters)

        # update things for next time
        first = False
        last = i == nbins - 1
        prev += 1
        nxt_ += 1

        bar.update()

Contributor

Mostly rearranging lines, looks good.
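As a sanity check of the keep-the-loudest-within-a-window logic, here is a brute-force reference version on toy data; the binned code above is an optimisation of this idea. All names and values here are hypothetical, not part of the PR:

```python
import numpy as np

def cluster_loudest(times, snrs, win):
    """Brute-force reference: keep each trigger that is the loudest
    among all triggers within +/- win seconds of it.
    (On an exact SNR tie, both triggers are kept.)"""
    keep = []
    for i, (t, s) in enumerate(zip(times, snrs)):
        nearby = np.abs(times - t) < win
        if s >= snrs[nearby].max():
            keep.append(i)
    return keep

# Two triggers 0.1 s apart: only the louder one survives;
# the isolated third trigger is kept on its own.
kept = cluster_loudest(np.array([0.0, 0.1, 5.0]),
                       np.array([8.0, 10.0, 6.0]),
                       win=1.0)
# kept == [1, 2]
```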

Comment on lines +293 to 312
    # clusters is the indexing array for a specific slide_id
    # all_clusters is the (absolute) indexing of all clustered triggers
    # so look up the indices [clusters] within the absolute indexing array
    # slide_id_pos, which is built at each slide_id
    all_clusters += list(slide_id_pos[clusters])
    msg = 'Slide '+str(slide_id)+'/'+str(max_slide_id)
    msg += ' has '+str(len(slide_id_pos))
    msg += ' triggers that were clustered to '+str(len(clusters))
    logging.info(msg)

logging.info('Total clustered triggers: '+str(len(all_clusters)))

# -- write output --------------------------------

slice_hdf5(
    args.trig_file,
    outfile,
    numpy.asarray(all_clusters),
    verbose=args.verbose,
)
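The local-to-absolute index mapping used when accumulating all_clusters can be illustrated in isolation (hypothetical values):

```python
import numpy as np

slide_ids = np.array([0, 1, 0, 1, 1])
# absolute positions of the triggers belonging to slide 1
slide_id_pos = np.where(slide_ids == 1)[0]   # array([1, 3, 4])
# local indices of the clustered triggers within that slide
clusters = [0, 2]
# fancy indexing maps local cluster indices back to absolute ones
absolute = list(slide_id_pos[clusters])      # [1, 4]
```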
Contributor

Interesting approach overall. I'm ok with merging once tests are passed.

@pannarale pannarale merged commit 50a5caf into gwastro:master Jul 21, 2024
32 checks passed
@pannarale pannarale deleted the trig_cluster branch July 21, 2024 14:02