Reproducible TrackId values when running on the same simulation. #126

VictorForouhar · 2025-11-02T13:02:45Z

The TrackId of the same subhalo can change across HBT-HERONS re-runs, even when running on the same simulation and FoF group catalogue. This is because the values are assigned based on the index of each new subhalo in the std::vector<Subhalo_t> local to each MPI rank. The values can therefore change if the number of MPI ranks is changed, and even when it is unchanged, because the ordering within each vector can differ from run to run.
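To make the problem concrete, here is a minimal pure-Python sketch of index-based assignment. It is only an illustration: the function name and the list-of-lists layout (one list of HostHaloId values per MPI rank) are assumptions, not the HBT-HERONS implementation.

```python
def assign_by_local_index(host_ids_per_rank, n_existing=0):
    """Hypothetical helper: map HostHaloId -> TrackId using storage order only."""
    mapping = {}
    next_id = n_existing
    for local_hosts in host_ids_per_rank:  # loop over MPI ranks
        for host_id in local_hosts:        # position in the rank-local vector
            mapping[host_id] = next_id
            next_id += 1
    return mapping


# Same three new subhaloes, two different domain decompositions.
two_ranks = assign_by_local_index([[2, 3], [1]])
three_ranks = assign_by_local_index([[3], [1], [2]])

print(two_ranks)    # {2: 0, 3: 1, 1: 2}
print(three_ranks)  # {3: 0, 1: 1, 2: 2}
```

The subhalo hosted by HostHaloId 2 gets TrackId 0 in one layout and TrackId 2 in the other, although the input catalogue is identical.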

At best, it is annoying to compare how subhalo properties change across re-runs because you need to match subhalos. At worst, it can mean having to re-run a whole HBT-HERONS analysis if you accidentally overwrite early outputs. The new TrackId values are no longer reflective of those used in the original outputs, and hence the merger trees become a nightmare.

This PR changes the assignment procedure to be based on the global ranking of the HostHaloId of new subhaloes. The change will ensure reproducibility because ordering of std::vector<Subhalo_t> and MPI decomposition no longer play a role in setting TrackId.

Hence, if you re-run on the same simulation and FoF group catalogue, you will obtain the same TrackId for the same subhaloes across re-runs. Of course, if you change the FoF group catalogue the values will change (as HostHaloId will generally change), but in those cases you want to re-run HBT-HERONS anyway.

This change makes TrackId values reproducible across re-runs of HBT-HERONS on the same simulation (as long as the FoF catalogue is unchanged).

I find the fact that there are two definitions stored in a single variable name cumbersome.

VictorForouhar · 2025-11-02T13:08:25Z

Here is an example debug print of how this works for the first snapshot of the COLIBRE test box:

# Number of new subhaloes in each rank
Rank 0 has 0 new subhaloes
Rank 1 has 3 new subhaloes

# HostHaloId values of the new subhaloes across ranks
Rank 0 has the following HostIDs with new subhaloes: ()
Rank 1 has the following HostIDs with new subhaloes: (2, 3, 1, )

# Above information is gathered across ranks as a vector 
Number of new births across all ranks: (0, 3, )

# Counts and displacements for MPI_Gatherv in rank 0.
Rank 0 will receive 3 entries.
Rank 0 offsets are (0, 0, )

# Gathered information from all ranks in rank 0
Rank 0 has GlobalHostHaloIds (2, 3, 1, )

# Do an argsort on HostHaloId to get unique values to assign
Rank 0 has argsort (2, 0, 1, )

# Add offset of the number of pre-existing subhaloes (first snapshot, so offset = 0)
Rank 0 has offset argsort (2, 0, 1, )

# Do MPI_Scatterv into original ranks and assign the TrackIds.
Rank 0 has new TrackIds ()
Rank 1 has new TrackIds (2, 0, 1, )
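The gather, sort and scatter steps in this debug print can be mimicked without MPI. The sketch below is a hypothetical pure-Python stand-in (the function name and the list-of-lists layout are assumptions, not HBT-HERONS code). It hands out the rank of each HostHaloId, i.e. the inverse of the argsort permutation, so that the lowest HostHaloId gets the lowest new TrackId; the values it returns for this input therefore need not match the print above. It also assumes that each new subhalo has a distinct HostHaloId.

```python
def assign_new_track_ids(host_ids_per_rank, n_existing):
    """Hypothetical helper: return one list of new TrackIds per MPI rank."""
    # Counts and displacements, as they would be passed to MPI_Gatherv.
    counts = [len(hosts) for hosts in host_ids_per_rank]
    offsets = [sum(counts[:i]) for i in range(len(counts))]

    # "Gather" every HostHaloId into a single list on the root rank.
    gathered = [h for hosts in host_ids_per_rank for h in hosts]

    # Argsort: indices that would sort the gathered HostHaloId values.
    order = sorted(range(len(gathered)), key=gathered.__getitem__)

    # Invert the permutation to get the rank of each entry.
    rank_of = [0] * len(gathered)
    for position, index in enumerate(order):
        rank_of[index] = position

    # Offset by the number of pre-existing subhaloes.
    track_ids = [n_existing + r for r in rank_of]

    # "Scatter" the values back to the ranks they came from.
    return [track_ids[o:o + c] for o, c in zip(offsets, counts)]


print(assign_new_track_ids([[], [2, 3, 1]], n_existing=0))  # [[], [1, 2, 0]]
print(assign_new_track_ids([[3], [1, 2]], n_existing=0))    # [[2], [0, 1]]
```

In both layouts HostHaloId 1, 2 and 3 map to TrackId 0, 1 and 2, so the result does not depend on how the hosts are split across ranks.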

VictorForouhar · 2025-11-02T14:15:40Z

Seems to be working as intended. I have done two HBT-HERONS runs using this new branch on the COLIBRE test box. Running the following code confirms that TrackId values do not change for subhaloes that form at the same SnapshotOfBirth and HostHaloId (i.e. the same subhaloes!).

import numpy as np
from HBTReader import HBTReader  # assumes the HBT-HERONS toolbox is on the Python path

new_changes_run_1 = HBTReader("/cosma7/data/dp004/dc-foro1/colibre/ReproducibleTrackIdChanges_run_1")
new_changes_run_2 = HBTReader("/cosma7/data/dp004/dc-foro1/colibre/ReproducibleTrackIdChanges_run_2")

number_of_mismatchs = 0

for snap_nr in new_changes_run_1.SnapshotIdList:

    subhalos_run_1 = new_changes_run_1.LoadSubhalos(snap_nr)
    subhalos_run_2 = new_changes_run_2.LoadSubhalos(snap_nr)

    # Only keep subhaloes born at this snapshot, i.e. those given a new TrackId
    new_subhalos_run_1 = subhalos_run_1[subhalos_run_1["SnapshotOfBirth"] == snap_nr]
    new_subhalos_run_2 = subhalos_run_2[subhalos_run_2["SnapshotOfBirth"] == snap_nr]

    # Sort by host halo id so that both runs list the same subhaloes in the same order
    new_subhalos_run_1 = new_subhalos_run_1[np.argsort(new_subhalos_run_1["HostHaloId"])]
    new_subhalos_run_2 = new_subhalos_run_2[np.argsort(new_subhalos_run_2["HostHaloId"])]

    # Count subhaloes whose TrackId differs between the two runs
    number_of_mismatchs += (new_subhalos_run_1["TrackId"] != new_subhalos_run_2["TrackId"]).sum()

print(number_of_mismatchs)

> 0

On the other hand, re-running twice with the master version does result in mismatches:

master_branch_run_1 = HBTReader("/cosma7/data/dp004/dc-foro1/colibre/master_run_1")
master_branch_run_2 = HBTReader("/cosma7/data/dp004/dc-foro1/colibre/master_run_2")

....

print (number_of_mismatchs)

> 56863

VictorForouhar · 2025-11-02T14:19:03Z

The bound mass functions are unchanged as well. Ready to be reviewed.

VictorForouhar · 2025-11-02T19:32:21Z

Latest commit implements one of the changes of #16, but I use the TrackId instead of ParticleId. I did two re-runs and compared whether the catalogues at each snapshot were the same (I set the MaxSampleSizePotentialEstimate to 200, to exacerbate differences).

new_changes_run_1 = HBTReader("/cosma7/data/dp004/dc-foro1/colibre/ReproducibleTracksRun1_FixSeed//")
new_changes_run_2 = HBTReader("/cosma7/data/dp004/dc-foro1/colibre/ReproducibleTracksRun2_FixSeed//")

number_of_mismatchs = 0

for snap_nr in new_changes_run_1.SnapshotIdList:
    
    subhalos_run_1 = new_changes_run_1.LoadSubhalos(snap_nr)
    subhalos_run_2 = new_changes_run_2.LoadSubhalos(snap_nr)

    # Sort by TrackId so that both runs list the same subhaloes in the same order
    subhalos_run_1 = subhalos_run_1[np.argsort(subhalos_run_1["TrackId"])]
    subhalos_run_2 = subhalos_run_2[np.argsort(subhalos_run_2["TrackId"])]

    # Count subhaloes for which at least one catalogue entry differs between runs
    number_of_mismatchs += (subhalos_run_1 != subhalos_run_2).sum()

Before the latest commit, number_of_mismatchs = 138821. With the fixed seed for RNG, number_of_mismatchs = 8179. The first difference appears in the second snapshot.
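As an illustration of what seeding the random subsampling with the TrackId buys, here is a minimal sketch using Python's standard random module. The function name, its arguments and the use of TrackId directly as the seed are assumptions made for the example; this is not the subhalo_unbind.cpp code.

```python
import random


def subsample_particles(particle_ids, track_id, max_sample_size):
    """Hypothetical helper: draw a reproducible subsample of particle IDs."""
    particle_ids = list(particle_ids)
    if len(particle_ids) <= max_sample_size:
        return particle_ids
    # A generator seeded with the TrackId draws the same positions every time.
    rng = random.Random(track_id)
    return rng.sample(particle_ids, max_sample_size)


first_draw = subsample_particles(range(1000), track_id=7, max_sample_size=200)
second_draw = subsample_particles(range(1000), track_id=7, max_sample_size=200)
print(first_draw == second_draw)  # True
```

The draw selects positions in the input list, so the same seed only returns the same particles if the particle list is in the same order in both runs.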

VictorForouhar · 2025-11-02T19:39:36Z

Examining the first subhalo where there is a difference between runs, it is actually due to loss of precision in MboundType. All other entries are the same, but MboundType differs by:

[-3.7252903e-09  2.9802322e-08  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00]
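The two non-zero entries are exact powers of two (2^-28 and 2^-25), which is consistent with last-bit differences in single-precision values. One way such differences can arise is a change in summation order, since floating-point addition is not associative. The sketch below emulates single-precision accumulation with the standard library; that MboundType is accumulated in single precision and in a run-dependent order is an assumption here, not something established in this thread, and the input values are made up.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_float32(values):
    """Accumulate values one by one, rounding to float32 after each addition."""
    total = 0.0
    for value in values:
        total = to_float32(total + to_float32(value))
    return total


# Made-up masses: one large contribution and three tiny ones.
masses = [1.0, 2.0**-25, 2.0**-25, 2.0**-25]

forward = sum_float32(masses)
backward = sum_float32(reversed(masses))

print(forward == backward)  # False
print(backward - forward)   # 1.1920928955078125e-07, i.e. 2**-23
```

Adding the tiny values to 1.0 one at a time loses each of them to rounding, whereas adding them together first lets their sum survive.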

VictorForouhar · 2025-11-03T09:54:23Z

In short, when fixing the seed and finding all subhaloes that differ in their Nbound, we have 53 subhaloes with differences. If I further disable subhalo sinking it goes down to 39.

I thought subhalo sinking might play a role because of the order in which overlap is evaluated, and hence the resulting ordering of the Particles vector for the subhalo that accretes said particles.

VictorForouhar · 2025-11-03T11:03:17Z

Example subhalo which deviates between runs. Both instances agree in Nbound until one snapshot after infall, at which point they differ by 2 particles.

However, differences already appear when looking at the ordering of bound and sorted particles before, at snapshot 84. 6 particles differ in their binding energy ranking, as shown below. The differences grow to 41 particles at the following snapshot. Eventually, this leads to different Nbound.

I will look into why there is a swap of those three particles in snap 84.

robjmcgibbon

Looks good, I agree that the subhalo_unbind.cpp changes would be better in another PR though

VictorForouhar added 3 commits November 2, 2025 12:14

Assignment of new TrackIds reflects HostHaloId ranking

f9a965d

This change makes the TrackId of re-runs of HBT-HERONS on the same simulation reproducible at the TrackId level (as long as the FoF catalogue is unchanged).

Formatting

7c16918

Use global HostHaloId, rather than local value.

5e7bde6

I find the fact that there are two definitions stored in a single variable name cumbersome.

Lowest relative HostHaloId gets lowest relative new TrackId values.

4d8586b

VictorForouhar requested a review from robjmcgibbon November 2, 2025 14:19

VictorForouhar mentioned this pull request Nov 2, 2025

Try to make random sampling for unbinding reproducible #16

Draft

Set a set for random subsampling

5c66f63

Sort FoF particles after loading for reproducible results.

b666f72

VictorForouhar mentioned this pull request Nov 3, 2025

Sort FoF particles after loading for reproducible results. #127

Merged

Merge branch 'consistent_particle_vectors_for_centrals' into reproduc…

8bb9048

…ible_trackids

robjmcgibbon approved these changes Nov 6, 2025

Revert "Set a set for random subsampling"

63736b6

This reverts commit 5c66f63.

VictorForouhar merged commit 4399e98 into master Nov 6, 2025
4 checks passed

VictorForouhar deleted the reproducible_trackids branch November 6, 2025 14:53
