ClusterCell collection lacking cellID encondings, or including cells from heterogeneous subsystems with different CellIDEncodings #84

giovannimarchiori · 2024-06-18T00:03:38Z

The clustered cell collection (https://github.com/HEP-FCC/k4RecCalorimeter/blob/main/RecCalorimeter/src/components/CreateCaloClustersSlidingWindow.cpp#L45C8-L45C23) can potentially hold cells from heterogeneous subsystems which then have inconsistent CellIDEncodings. The issue and further details are extensively described and discussed in #83

In addition, even if only one system is used, the cell collection attached to the cluster does not have the CellIDEncoding. This is indeed potentially a problem (analysis scripts later in the chain may need this information to properly set-up the decoder).

BrieucF · 2024-06-21T03:41:00Z

Among the two options discussed (have similar cellID encodings for the different sub-systems or having one cluster cell collection per subsystem) I would be in favor of the latter. What do you think?

giovannimarchiori · 2024-11-05T23:41:44Z

just a reminder to @BrieucF and @tmadlener about the discussion we had in the past about this and some options discussed in #83 now that they are reviving the discussions on how to best handle this with podio/key4hep

tmadlener · 2024-11-06T09:46:15Z

Just to be sure that I understand tho goal correctly: There is a ClusterCollection acting as a superset collection (subset collection in podio terms) that contains Clusters from several subsystems that have potentially different CellID encodings. Is this correct? Or is this combined collection actually a proper collection of cloned Clusters?

Regardless of which is the case for the collection, it's necessary to get to the correct CellID encoding for each cluster in order to interpret the CellID correctly.

The main problem currently is that the combined / superset collection doesn't have any metadata set, resp. if we would set a CellID encoding for that collection we could only set one (assuming we use the usual convention to set that). However, how we can solve this depends quite a bit on whether the collection is a subset collection or whether it is a proper collection.

giovannimarchiori · 2024-11-06T10:41:43Z

Hi @tmadlener
What we have now is a ClusterCollection (which is not a subset collection) that contains clusters whose cells are from more than 1 system (e.g. ECAL + HCAL combined clusters), with different CellID encodings.

One could get the proper CellID by encoding in the metadata a dict of keys of type system : encoding, based on the assumption that system is always encoded in the first 4 bits of the cellID no matter the details of the readout, but that is neither elegant nor probably very robust and would maybe also break some design assumptions of podio.

By the way I was wondering after reading your remarks: do we have the possibility of creating (super)clusters of clusters? It could be interesting for instance for doing brem-recovery, but might also be a solution here if we persist our ECAL+HCAL cluster as just a cluster of 2 clusters, and the ECAL and HCAL clusters are stored in their own ClusterCollections with their metadata containing the proper cellID info.

tmadlener · 2024-11-06T17:19:43Z

Reading your comment, I think we are potentially coming to a similar situation with tracking detectors. Currently the assumption is that the CellID encoding for the tracking system is defined globally for the full detector and all subdetectors have the same. However, e.g. in the ILD/CLD hybrid for FCCee the two global definitions do not agree and it might be beneficial to only have the overall layout (i.e. the structure of the encoding but not the individual bit widths) defined. In that case some of the leading parts will probably have to agree.

I don't necessarily see a problem with that. In the end it will be a convention that we will have to document and make sure that we follow it. From a robustness point of view, we could start to think about coming up with some checks to make sure that the parts that need to agree.

Overall, I think this would be worth a dedicated discussion to see if we can come up with a single solution rather than different solutions for calorimeters and trackers for example.

would maybe also break some design assumptions of podio

I don't think we are running into podio (technical or design) limitations here, but rather into some other assumptions / conventions that we have come up with.

do we have the possibility of creating (super)clusters of clusters?

That shouldn't be a problem, as edm4hep::Clusters define a OneToManyRelation to other edm4hep::Clusters.

giovannimarchiori mentioned this issue Jun 18, 2024

Fixes for two types of crashes in recent nightlies #83

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ClusterCell collection lacking cellID encondings, or including cells from heterogeneous subsystems with different CellIDEncodings #84

ClusterCell collection lacking cellID encondings, or including cells from heterogeneous subsystems with different CellIDEncodings #84

giovannimarchiori commented Jun 18, 2024 •

edited

Loading

BrieucF commented Jun 21, 2024 •

edited

Loading

giovannimarchiori commented Nov 5, 2024

tmadlener commented Nov 6, 2024

giovannimarchiori commented Nov 6, 2024

tmadlener commented Nov 6, 2024

ClusterCell collection lacking cellID encondings, or including cells from heterogeneous subsystems with different CellIDEncodings #84

ClusterCell collection lacking cellID encondings, or including cells from heterogeneous subsystems with different CellIDEncodings #84

Comments

giovannimarchiori commented Jun 18, 2024 • edited Loading

BrieucF commented Jun 21, 2024 • edited Loading

giovannimarchiori commented Nov 5, 2024

tmadlener commented Nov 6, 2024

giovannimarchiori commented Nov 6, 2024

tmadlener commented Nov 6, 2024

giovannimarchiori commented Jun 18, 2024 •

edited

Loading

BrieucF commented Jun 21, 2024 •

edited

Loading