Fix sn.al.enrichment KeyError #17

13acts · 2025-08-04T05:14:24Z

1. Context

KeyError                                  Traceback (most recent call last)
Cell In[107], line 1
----> 1 sn.al.enrichment(adata_all, id_key='dominant_cell_type', val_key='scNiche', library_key='Sample ID')

File ~/notebooks/Servier/scNiche/scniche/analysis/_utils.py:44, in enrichment(adata, id_key, val_key, library_key)
     41 pbar = tqdm(val_list)
     42 for val in pbar:
     43     # prop
---> 44     observed = [df.loc[idx, val] for df in df_list]
     45     expected = [df.drop(val, axis=1).loc[idx, ].mean() for df in df_list]
     47     # filter NA, some niches don't exist in every library

File ~/notebooks/Servier/scNiche/scniche/analysis/_utils.py:44, in <listcomp>(.0)
     41 pbar = tqdm(val_list)
     42 for val in pbar:
     43     # prop
---> 44     observed = [df.loc[idx, val] for df in df_list]
     45     expected = [df.drop(val, axis=1).loc[idx, ].mean() for df in df_list]
     47     # filter NA, some niches don't exist in every library

File ~/.conda/envs/scniche/lib/python3.9/site-packages/pandas/core/indexing.py:1183, in _LocationIndexer.__getitem__(self, key)
   1181     key = tuple(com.apply_if_callable(x, self.obj) for x in key)
   1182     if self._is_scalar_access(key):
-> 1183         return self.obj._get_value(*key, takeable=self._takeable)
   1184     return self._getitem_tuple(key)
   1185 else:
   1186     # we by definition only have the 0th axis

File ~/.conda/envs/scniche/lib/python3.9/site-packages/pandas/core/frame.py:4226, in DataFrame._get_value(self, index, col, takeable)
   4220 engine = self.index._engine
   4222 if not isinstance(self.index, MultiIndex):
   4223     # CategoricalIndex: Trying to use the engine fastpath may give incorrect
   4224     #  results if our categories are integers that dont match our codes
   4225     # IntervalIndex: IntervalTree has no get_loc
-> 4226     row = self.index.get_loc(index)
   4227     return series._values[row]
   4229 # For MultiIndex going through engine effectively restricts us to
   4230 #  same-length tuples; see test_get_set_value_no_partial_indexing

File ~/.conda/envs/scniche/lib/python3.9/site-packages/pandas/core/indexes/base.py:3819, in Index.get_loc(self, key)
   3814     if isinstance(casted_key, slice) or (
   3815         isinstance(casted_key, abc.Iterable)
   3816         and any(isinstance(x, slice) for x in casted_key)
   3817     ):
   3818         raise InvalidIndexError(key)
-> 3819     raise KeyError(key) from err
   3820 except TypeError:
   3821     # If we have a listlike key, _check_indexing_error will raise
   3822     #  InvalidIndexError. Otherwise we fall through and re-raise
   3823     #  the TypeError.
   3824     self._check_indexing_error(key)

KeyError: 'Acinar cell'

2. Reason
When a library (the slide identified by library_key) does not contain every value of id_key, the corresponding combinations can be missing from the resulting DataFrame.
For example, if 'Acinar cell' exists in other slides but not in slide #3, then:

df = obs.groupby([library_key, id_key, val_key]).size().unstack().fillna(0)

will omit the (library_key='#3', id_key='Acinar cell') row entirely.
As a result, trying to access:

df.loc[idx, val]

for that combination raises a KeyError inside the list comprehension:

observed = [df.loc[idx, val] for df in df_list]
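
The failure can be reproduced on a toy table. This is only a sketch: the column names, the slide labels and the way df_list is split per library are assumptions made for illustration, not code taken from scNiche. It also assumes the grouping columns are plain string columns; with categorical columns, groupby can behave differently depending on its observed setting.

```python
import pandas as pd

# Hypothetical toy data: slide "S3" contains no "Acinar cell" rows.
obs = pd.DataFrame({
    "library": ["S1", "S1", "S2", "S2", "S3"],
    "cell_type": ["Acinar cell", "Ductal cell", "Acinar cell",
                  "Ductal cell", "Ductal cell"],
    "niche": ["N1", "N2", "N1", "N1", "N2"],
})

# Same pattern as in the Reason section: count, then pivot niches to columns.
counts = (
    obs.groupby(["library", "cell_type", "niche"])
    .size()
    .unstack()
    .fillna(0)
)

# Assumed split: one frame per library, indexed by cell type.
df_list = [counts.xs(lib, level="library") for lib in ["S1", "S2", "S3"]]


def lookup_raises(df, idx, val):
    """Return True if df.loc[idx, val] raises KeyError."""
    try:
        df.loc[idx, val]
    except KeyError:
        return True
    return False


# Only the slide without 'Acinar cell' fails the lookup.
missing = [lookup_raises(df, "Acinar cell", "N1") for df in df_list]
print(missing)
```

The (S3, 'Acinar cell') row never appears in counts, so the per-library frame for S3 has no 'Acinar cell' index label to look up.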

3. Fix
Replace the groupby().size().unstack().fillna(0) chain with pd.crosstab(..., dropna=False):

df = pd.crosstab(
    index=[obs[library_key], obs[id_key]],
    columns=obs[val_key],
    dropna=False
)

This keeps every (library_key, id_key) combination in the table, including those with zero counts, so the lookup no longer hits a missing row and no KeyError is raised.
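
A minimal sketch of the effect on toy data follows. The column names and slide labels are hypothetical. It relies on my understanding that, with a list of index keys and dropna=False, pd.crosstab reindexes the rows to the full product of the index levels and fills the gaps with 0.

```python
import pandas as pd

# Hypothetical toy data: slide "S3" contains no "Acinar cell" rows.
obs = pd.DataFrame({
    "library": ["S1", "S1", "S2", "S2", "S3"],
    "cell_type": ["Acinar cell", "Ductal cell", "Acinar cell",
                  "Ductal cell", "Ductal cell"],
    "niche": ["N1", "N2", "N1", "N1", "N2"],
})

# dropna=False keeps (library, cell_type) pairs that have no observations.
table = pd.crosstab(
    index=[obs["library"], obs["cell_type"]],
    columns=obs["niche"],
    dropna=False,
)

# The previously missing pair is now present as an all-zero row.
row = table.loc[("S3", "Acinar cell")]
print(row)

# Pairs that were observed keep their counts.
print(table.loc[("S1", "Acinar cell"), "N1"])
```

With the zero row in place, a lookup such as df.loc[idx, val] on the per-library frame returns 0 instead of raising.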

Cover case where slide does not contain all available id_key in adata object

ProDong0512 · 2025-08-15T03:25:15Z

Thanks for your contribution. We tested this following your description but could not reproduce the bug. Could you send your data to 1143715389@qq.com if possible?

fix enrichment

5b2e9f0

Cover case where slide does not contain all available id_key in adata object
