13acts commented Aug 4, 2025

1. Context

KeyError                                  Traceback (most recent call last)
Cell In[107], line 1
----> 1 sn.al.enrichment(adata_all, id_key='dominant_cell_type', val_key='scNiche', library_key='Sample ID')

File ~/notebooks/Servier/scNiche/scniche/analysis/_utils.py:44, in enrichment(adata, id_key, val_key, library_key)
     41 pbar = tqdm(val_list)
     42 for val in pbar:
     43     # prop
---> 44     observed = [df.loc[idx, val] for df in df_list]
     45     expected = [df.drop(val, axis=1).loc[idx, ].mean() for df in df_list]
     47     # filter NA, some niches don't exist in every library

File ~/notebooks/Servier/scNiche/scniche/analysis/_utils.py:44, in <listcomp>(.0)
     41 pbar = tqdm(val_list)
     42 for val in pbar:
     43     # prop
---> 44     observed = [df.loc[idx, val] for df in df_list]
     45     expected = [df.drop(val, axis=1).loc[idx, ].mean() for df in df_list]
     47     # filter NA, some niches don't exist in every library

File ~/.conda/envs/scniche/lib/python3.9/site-packages/pandas/core/indexing.py:1183, in _LocationIndexer.__getitem__(self, key)
   1181     key = tuple(com.apply_if_callable(x, self.obj) for x in key)
   1182     if self._is_scalar_access(key):
-> 1183         return self.obj._get_value(*key, takeable=self._takeable)
   1184     return self._getitem_tuple(key)
   1185 else:
   1186     # we by definition only have the 0th axis

File ~/.conda/envs/scniche/lib/python3.9/site-packages/pandas/core/frame.py:4226, in DataFrame._get_value(self, index, col, takeable)
   4220 engine = self.index._engine
   4222 if not isinstance(self.index, MultiIndex):
   4223     # CategoricalIndex: Trying to use the engine fastpath may give incorrect
   4224     #  results if our categories are integers that dont match our codes
   4225     # IntervalIndex: IntervalTree has no get_loc
-> 4226     row = self.index.get_loc(index)
   4227     return series._values[row]
   4229 # For MultiIndex going through engine effectively restricts us to
   4230 #  same-length tuples; see test_get_set_value_no_partial_indexing

File ~/.conda/envs/scniche/lib/python3.9/site-packages/pandas/core/indexes/base.py:3819, in Index.get_loc(self, key)
   3814     if isinstance(casted_key, slice) or (
   3815         isinstance(casted_key, abc.Iterable)
   3816         and any(isinstance(x, slice) for x in casted_key)
   3817     ):
   3818         raise InvalidIndexError(key)
-> 3819     raise KeyError(key) from err
   3820 except TypeError:
   3821     # If we have a listlike key, _check_indexing_error will raise
   3822     #  InvalidIndexError. Otherwise we fall through and re-raise
   3823     #  the TypeError.
   3824     self._check_indexing_error(key)

KeyError: 'Acinar cell'

2. Reason
When a library (one value of library_key) does not contain every id_key category, the corresponding (library, category) rows are missing from the resulting DataFrame.
For example, if 'Acinar cell' exists in other slides but not in slide #3, then:

df = obs.groupby([library_key, id_key, val_key]).size().unstack().fillna(0)

will omit the (library_key='#3', id_key='Acinar cell') row entirely.
As a result, trying to access:

df.loc[idx, val]

for that combination raises a KeyError inside the list comprehension:

observed = [df.loc[idx, val] for df in df_list]
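
The failure mode described above can be reproduced on toy data. This is only a sketch: the column names (library, cell_type, niche) and the values are hypothetical stand-ins for library_key, id_key and val_key, and the columns are assumed to be plain strings. Categorical columns may behave differently depending on the `observed` setting of groupby.

```python
import pandas as pd

# Hypothetical toy data: library "s2" contains no 'Acinar cell' at all.
obs = pd.DataFrame({
    "library":   ["s1", "s1", "s1", "s2", "s2"],
    "cell_type": ["Acinar cell", "Ductal cell", "Ductal cell",
                  "Ductal cell", "Ductal cell"],
    "niche":     ["N1", "N1", "N2", "N1", "N2"],
})

# Same chain as in the report: only observed (library, cell_type) pairs
# become rows, so ("s2", "Acinar cell") never appears in the index.
df = obs.groupby(["library", "cell_type", "niche"]).size().unstack().fillna(0)

# Looking up the missing pair fails in the same way as in the traceback.
try:
    df.loc[("s2", "Acinar cell"), "N1"]
    raised = False
except KeyError:
    raised = True

print(("s2", "Acinar cell") in df.index, raised)
```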

3. Fix
Replace the groupby().size().unstack().fillna(0) chain with pd.crosstab(..., dropna=False):

df = pd.crosstab(
    index=[obs[library_key], obs[id_key]],
    columns=obs[val_key],
    dropna=False
)

With dropna=False, the table contains every (library_key, id_key) combination, including those with zero counts, so no rows are missing and the lookup no longer raises a KeyError.
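
As a quick check of the proposed fix, the sketch below builds the crosstab on hypothetical toy data (the column names library, cell_type and niche are illustrative, not taken from scNiche) and confirms that the pair missing from the input shows up as a zero-filled row.

```python
import pandas as pd

# Hypothetical toy data: library "s2" contains no 'Acinar cell' at all.
obs = pd.DataFrame({
    "library":   ["s1", "s1", "s1", "s2", "s2"],
    "cell_type": ["Acinar cell", "Ductal cell", "Ductal cell",
                  "Ductal cell", "Ductal cell"],
    "niche":     ["N1", "N1", "N2", "N1", "N2"],
})

# dropna=False keeps the full product of the row index levels,
# filling combinations that never occur with a count of 0.
crossed = pd.crosstab(
    index=[obs["library"], obs["cell_type"]],
    columns=obs["niche"],
    dropna=False,
)

# The previously missing pair can now be looked up without a KeyError.
missing_pair_count = crossed.loc[("s2", "Acinar cell"), "N1"]
print(crossed)
```

One thing to keep in mind with this approach is that the zero-filled rows now take part in any downstream averages, which may or may not be what the enrichment calculation intends.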

Cover case where slide does not contain all available id_key in adata object
@ProDong0512

Thanks for your contribution. We tested this following your description but could not reproduce the bug. Could you send your data to 1143715389@qq.com if possible?
