Summary
When get_dataframe1() is called on the result object returned by xmatch, a ValueError is raised if the original input DataFrame already contains a column named N_match. This appears to be caused by a naming conflict: the method appends its own N_match column to the result DataFrame.
Error log
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[72], line 3
      1 result = xmatch(df_A, df_B, 1)
      2 print(result.number_distribution())
----> 3 result.get_dataframe1()

File spherimatch/result_xmatch.py:67, in XMatchResult.get_dataframe1(self, min_match, coord_columns, retain_all_columns, retain_columns)
     65 if len(append_df.columns) > 0:
     66     data_df = pd.concat([data_df, append_df], axis=1)
---> 67 data_df = data_df[data_df['N_match'] >= min_match]
     68 return data_df
...
ValueError: cannot reindex on an axis with duplicate labels
```
Root cause
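The method get_dataframe1() internally generates a column named N_match but does not check whether a column with that name already exists in the input DataFrame. If it does, pd.concat([data_df, append_df], axis=1) produces two columns labeled N_match. data_df['N_match'] then returns a two-column DataFrame instead of a Series, so the filter data_df[data_df['N_match'] >= min_match] receives a boolean DataFrame as its mask. Applying that mask requires pandas to reindex it against the column labels, which it refuses to do because the labels are not unique, and the ValueError is raised.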
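The same failure can be reproduced with pandas alone. This is a minimal sketch that assumes get_dataframe1() concatenates the input columns with an appended N_match column, as lines 66-67 of the traceback suggest; data_df, append_df and their values are made up for illustration and are not taken from spherimatch.

```python
import pandas as pd

# Stand-in for the input columns; it already carries an N_match column.
data_df = pd.DataFrame({"RA": [10.0, 20.0], "DEC": [10.0, 20.0], "N_match": [1, 2]})
# Stand-in for the columns the method appends (illustrative values only).
append_df = pd.DataFrame({"N_match": [1, 1]})

# Same pattern as line 66 of the traceback: side-by-side concatenation.
combined = pd.concat([data_df, append_df], axis=1)
print(combined.columns.tolist())  # ['RA', 'DEC', 'N_match', 'N_match']

# With a duplicated label, column selection returns a DataFrame, not a Series.
selected = combined["N_match"]
print(type(selected).__name__)  # DataFrame

# Same pattern as line 67: filtering with the resulting boolean DataFrame.
error = None
try:
    combined[combined["N_match"] >= 1]
except ValueError as exc:
    error = exc
print(repr(error))
```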
Minimal steps to reproduce
```python
import pandas as pd
from spherimatch import xmatch

df_A = pd.DataFrame({
    'RA': [10.0, 20.0],
    'DEC': [10.0, 20.0],
    'N_match': [1, 2],  # This triggers the bug
})
df_B = pd.DataFrame({
    'RA': [10.01, 19.99],
    'DEC': [10.01, 20.01],
})
result = xmatch(df_A, df_B, 1)
df_out = result.get_dataframe1()
```

Expected behavior
The method should do one of the following:
- Avoid conflicting with an existing N_match column, or
- Raise a clear and informative error if the column name already exists, or
- Allow users to specify the name of the output column.
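As a rough illustration of how the three options could fit together, here is a sketch of a hypothetical helper. append_match_column and its column and on_conflict parameters are invented for this example and are not part of spherimatch; the sketch only assumes the per-row match counts are available as a list or Series aligned with the rows of the DataFrame.

```python
import pandas as pd


def append_match_column(data_df, n_match, min_match=0, column="N_match", on_conflict="raise"):
    """Hypothetical helper (not part of spherimatch).

    Appends per-row match counts under `column` and keeps rows with at least
    `min_match` matches. If `column` already exists in `data_df`:
      - on_conflict="raise":  raise a ValueError that names the clash;
      - on_conflict="suffix": use the first free name N_match_1, N_match_2, ...
    """
    name = column
    if name in data_df.columns:
        if on_conflict == "raise":
            raise ValueError(
                f"input DataFrame already has a column named {column!r}; "
                "rename it or choose a different output column name"
            )
        if on_conflict != "suffix":
            raise ValueError(f"unknown on_conflict value: {on_conflict!r}")
        i = 1
        while f"{column}_{i}" in data_df.columns:
            i += 1
        name = f"{column}_{i}"
    # assign() adds exactly one column under a name that is not yet taken,
    # so out[name] is a Series and the boolean filter below is well defined.
    out = data_df.assign(**{name: n_match})
    return out[out[name] >= min_match]


df = pd.DataFrame({"RA": [10.0, 20.0], "DEC": [10.0, 20.0], "N_match": [1, 2]})
result = append_match_column(df, [1, 0], min_match=1, on_conflict="suffix")
print(result.columns.tolist())  # ['RA', 'DEC', 'N_match', 'N_match_1']
print(len(result))  # 1
```

Defaulting to on_conflict="raise" keeps existing callers from silently receiving a renamed column; the column parameter on its own already covers the third option.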