Open
Description
Summary
When an input catalog (DataFrame) contains more than one RA column variant (e.g. both 'ra' and 'RA', or 'Ra' and 'RA'), the code raises:
ValueError: The input dataframe must have two columns named 'Ra' and 'Dec'!
This message is misleading, because the required columns are present—there are simply multiple case variants. The user may think they are missing columns rather than having duplicates.
Code Triggering the Issue
The current logic in __type_pd_dataframe() of catalog.py:
RAS = ['ra', 'Ra', 'RA']
DECS = ['dec', 'Dec', 'DEC']
hit_ra = np.array([1 if col in RAS else 0 for col in self.input_data.columns])
hit_dec = np.array([1 if col in DECS else 0 for col in self.input_data.columns])
if sum(hit_ra) != 1 or sum(hit_dec) != 1:
    raise ValueError("The input dataframe must have two columns named 'Ra' and 'Dec'!")
Steps to Reproduce
- Construct a DataFrame (e.g. df1) that includes both 'ra' and 'RA' (or any two of ['ra','Ra','RA']).
- Call xmatch(df1, df2, TOLERANCE), where df2 is any valid second catalog (or vice versa).
- Observe the raised ValueError with the misleading message.
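The failure can also be reproduced without the package, using only the column names. The following is a minimal sketch, assuming the check depends solely on the DataFrame's column labels; check_columns is a hypothetical stand-in for the logic quoted above (with plain lists instead of np.array), not the actual method in catalog.py.

```python
RAS = ['ra', 'Ra', 'RA']
DECS = ['dec', 'Dec', 'DEC']


def check_columns(columns):
    """Hypothetical stand-in for the check in __type_pd_dataframe()."""
    # One flag per column: 1 if the name is an accepted RA/Dec variant.
    hit_ra = [1 if col in RAS else 0 for col in columns]
    hit_dec = [1 if col in DECS else 0 for col in columns]
    # Zero matches and multiple matches both end up in the same branch.
    if sum(hit_ra) != 1 or sum(hit_dec) != 1:
        raise ValueError("The input dataframe must have two columns named 'Ra' and 'Dec'!")


# Column labels as they would appear in df1.columns: two RA variants, one Dec.
columns = ['ra', 'RA', 'dec']
try:
    check_columns(columns)
except ValueError as err:
    message = str(err)
    print(message)
```

Here sum(hit_ra) is 2, so the condition fails even though a valid RA column exists, and the message gives no hint that duplicates are the cause.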
Current (Misleading) Behavior
Raises a generic error claiming the DataFrame must have 'Ra' and 'Dec', even though those columns (in some case form) are present; the real problem is ambiguity due to duplicates.
Desired / Expected Behavior
- A clearer error indicating multiple RA (or Dec) column variants were detected.
- Example improved message:
Ambiguous coordinate columns: multiple RA-like columns found ['ra', 'RA']. Please keep only one column.
Suggested Fix
- Detect when sum(hit_ra) > 1 or sum(hit_dec) > 1 and raise a targeted error listing the conflicting column names.
- Refactor to:
  - Collect matched RA column names: [col for col in self.input_data.columns if col in RAS]
  - If len == 0: raise "No RA column found (expected one of ra/Ra/RA)."
  - If len > 1: raise "Multiple RA columns found: [...]. Keep exactly one."
  - Repeat for Dec.
Potential Implementation Sketch
def _resolve_coord_column(self, candidates, label):
    matches = [c for c in self.input_data.columns if c in candidates]
    if len(matches) == 0:
        raise ValueError(f"No {label} column found (expected one of {candidates}).")
    if len(matches) > 1:
        raise ValueError(f"Ambiguous {label} columns found: {matches}. Keep only one.")
    return matches[0]
Impact
- Reduces user confusion.
- Improves debuggability for users unfamiliar with internal case handling.
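To see how the proposed helper would behave for the three cases discussed in this issue (exactly one match, no match, several matches), it can be exercised as a free function. This is only a sketch: resolve_coord_column takes the column labels as an argument instead of reading self.input_data.columns, and error_for is a hypothetical helper added here purely for illustration.

```python
RAS = ['ra', 'Ra', 'RA']
DECS = ['dec', 'Dec', 'DEC']


def resolve_coord_column(columns, candidates, label):
    """Standalone variant of the sketched _resolve_coord_column method."""
    matches = [c for c in columns if c in candidates]
    if len(matches) == 0:
        raise ValueError(f"No {label} column found (expected one of {candidates}).")
    if len(matches) > 1:
        raise ValueError(f"Ambiguous {label} columns found: {matches}. Keep only one.")
    return matches[0]


def error_for(columns, candidates, label):
    """Hypothetical helper: return the error message, or None if resolution succeeds."""
    try:
        resolve_coord_column(columns, candidates, label)
    except ValueError as err:
        return str(err)
    return None


# Exactly one RA variant: the matching column name is returned.
ok = resolve_coord_column(['id', 'Ra', 'Dec'], RAS, 'RA')

# No RA variant at all: the message says the column is missing.
missing = error_for(['id', 'dec'], RAS, 'RA')

# Two RA variants: the message names the conflicting columns.
ambiguous = error_for(['ra', 'RA', 'dec'], RAS, 'RA')

print(ok)
print(missing)
print(ambiguous)
```

The missing and ambiguous cases now produce different messages, which is the distinction the current check does not make.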