Misleading error when multiple RA/Dec column variants exist in input DataFrame #14

@technic960183

Summary

When an input catalog (DataFrame) contains more than one RA or Dec column variant (e.g. both 'ra' and 'RA', or 'Ra' and 'RA'), the code raises:

ValueError: The input dataframe must have two columns named 'Ra' and 'Dec'!

This message is misleading because the required columns are present; there are simply multiple case variants. The user may conclude that columns are missing when the real problem is duplicates.

Code Triggering the Issue

The current logic in __type_pd_dataframe() of catalog.py:

RAS = ['ra', 'Ra', 'RA']
DECS = ['dec', 'Dec', 'DEC']
hit_ra = np.array([1 if col in RAS else 0 for col in self.input_data.columns])
hit_dec = np.array([1 if col in DECS else 0 for col in self.input_data.columns])
if sum(hit_ra) != 1 or sum(hit_dec) != 1:
    raise ValueError("The input dataframe must have two columns named 'Ra' and 'Dec'!")

Steps to Reproduce

  1. Construct a DataFrame (e.g. df1) that includes both 'ra' and 'RA' (or any two of ['ra','Ra','RA']).
  2. Call: xmatch(df1, df2, TOLERANCE) where df2 is any valid second catalog (or vice versa).
  3. Observe the raised ValueError with the misleading message.
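
The behavior can also be observed without installing the package. The following is a minimal stand-alone sketch that copies the quoted counting logic into a hypothetical function, check_columns, operating on a plain list of column names (in the real code these come from self.input_data.columns). It imitates the check and is not the package's actual method.

```python
# Stand-alone imitation of the quoted check. `check_columns` and `error_for`
# are hypothetical names used only for this reproduction.
RAS = ['ra', 'Ra', 'RA']
DECS = ['dec', 'Dec', 'DEC']


def check_columns(columns):
    """Apply the same counting logic as the quoted snippet to column names."""
    hit_ra = [1 if col in RAS else 0 for col in columns]
    hit_dec = [1 if col in DECS else 0 for col in columns]
    if sum(hit_ra) != 1 or sum(hit_dec) != 1:
        raise ValueError("The input dataframe must have two columns named 'Ra' and 'Dec'!")


def error_for(columns):
    """Return the error message raised for `columns`, or None if accepted."""
    try:
        check_columns(columns)
    except ValueError as exc:
        return str(exc)
    return None


# With a DataFrame, pass list(df.columns) instead of a literal list.
duplicate_ra = error_for(['ra', 'RA', 'dec'])  # RA present twice
missing_ra = error_for(['dec', 'flux'])        # RA absent
valid = error_for(['ra', 'dec'])               # exactly one of each
```

In this sketch duplicate_ra and missing_ra hold the same message, so the two situations cannot be told apart from the error alone, while valid is None.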

Current (Misleading) Behavior

Raises a generic error claiming the DataFrame must have 'Ra' and 'Dec', even though those columns (in some case form) are present; the real problem is ambiguity due to duplicates.

Desired / Expected Behavior

  • A clearer error indicating multiple RA (or Dec) column variants were detected.
  • Example improved message:
    Ambiguous coordinate columns: multiple RA-like columns found ['ra', 'RA']. Please keep only one column.

Suggested Fix

  • Detect when sum(hit_ra) > 1 or sum(hit_dec) > 1 and raise a targeted error listing the conflicting column names.
  • Refactor to:
    1. Collect matched RA column names: [col for col in self.input_data.columns if col in RAS]
    2. If len == 0: raise "No RA column found (expected one of ra/Ra/RA)."
       If len > 1: raise "Multiple RA columns found: [...]. Keep exactly one."
    3. Repeat for Dec.

Potential Implementation Sketch

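def _resolve_coord_column(self, candidates, label):
    """Return the single input column whose name is in `candidates`."""
    # Collect every column whose name matches an accepted variant.
    matches = [c for c in self.input_data.columns if c in candidates]
    if not matches:
        # None of the accepted spellings is present.
        raise ValueError(f"No {label} column found (expected one of {candidates}).")
    if len(matches) > 1:
        # More than one variant is present, so the choice is ambiguous.
        raise ValueError(f"Ambiguous {label} columns found: {matches}. Keep only one.")
    return matches[0]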
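
To illustrate the messages this sketch would produce, here is a stand-alone variant that takes the column names as an argument instead of reading self.input_data.columns. The names resolve_coord_column and describe, and the example column lists, are assumptions made for illustration only.

```python
RAS = ['ra', 'Ra', 'RA']
DECS = ['dec', 'Dec', 'DEC']


def resolve_coord_column(columns, candidates, label):
    """Return the single name in `columns` that is one of `candidates`."""
    matches = [c for c in columns if c in candidates]
    if not matches:
        raise ValueError(f"No {label} column found (expected one of {candidates}).")
    if len(matches) > 1:
        raise ValueError(f"Ambiguous {label} columns found: {matches}. Keep only one.")
    return matches[0]


def describe(columns):
    """Return the resolved (ra, dec) names, or the error message as a string."""
    try:
        return (resolve_coord_column(columns, RAS, 'RA'),
                resolve_coord_column(columns, DECS, 'Dec'))
    except ValueError as exc:
        return str(exc)


ok = describe(['id', 'Ra', 'DEC'])         # one variant of each
ambiguous = describe(['ra', 'RA', 'dec'])  # two RA variants
missing = describe(['ra', 'flux'])         # no Dec variant
print(ok)         # ('Ra', 'DEC')
print(ambiguous)  # Ambiguous RA columns found: ['ra', 'RA']. Keep only one.
print(missing)    # No Dec column found (expected one of ['dec', 'Dec', 'DEC']).
```

Returning the matched name, rather than only validating, would let the caller use the resolved column directly instead of searching for it a second time.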

Impact

  • Reduces user confusion.
  • Improves debuggability for users unfamiliar with internal case handling.
