Skip to content

handle creator parsing with comma#59

Open
ogmanojdev wants to merge 2 commits intoMIT-LCP:mainfrom
ogmanojdev:fix/cli-creator-parsing
Open

handle creator parsing with comma#59
ogmanojdev wants to merge 2 commits intoMIT-LCP:mainfrom
ogmanojdev:fix/cli-creator-parsing

Conversation

@ogmanojdev
Copy link
Copy Markdown
Collaborator

Fix: Split from the right, not the left

Since the format is:

Name[,Email[,URL]]

Email and URL (if present) are always at the end. So we should split from the right using rsplit(",", maxsplit=2)

@rajna-fani
Copy link
Copy Markdown
Contributor

I tested the rsplit(",", maxsplit=2) approach locally against a few realistic --creator inputs. It only fixes a narrow case (when all 3 fields are provided and the name has exactly one comma), but leaves other common cases broken and can even make multi-comma names worse.

Examples:

  • "Google, LLC"

    • current (split): {"name": "Google", "email": "LLC"} (wrong)
    • this PR (rsplit): {"name": "Google", "email": "LLC"} (still wrong)
  • "Google, LLC,info@google.com"

    • current (split): {"name": "Google", "email": "LLC", "url": "info@google.com"} (wrong)
    • this PR (rsplit): {"name": "Google", "email": "LLC", "url": "info@google.com"} (still wrong)
  • "Google, LLC,info@google.com,https://google.com"

    • current (split): wrong
    • this PR (rsplit): correct (this is the one case that improves)
  • "Doe, Jr., John,john@example.com"

    • this PR (rsplit) produces incorrect field assignment (multi-comma names get worse)

Root issue: comma is an ambiguous delimiter (it can be part of a valid organization/person name). Splitting from the left vs right cannot fully resolve that ambiguity.

Potential solution direction: support an unambiguous delimiter (e.g., semicolons) for the structured form (Name;Email;URL). That would align better with Croissant’s intent that creator is a structured Person/Organization object, but the CLI input format needs a delimiter that can’t also be part of the name.

Also, it would be great to add tests covering comma-in-name cases.

Copy link
Copy Markdown
Collaborator

@rafiattrach rafiattrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the biggest blocker is what @rajna-fani's mentioned above

@ogmanojdev ogmanojdev force-pushed the fix/cli-creator-parsing branch from 53c327a to 30f1e8b Compare April 2, 2026 09:29
@ogmanojdev
Copy link
Copy Markdown
Collaborator Author

added ; as preferred delimiter and added a test case covering all mentioned cases, tested that its passing

@ogmanojdev ogmanojdev requested a review from rafiattrach April 2, 2026 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants