fixpath: generalize prefix stripping with path extraction regex #14

@djdarcy

fixpath: generalize prefix stripping instead of hardcoded PS/quote checks

Problem

fixpath currently has specific checks for the PowerShell prompt prefix (PS ) and for surrounding quotes. The underlying task is more general: extract the actual path from whatever wraps it.

Current code:

# Strip PowerShell prompt prefix
if path.upper().startswith("PS "):
    rest = path[3:].lstrip()
    if rest and (rest[0] in "/\\~" or (len(rest) > 1 and rest[1] == ":")):
        path = rest

# Strip surrounding quotes and backticks
if len(path) >= 2:
    if (path[0] == '"' and path[-1] == '"') or ...

These are special cases of a generic problem: "there's a real path embedded in this string, extract it."

Proposed solution

Replace the hardcoded prefix/suffix checks with a generic path extractor that finds the first valid path pattern in the input string:

import re

# Match a drive-letter path (C:\...) anywhere in the string
m = re.search(r'[a-zA-Z]:\\', path)
if m and m.start() > 0:
    # Everything before the drive letter is prefix noise
    path = path[m.start():]

# TODO: also match Unix-style paths (/home/..., ~/..., /c/...)
# and UNC (\\server\...)

This would handle:

  • PS C:\code\file.md -- PowerShell prompt
  • >>> C:\code\file.md -- Python REPL
  • In [1]: C:\code\file.md -- IPython/Jupyter
  • user@host: C:\code\file.md -- SSH prompt copy-paste
  • "C:\code\file.md" -- quotes (the regex drops the leading quote; the trailing quote still relies on the quote pre-pass)
  • `C:\code\file.md` -- backticks (same as quotes)

The surrounding-character stripping (quotes, backticks) should still happen first as a cheap check, but the prefix stripping should be the generic regex approach.
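
A minimal sketch of that ordering, assuming hypothetical helper names (strip_wrapping, extract_drive_path) that are not part of fixpath today. It also assumes single quotes count as wrapping characters and adds a lookbehind so the tail of a longer word is not mistaken for a drive letter; both are illustrative choices, not settled design.

```python
import re

# Hypothetical names for illustration; fixpath's real helpers may differ.
QUOTE_CHARS = "\"'`"

# Assumption: the drive letter must not be preceded by another letter,
# so the last letter of a longer word before ":\" is not matched.
DRIVE_RE = re.compile(r"(?<![A-Za-z])[A-Za-z]:\\")


def strip_wrapping(text: str) -> str:
    """Fast pre-pass: drop one pair of matching quotes or backticks."""
    text = text.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in QUOTE_CHARS:
        return text[1:-1]
    return text


def extract_drive_path(text: str) -> str:
    """Fallback: discard prompt noise before the first drive-letter path."""
    text = strip_wrapping(text)
    m = DRIVE_RE.search(text)
    if m and m.start() > 0:
        # Everything before the drive letter is treated as prefix noise.
        text = text[m.start():]
    return text


if __name__ == "__main__":
    samples = [
        r"PS C:\code\file.md",
        r">>> C:\code\file.md",
        r"In [1]: C:\code\file.md",
        r"user@host: C:\code\file.md",
        r'"C:\code\file.md"',
    ]
    for raw in samples:
        print(extract_drive_path(raw))
```

Inputs that contain no drive-letter pattern are returned unchanged apart from the quote pre-pass, so relative paths pass straight through.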

Design considerations

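  • The regex [a-zA-Z]:\\ is a strong signal for Windows paths -- a single letter followed by :\ rarely appears outside a path. As written it also matches the last letter of a longer word (e.g. the c in abc:\), so a guard against a preceding letter may be needed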
  • For Unix paths, finding /home/ or ~/ or /mnt/ in a string is less reliable (could be part of a URL or sentence)
  • Keep quote/backtick stripping as a fast pre-pass (handles the common case without regex)
  • The generic extractor should be a fallback, not the first thing that runs
  • Consider: what if the "prefix" IS part of the path? e.g., PS C:\ could theoretically be a folder named PS C:\ -- but Windows does not allow : or \ in file or folder names, so this cannot occur in a real Windows path
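
For the Unix side, one conservative option is to accept a path marker only at the start of the string or directly after whitespace, which keeps URL segments from matching. The sketch below assumes a hypothetical extract_unix_path helper and limits itself to the three markers named in this issue (~/, /home/, /mnt/). It still cannot tell a prompt from an ordinary sentence that happens to mention a path, so it is a starting point rather than a full answer.

```python
import re

# Assumption: only these three markers count as unambiguous path starts.
# (?<!\S) requires start-of-string or whitespace right before the marker,
# which keeps URL segments such as example.com/home/ from matching.
UNIX_RE = re.compile(r"(?<!\S)(?:~/|/home/|/mnt/)")


def extract_unix_path(text: str) -> str:
    """Drop prompt noise before the first Unix-style path marker, if any."""
    m = UNIX_RE.search(text)
    if m and m.start() > 0:
        return text[m.start():]
    return text


if __name__ == "__main__":
    print(extract_unix_path("user@host:~$ /home/me/file.md"))
    print(extract_unix_path("https://example.com/home/page"))
```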

Acceptance criteria

  • Generic prefix extraction replaces hardcoded PS check
  • Handles Python REPL (>>>), IPython (In [N]:), SSH prompts
  • Quote/backtick stripping remains as fast pre-pass
  • No false positives on paths that legitimately start with text before a drive letter
  • Unix path extraction handled where unambiguous (~/, /home/, /mnt/)
  • Existing tests still pass

Labels: enhancement (New feature or request)