Surrogate (special characters) handling in Python #429

Open
314Sami opened this issue Jan 23, 2025 · 0 comments
314Sami commented Jan 23, 2025

Describe the bug
Files from Windows drives with special characters in their names seem to cause problems, though not all special characters do. It took many hours to figure out what the problem was, because the filename(s) causing the error were not shown, only the ones before it, and the error messages were not informative.

Steps to reproduce:

  1. Create or rename a file on Windows so that its name contains a special character, as in Apocalyptica's song Cortége. The Scandinavian letters äöå also trigger it sometimes; my guess is that filenames encoded in ISO-8859-1 cause the error and ones in modern UTF-8 don't. The main problem is that Windows handles surrogates differently. (If you don't want to create a file whose name is problematic for Python, there are online examples of how to assign variable values so that they raise the error, for example https://stackoverflow.com/questions/27366479/python-3-os-walk-file-paths-unicodeencodeerror-utf-8-codec-cant-encode-s ).
  2. Move the file to Linux filesystem.
  3. Run 'organize run' with filters that match the file.
  4. Receive error: ERROR! 'utf-8' codec can't encode character '\udcf6' in position 105: surrogates not allowed
    Code points other than dcf6 have also been seen.
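
The error in step 4 can probably be reproduced without any file at all. The following is a minimal sketch, assuming the name was stored as ISO-8859-1 bytes (so "ö" is the single byte 0xF6); the file name `track_\xf6.mp3` is made up for illustration. On Linux, Python decodes file names with the `surrogateescape` error handler, which turns a byte that is not valid UTF-8 into a lone surrogate, and a later strict UTF-8 encode then fails.

```python
# Hypothetical file name whose bytes are ISO-8859-1, not UTF-8 (0xF6 is "ö").
raw_name = b"track_\xf6.mp3"

# Roughly what Python does when it lists such a file on Linux:
# the undecodable byte 0xF6 is mapped to the lone surrogate U+DCF6.
name = raw_name.decode("utf-8", errors="surrogateescape")

# A strict UTF-8 encode (the default when writing text) rejects the surrogate.
try:
    name.encode("utf-8")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(ascii(name))
print(error_message)
```

The surrogate is always U+DC00 plus the offending byte value, which would explain why code points other than dcf6 show up for other characters.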

The program stops at the error. After deleting or renaming the file, it has to be started again, and it takes a lot of time to get back to the same spot.
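
To avoid the restart loop, the offending files can be listed up front. This is a rough sketch that is independent of organize; the helper names `has_surrogate_escapes` and `find_problem_files` are my own, and `/mnt/mydata` is the location from my config below. It reports every path that cannot be encoded as strict UTF-8.

```python
import os


def has_surrogate_escapes(path: str) -> bool:
    """Return True if the path cannot be encoded as strict UTF-8."""
    try:
        path.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def find_problem_files(root: str) -> list[str]:
    """Walk root and collect file paths that contain escaped (invalid) bytes."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            if has_surrogate_escapes(full_path):
                problems.append(full_path)
    return problems


if __name__ == "__main__":
    for path in find_problem_files("/mnt/mydata"):
        # ascii() escapes the surrogate, so printing itself cannot fail.
        print(ascii(path))
```

The printed paths show the bad character as an escape such as `\udcf6`, which makes the files easy to find and rename in one pass.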

Temporary fix
Run the command with the following environment variable set. The command will still produce errors, but it will not crash. I cannot guarantee that, for example, moving files would work with this "fix", so until there is a proper fix, I suggest only printing the files.

PYTHONIOENCODING=utf-8:surrogateescape organize run
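
As far as I understand it, this variable only changes the encoding and error handler of Python's stdin, stdout and stderr, not of files the program opens itself, which may be why errors still appear. A small sketch of what the `surrogateescape` handler does on output, again with a made-up file name: the lone surrogate is written back as the original byte instead of raising.

```python
# Hypothetical name as Python sees it after listing the directory.
name = "track_\udcf6.mp3"

# With errors="surrogateescape", U+DCF6 is turned back into the byte 0xF6.
restored = name.encode("utf-8", errors="surrogateescape")

# Interpreting those bytes as ISO-8859-1 recovers the intended name.
readable = restored.decode("iso-8859-1")

print(restored)
print(readable)
```

Note that the output then contains a raw byte that is not valid UTF-8, so anything reading it as UTF-8 later may hit a decode error of its own.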


Environment (please complete the following information):

  • OS: Manjaro Linux
  • Output of organize --version: organize v3.3.0

Your config file

rules:
  - locations: /mnt/mydata
    subfolders: true
    filters:
      - extension:
        - mp3
        - flac
        - wav
        - ogg
      - size
    actions:
      - write:
          outfile: "music.txt"
          text: "{size.traditional} -- {path}"
          mode: "append"
@314Sami 314Sami added the bug label Jan 23, 2025