fix: resolve Windows test failures from encoding and path issues #152
saurabh12nxf wants to merge 2 commits into m-lab:main
…tform path comparison
|
What guarantees that this code runs correctly on Windows? Perhaps you could add a GitHub workflow that ensures the tests work as intended? Also, can you comment on each line of code you changed and explain to me why it is correct, please? Maybe I am missing something and I'd like to understand your reasoning. |
|
Hi @bassosimone, thanks for the review! I've addressed both points:
library/src/iqb/pipeline/pipeline.py
Root cause: The SQL files (e.g. downloads_by_country.sql) contain emoji. On Linux/macOS, read_text() defaults to UTF-8, so it works fine. On Windows, Python typically defaults to cp1252 (the value of locale.getpreferredencoding()). The emoji's UTF-8 byte sequence contains 0x8F, which has no mapping in cp1252, so read_text() raises UnicodeDecodeError.
The fix is an explicit encoding="utf-8", which is also consistent with the very next line, which already assumes UTF-8. This is production code - without this fix, the pipeline itself breaks on Windows.
Why it's safe on Linux: Passing encoding="utf-8" on a system that already defaults to UTF-8 is a no-op - identical behavior.
library/tests/iqb/queries_test.py
Why: These tests read the same SQL files that contain emoji.
library/tests/iqb/cache/cache_test.py
Why: cache.data_dir is a pathlib.Path. On Linux, str(Path("/custom/path")) returns "/custom/path", which matches the assertion. On Windows, it returns "\custom\path" (pathlib renders Windows paths with backslashes), so the string comparison fails. The fix compares Path objects directly. Path("/custom/path") == Path("/custom/path") is True on all platforms because both sides go through the same pathlib normalization. This tests the actual intent - "does the cache store the correct path?" - rather than "does the path's string representation look a certain way on this OS?" |
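The path behavior described for cache_test.py can be checked on any OS with pathlib's pure path classes, which apply each platform's rules without touching the filesystem. This is only a minimal sketch: the helper names matches_as_string and matches_as_path are hypothetical and do not come from the project's test suite.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath

RAW = "/custom/path"  # the literal used in the discussion above

# Pure paths emulate a given platform's path flavor on any host OS.
posix_path = PurePosixPath(RAW)
windows_path = PureWindowsPath(RAW)


def matches_as_string(path: PurePath, expected: str) -> bool:
    """Old-style assertion: compare the string form with a literal."""
    return str(path) == expected


def matches_as_path(path: PurePath, expected: str) -> bool:
    """New-style assertion: build the expected value with the same path class."""
    return path == type(path)(expected)


if __name__ == "__main__":
    for label, path in (("posix", posix_path), ("windows", windows_path)):
        print(label, repr(str(path)),
              matches_as_string(path, RAW), matches_as_path(path, RAW))
```

The string comparison only holds for the POSIX flavor, while the path-to-path comparison holds for both, which is the property the changed assertion relies on.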
|
After listing our project for GSoC, we received a large number of pull requests across several repositories. We are working through the backlog, but this will take time. We will get back to this pull request eventually. In the meantime, if you are a GSoC applicant, please read our updated GSoC policy: https://github.com/m-lab/gsoc/. |
Fixes #150
Problem
Running uv run pytest on Windows produces 11 test failures:
UnicodeDecodeError: downloads_by_country.sql contains emoji, and read_text() defaults to cp1252, which can't decode these UTF-8 bytes.
Path comparison: str(Path("/custom/path")) returns \custom\path on Windows.
Changes
library/src/iqb/pipeline/pipeline.py
Added encoding="utf-8" to the read_text() call that loads SQL query templates. This is production code: without this fix, the pipeline itself breaks on Windows.
library/tests/iqb/queries_test.py
Same encoding="utf-8" fix for all read_text() calls in tests.
library/tests/iqb/cache/cache_test.py
Path() comparison for cross-platform compatibility. Added Path import.
Verification
Before: 279 passed, 11 failed
After: 290 passed, 0 failed
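For reviewers without a Windows machine, the UnicodeDecodeError from the Problem section can be approximated on any platform by requesting cp1252 explicitly. The sketch below is an illustration under assumptions: SQL_TEXT is a made-up stand-in for the real query files, read_both_ways is a hypothetical helper, and the emoji is assumed to carry the variation selector U+FE0F, whose UTF-8 encoding ends in byte 0x8F.

```python
import tempfile
from pathlib import Path

# Hypothetical SQL snippet, not the project's real query file.
# U+FE0F (variation selector-16) encodes to EF B8 8F in UTF-8.
SQL_TEXT = "-- \u2139\ufe0f downloads by country\nSELECT 1;\n"


def read_both_ways(text: str) -> tuple:
    """Write text as UTF-8, then read it back as UTF-8 and as cp1252.

    Returns (utf8_result, cp1252_failed).
    """
    with tempfile.TemporaryDirectory() as tmp:
        sql_file = Path(tmp) / "example.sql"
        sql_file.write_text(text, encoding="utf-8")

        # Explicit UTF-8 behaves the same on every platform.
        decoded = sql_file.read_text(encoding="utf-8")

        # Forcing cp1252 mimics the default on many Windows setups.
        try:
            sql_file.read_text(encoding="cp1252")
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True

    return decoded, cp1252_failed


if __name__ == "__main__":
    decoded, cp1252_failed = read_both_ways(SQL_TEXT)
    print("utf-8 round trip ok:", decoded == SQL_TEXT)
    print("cp1252 read failed:", cp1252_failed)
```

Passing the encoding explicitly removes the dependency on the locale, which is why the same call is safe on Linux and macOS.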