-
Notifications
You must be signed in to change notification settings - Fork 11
Description
What happens
Running uv run pytest on Windows gives 11 test failures. 10 of them are caused by a UnicodeDecodeError when reading downloads_by_country.sql:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 2391: character maps to
The remaining 1 failure is a path separator assertion (\ vs /).
Why it happens
downloads_by_country.sql has emoji characters (cp1252 encoding, which can't handle these UTF-8 multi-byte characters.
The issue is in two places:
pipeline.py:195—query_file.read_text()(production code)- queries_test.py — multiple
query_file.read_text()calls (tests)
Neither specifies encoding="utf-8", so Python falls back to the system default.
The uploads_by_country.sql tests pass fine because that file has no emoji.
How to reproduce
# On Windows
uv sync --dev
uv run pytest library/tests/iqb/queries_test.py -vSuggested fix
Add encoding="utf-8" to all read_text() calls that load SQL templates:
Before
template_text = query_file.read_text()
After
template_text = query_file.read_text(encoding="utf-8")
Environment
Windows 10
Python 3.13.12
uv 0.10.4
Note
The CI only runs on ubuntu-latest, so this never showed up before. The 1 remaining failure is
test_init_with_custom_data_dir
asserting /custom/path but getting \custom\path on Windows — a separate path separator issue I can fix in the same PR if that's okay.