Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows build failing with SSL error #733

Closed
tomwhite opened this issue Oct 11, 2021 · 7 comments · Fixed by #731
Closed

Windows build failing with SSL error #733

tomwhite opened this issue Oct 11, 2021 · 7 comments · Fixed by #731
Labels
bug Something isn't working upstream Used when our build breaks due to upstream changes

Comments

@tomwhite
Copy link
Collaborator

From https://github.com/pystatgen/sgkit/runs/3844122469?check_suite_focus=true:

=================================== ERRORS ====================================
_______________ ERROR collecting benchmarks/benchmarks_stats.py _______________
benchmarks\benchmarks_stats.py:4: in <module>
    import xarray as xr
C:\Miniconda3\envs\test\lib\site-packages\xarray\__init__.py:3: in <module>
    from . import testing, tutorial, ufuncs
C:\Miniconda3\envs\test\lib\site-packages\xarray\tutorial.py:13: in <module>
    from .backends.api import open_dataset as _open_dataset
C:\Miniconda3\envs\test\lib\site-packages\xarray\backends\__init__.py:6: in <module>
    from .cfgrib_ import CfGribDataStore
C:\Miniconda3\envs\test\lib\site-packages\xarray\backends\cfgrib_.py:16: in <module>
    from .locks import SerializableLock, ensure_lock
C:\Miniconda3\envs\test\lib\site-packages\xarray\backends\locks.py:13: in <module>
    from dask.distributed import Lock as DistributedLock
C:\Miniconda3\envs\test\lib\site-packages\dask\distributed.py:11: in <module>
    from distributed import *
C:\Miniconda3\envs\test\lib\site-packages\distributed\__init__.py:8: in <module>
    from .actor import Actor, ActorFuture
C:\Miniconda3\envs\test\lib\site-packages\distributed\actor.py:5: in <module>
    from .client import Future
C:\Miniconda3\envs\test\lib\site-packages\distributed\client.py:53: in <module>
    from .batched import BatchedSend
C:\Miniconda3\envs\test\lib\site-packages\distributed\batched.py:10: in <module>
    from .core import CommClosedError
C:\Miniconda3\envs\test\lib\site-packages\distributed\core.py:24: in <module>
    from .comm import (
C:\Miniconda3\envs\test\lib\site-packages\distributed\comm\__init__.py:25: in <module>
    _register_transports()
C:\Miniconda3\envs\test\lib\site-packages\distributed\comm\__init__.py:17: in _register_transports
    from . import inproc, tcp, ws
C:\Miniconda3\envs\test\lib\site-packages\distributed\comm\tcp.py:19: in <module>
    from tornado import netutil
C:\Miniconda3\envs\test\lib\site-packages\tornado\netutil.py:34: in <module>
    _client_ssl_defaults = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
C:\Miniconda3\envs\test\lib\ssl.py:589: in create_default_context
    context.load_default_certs(purpose)
C:\Miniconda3\envs\test\lib\ssl.py:490: in load_default_certs
    self._load_windows_store_certs(storename, purpose)
C:\Miniconda3\envs\test\lib\ssl.py:482: in _load_windows_store_certs
    self.load_verify_locations(cadata=certs)
E   ssl.SSLError: unknown error (_ssl.c:4034)

Looks like it might be this error: jupyter/notebook#4245.

@tomwhite tomwhite added the bug Something isn't working label Oct 11, 2021
@hammer
Copy link
Contributor

hammer commented Oct 11, 2021

Yikes, this one is bad. Filed in January 2019, still not fixed: https://bugs.python.org/issue35665.

Apparently there are some bad certs in the Windows CA store and the Python ssl library tries to load all certs rather than just the ones needed by the application. The ssl authors are trying to push error handling upstream to the OpenSSL library: openssl/openssl#16701. The OpenSSL team is correctly noting that all other users avoid this problem and ssl needs to fix itself. Unfortunately ssl is bundled with Python and so can't fix itself prior to the next Python release. Another reason "batteries included" is a bug not a feature and minimal standard libraries are better.

There is some code posted as a workaround: yt-dlp/yt-dlp@599ca41.

Although if we are hitting this problem why isn't it being reported as a nested asn1 error?

@hammer hammer added the upstream Used when our build breaks due to upstream changes label Oct 11, 2021
@tomwhite
Copy link
Collaborator Author

Yikes, this one is bad.

Yes it is! I just tried this workaround: jupyter/notebook#4245 (comment) (copying DLLs) and it didn't have any effect unfortunately.

There is some code posted as a workaround: yt-dlp/yt-dlp@599ca41.

We're not handling the certs ourselves (tornado is, according to the stacktrace), so it would be tricky to use that workaround.

@tomwhite
Copy link
Collaborator Author

Looks like it's possible to fix by monkey patching the SSL method for windows...

tomwhite added a commit to tomwhite/sgkit that referenced this issue Oct 12, 2021
@tomwhite tomwhite linked a pull request Oct 12, 2021 that will close this issue
@jeromekelleher
Copy link
Collaborator

This is very nasty, I agree. I'm worried about us fiddling around with certificates in our imports, that seems very heavy-handed and potentially makes us a vector for security issues.

What I don't understand is why this has started happening now. What has changed in our dependency chain that has caused this? Shouldn't this be an upstream issue with whatever library is provoking it? I mean, we're not directly doing anything with the network, so it seems weird that we'd be monkey-patching SSL libraries.

@tomwhite
Copy link
Collaborator Author

@jeromekelleher, I agree. I'm not sure why this has started happening, it's been present since Python 3.7 I think, so perhaps GA actions changed the version or configuration of the Windows OS it runs on?

Since this is blocking the build, I think we should either use the monkey-patched solution (and open an issue to track its removal), or disable the windows build.

@jeromekelleher
Copy link
Collaborator

Agreed, I vote to merge.

Perhaps we just do the monkey patching in CI then as a non intrusive workaround?

@tomwhite
Copy link
Collaborator Author

Perhaps we just do the monkey patching in CI then as a non intrusive workaround?

Good idea. I've updated the PR to do this now, but the windows runner doesn't seem to be available...

@mergify mergify bot closed this as completed in #731 Oct 14, 2021
mergify bot pushed a commit that referenced this issue Oct 14, 2021
tomwhite added a commit to tomwhite/sgkit that referenced this issue Nov 15, 2021
mergify bot pushed a commit that referenced this issue Nov 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working upstream Used when our build breaks due to upstream changes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants