Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-660632: Better support for EXTERNALBROWSER auth from local containers #1251

Closed
phil-hawkins opened this issue Sep 15, 2022 · 9 comments · Fixed by #1886
Closed

SNOW-660632: Better support for EXTERNALBROWSER auth from local containers #1251

phil-hawkins opened this issue Sep 15, 2022 · 9 comments · Fixed by #1886
Labels

Comments

@phil-hawkins
Copy link

phil-hawkins commented Sep 15, 2022

What is the current behaviour?

EXTERNALBROWSER authentication is the best option in the case of SSO with MFA. However, when performed inside a local container such as a VSCode docker devcontainer, there is a race condition between the auth callback to the localhost and starting the receiver app on the localhost and auto-forwarding the randomly assigned port. This can lead to an intermittent failure to connect.

One way to address this is to assign a fixed port number with the SF_AUTH_SOCKET_PORT environment variable and statically forward this port. This removes the delay in detecting and autoforwarding a randomly assigned port. However a new problem arises: the port remains in a TIME_WAIT state for a few minutes after it is used and subsequent connections fail with OSError: [Errno 98] Address already in use during this period.

What is the desired behaviour?

EXTERNALBROWSER auth works every time from inside a local container.

An option is to set up a static port forwarding rule as above and set the SO_REUSEADDR state of the port, perhaps determined by the value of new environment variable such as SF_AUTH_SOCKET_REUSEADDR.

How would this improve snowflake-connector-python?

EXTERNALBROWSER auth would work more seamlessly from local containers.

References, Other Background

socket(7) — Linux manual page
SF_AUTH_SOCKET_PORT code

I'm up for doing a PR to address this.

@github-actions github-actions bot changed the title Better support for EXTERNALBROWSER auth from local containers SNOW-660632: Better support for EXTERNALBROWSER auth from local containers Sep 15, 2022
@sfc-gh-yixie
Copy link
Collaborator

@phil-hawkins The connector already supports external browser. Does it solve your problem? See this API doc. https://docs.snowflake.com/en/user-guide/python-connector-api

@pmalafosse
Copy link

pmalafosse commented Mar 15, 2023

Is the goal of that to be able to connect to Snowflake using EXTERNALBROWSER authentication from a Docker container? From the description, it seems you are able to do it already even if it doesn't work all the time, would it be possible to know more about it please @phil-hawkins ?

Edit: never mind I managed to do it like that https://stackoverflow.com/questions/67325900/is-it-possible-to-use-externalbrowser-authenticator-inside-docker-container-fo

@sfc-gh-aalam
Copy link
Collaborator

@phil-hawkins would you still be interested in creating a PR for this? Our team is happy to review it

@podung
Copy link
Contributor

podung commented Jul 26, 2023

@sfc-gh-aalam I've got a couple fixes for this issue and am preparing a PR. More info today or tomorrow

@podung
Copy link
Contributor

podung commented Jul 27, 2023

Observable Behavior

When running in a containerized environment, redirects issued by the host browser to the localhost:{port} callback URL:

  1. Frequently hang and get stuck indefinitely
  2. Occasionally error with a message that IDP authentication may have failed.

I've tried many approaches to make this handling more robust and combining a few together allows the code to work very reliably for my team. These issues were seen on macOS running a devcontainer orchestrated via VS Code and also on windows running WSL.

Attempted Approaches

1. SO_REUSE_ADDR and SO_REUSEPORT flags

I tried @phil-hawkins of implementing SO_REUSE_ADDR and I also experimented with the similar SO_REUSE_PORT flag. I may have seen an improvement when I did so, but it did not solve the issue for me and I continued to pursue other fixes.

Recommendation: Do not proceed with this fix

However if someone (@phil-hawkins?) can confirm this fixes any issues for them, I would be happy to finish coding it.

2. Prevent browsers fetching GET /favicon

One of the failures that would lead to an IDP auth failed message was due to the browsers occasionally sending GET /favicon requests from the Your identity was confirmed html page. I was able to prevent this behavior by adding this to the HTML body:
<link rel="icon" href="data:,">
I followed the approached outlined [here].(https://webdesign.tutsplus.com/prevent-automatic-favicon-requests--cms-34762t)
A favicon request could break the callback handling in a non-containerized environment as well, so this seems like a good idea for all cases.

Recommendation: Proceed with this fix

3. Multithreading

I experimented with spawning a thread to start the socket.accept() and socket.recv() logic before we send the auth request to the IDP. I even played with artificially sleeping before sending the IDP request. All of this was to no avail. It does not seem that the race condition can be fixed by allowing docker more time to map ports.

Recommendation: Do not proceed with this fix

4. Retry if zero byte buffer returns from recv

I noticed that frequently the result of the socket.recv() was being returned as zero bytes. Using wireshark on the HOST the GET /?token request seems perfectly fine from the HOST browser to the HOST port. I'm not sure why this occurs, but something in the way the container proxying for port-forwarding is frequently results in this behavior. If you simply do another socket.accept() and then a socket.recv in this scenario the second socket.recv call succeeds with the GET /?token results. This zero-length byte array behavior was observed on both WSL and macos vs code devcontainers. The behavior was the same regardless of whether a port was statically mapped and reused or not. My proposed fix for this issue is that in the event of a zero length byte array, simply retry the socket.accept() and socket.recv() and limit retries to 5 attempts.

Recommendation: Proceed with this fix

5. Do not block on socket.recv() (WSL fix)

In WSL an additional fix was needed. For some reason the call to socket.recv() very frequently blocks and never returns.

Investigating the flags available on socket.recv() revealed the MSG_DONTWAIT flag. If we pass this flag and the system determines it would block, the python socket library raises a BlockingIOError. Combined with the retry approach outlined above, this allowed WSL to consistently and reliably receive the callback. The idea is to catch the error and log, but let the retry logic attempt again since the bytearray is still set to zero bytes and the retry count has not been exceeded.

Recommendation: Proceed with this fix

Summary

I proposed the following fixes:

  1. Prevent browser from fetching favicon (<link rel="icon" href="data:,">)
  2. Simple retry logic to overcome intermittent issues coming from somewhere within the containerized port-forward/proxying stack
  3. Prevent hanging on recv by using MSG_DONTWAIT flag and retrying on error

@sfc-gh-aalam I can start working on this PR, but if you have any concerns or want to discuss please let me know.

@sfc-gh-aalam
Copy link
Collaborator

@podung Thanks for such a detailed explanation. I don't have any concerns. Please proceed with the PR.

@podung
Copy link
Contributor

podung commented Aug 7, 2023

@sfc-gh-aalam

Couple updates:

  1. I went ahead and added the original request by @phil-hawkins to optionally set the socket flag instructing the kernel to reuse ip address/ports. This may be useful for some and I did see a few times where my container took too long to dynamically open the port. Note: I chose to use the SO_REUSEPORT flag instead of SO_REUSEADDR because that seemed more appropriate based on my reading.

  2. Although the MSG_DONTWAIT flag seems to make WSL environments more stable, I was seeing it sometimes cause my other containerized environments (ubuntu on macos) fail more frequently. I would like to make this an opt-in feature as well.

Other than updating the Release Notes with the changes and outlining the environment variables and settings there, is there a place we keep canonical documentation that I should be updating?

@podung
Copy link
Contributor

podung commented Aug 7, 2023

Here's the branch I'm working on if you're interested in seeing what I have so far: main...podung:snowflake-connector-python:robust_containerized_externalbrowser

@podung
Copy link
Contributor

podung commented Aug 8, 2023

@sfc-gh-aalam PR submitted. I'm happy to change anything as you or any of the maintainers see fit, just let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants