SNOW-1239684: Snowflake connector telemetry is toggled remotely and always sends internal data to Snowflake #1902

Closed
dbold opened this issue Mar 15, 2024 · 7 comments
Assignees
Labels
bug status-triage_done Initial triage done, will be further handled by the driver team

Comments

@dbold

dbold commented Mar 15, 2024

Python version

Python 3.11.6 (main, Jan 9 2024, 11:01:12) [GCC 11.4.0]

Operating system and processor architecture

Linux-5.15.0-100-generic-x86_64-with-glibc2.35

Installed packages

aiohttp==3.9.3
aiosignal==1.3.1
asn1crypto==1.5.1
attrs==23.2.0
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
cryptography==42.0.5
filelock==3.13.1
frozenlist==1.4.1
greenlet==3.0.3
idna==3.6
influxdb-client==1.39.0
multidict==6.0.5
numpy==1.26.4
packaging==23.2
pandas==2.2.1
platformdirs==3.11.0
pycparser==2.21
PyJWT==2.8.0
pyOpenSSL==24.0.0
python-dateutil==2.8.2
pytz==2024.1
reactivex==4.0.4
requests==2.31.0
six==1.16.0
snowflake-connector-python==3.7.1
snowflake-sqlalchemy==1.5.1
sortedcontainers==2.4.0
SQLAlchemy==1.4.51
tomlkit==0.12.4
typing_extensions==4.9.0
tzdata==2024.1
urllib3==2.1.0
yarl==1.9.4

What did you do?

It looked from our audits as if the Snowflake Connector starts with telemetry disabled and there is a way to programmatically toggle it, if one so desires.

To our great surprise the Snowflake Connector exfiltrates data and does send telemetry no matter what.

This happens because the telemetry parameter is enabled remotely by the Snowflake server.

This is done early, during the authentication: the session_parameters are updated based on the server response https://github.com/snowflakedb/snowflake-connector-python/blob/main/src/snowflake/connector/auth/_auth.py#L470

Among others, the server response contains the telemetry keys in data.parameters:

{'name': 'CLIENT_TELEMETRY_ENABLED', 'value': True},
{'name': 'CLIENT_TELEMETRY_SESSIONLESS_ENABLED', 'value': True},

With the updated session_parameters, Auth calls self._rest._connection._update_parameters(session_parameters), which sets telemetry_enabled = True on the connection.
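
The mechanism described above can be modelled in a few lines. This is only a simplified sketch with stand-in names: `FakeConnection` and `apply_server_parameters` are hypothetical and are not the connector's real classes or functions. Only the parameter names and the shape of data.parameters come from the server response quoted above.

```python
from dataclasses import dataclass


@dataclass
class FakeConnection:
    """Stand-in for a connection object; NOT the real SnowflakeConnection."""

    telemetry_enabled: bool = False


def apply_server_parameters(conn: FakeConnection, parameters: list[dict]) -> None:
    """Copy server-provided session parameters onto the connection as-is."""
    for param in parameters:
        if param["name"] == "CLIENT_TELEMETRY_ENABLED":
            conn.telemetry_enabled = bool(param["value"])


# Shape of data.parameters, taken from the server response quoted above.
server_parameters = [
    {"name": "CLIENT_TELEMETRY_ENABLED", "value": True},
    {"name": "CLIENT_TELEMETRY_SESSIONLESS_ENABLED", "value": True},
]

conn = FakeConnection()  # telemetry starts disabled on the client side
apply_server_parameters(conn, server_parameters)
print(conn.telemetry_enabled)  # the server's value has replaced the local default
```

The point of the model is that the client-side default never gets a say: whatever the server puts in the response ends up on the connection.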

Furthermore, the connection calls _log_telemetry_imported_packages, which means at least one log event (listing all the imported packages) is saved in the buffer even before the connection is done.

Interestingly, the list of imported packages is a rather intrusive log to send.

And, at the end, just closing the client will flush telemetry and send the data externally.

Example:

import snowflake.connector

with snowflake.connector.connect(
    user='x',
    password='y',
    account='z',
    warehouse='i',
    database='d',
    validate_default_parameters=True
    ) as c:
    c.telemetry_enabled = False

At the end of this short program we see telemetry got enabled (through the server reply) and that data was sent.

2024-03-15 22:59:19,823 - MainThread connection.py:734 - close() - INFO - closed
2024-03-15 22:59:19,823 - MainThread telemetry.py:211 - close() - DEBUG - Closing telemetry client.
2024-03-15 22:59:19,823 - MainThread telemetry.py:176 - send_batch() - DEBUG - Sending 1 logs to telemetry. Data is {'logs': [{'message': {'driver_type': 'PythonConnector', 'driver_version': '3.7.1', 'source': 'PythonConnector', 'type': 'client_imported_packages', 'value':...
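
To check for this without reading DEBUG output by hand, a logging handler can watch for the "Sending N logs to telemetry" message shown above. This is a rough sketch under two assumptions: that the connector logs under the logger name `snowflake.connector.telemetry`, and that the message keeps the wording seen in these logs. `TelemetrySendDetector` is a hypothetical helper, and the final `debug()` call only simulates the connector's log line so the snippet runs on its own.

```python
import logging


class TelemetrySendDetector(logging.Handler):
    """Collect log messages that indicate a telemetry batch was sent."""

    def __init__(self) -> None:
        super().__init__(level=logging.DEBUG)
        self.sent_messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        # Wording assumed from the log excerpt above.
        if message.startswith("Sending") and "logs to telemetry" in message:
            self.sent_messages.append(message)


# Assumption: this is the logger name used by the connector's telemetry module.
telemetry_logger = logging.getLogger("snowflake.connector.telemetry")
telemetry_logger.setLevel(logging.DEBUG)
detector = TelemetrySendDetector()
telemetry_logger.addHandler(detector)

# Simulated log line; in a real run the connector would emit this on close().
telemetry_logger.debug("Sending %d logs to telemetry. Data is %s", 1, {"logs": []})

print(len(detector.sent_messages))  # number of telemetry sends observed
```

In a test suite, asserting that `detector.sent_messages` stays empty after the connection is closed would flag any telemetry that is still being sent.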

What did you expect to see?

We did not expect to see CLIENT_TELEMETRY_ENABLED being set based on a server reply.

If this is a user setting, we would like to see where to configure it for our account.

But, as a library, it makes little sense for the Snowflake connector to just take all session_parameters as-is from the server reply.

The telemetry parameters should be explicitly excluded.
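
As an illustration of what such an exclusion could look like, here is a minimal sketch that filters the telemetry keys out of a server-provided parameter list before it is applied. `drop_telemetry_parameters` and `TELEMETRY_KEYS` are hypothetical names, not part of the connector, and the `TIMEZONE` entry is only there as an example of an unrelated parameter.

```python
# Parameter names taken from the server response quoted earlier in this issue.
TELEMETRY_KEYS = frozenset(
    {"CLIENT_TELEMETRY_ENABLED", "CLIENT_TELEMETRY_SESSIONLESS_ENABLED"}
)


def drop_telemetry_parameters(parameters: list[dict]) -> list[dict]:
    """Return the parameters with all telemetry toggles removed."""
    return [p for p in parameters if p.get("name") not in TELEMETRY_KEYS]


server_parameters = [
    {"name": "CLIENT_TELEMETRY_ENABLED", "value": True},
    {"name": "CLIENT_TELEMETRY_SESSIONLESS_ENABLED", "value": True},
    {"name": "TIMEZONE", "value": "UTC"},  # illustrative non-telemetry entry
]

filtered = drop_telemetry_parameters(server_parameters)
print([p["name"] for p in filtered])
```

With a filter like this in front of the parameter update, the telemetry setting would remain whatever the client chose locally.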

Fundamentally, there should be an implicit or easy way for no telemetry to be ever sent.

Can you set logging to DEBUG and collect the logs?

Relevant logs:

2024-03-15 22:59:19,823 - MainThread connection.py:734 - close() - INFO - closed
2024-03-15 22:59:19,823 - MainThread telemetry.py:211 - close() - DEBUG - Closing telemetry client.
2024-03-15 22:59:19,823 - MainThread telemetry.py:176 - send_batch() - DEBUG - Sending 1 logs to telemetry. Data is {'logs': [{'message': {'driver_type': 'PythonConnector', 'driver_version': '3.7.1', 'source': 'PythonConnector', 'type': 'client_imported_packages', 'value
@github-actions github-actions bot changed the title Snowflake connector telemetry is toggled remotely and always sends internal data to Snowflake SNOW-1239684: Snowflake connector telemetry is toggled remotely and always sends internal data to Snowflake Mar 15, 2024
@sfc-gh-ashahi sfc-gh-ashahi self-assigned this Mar 20, 2024
@sfc-gh-ashahi sfc-gh-ashahi added the status-triage Issue is under initial triage label Mar 26, 2024
@sfc-gh-yixie
Collaborator

@dbold could you try this?

import snowflake.connector
from snowflake.connector.telemetry_oob import TelemetryService
conn = snowflake.connector.connect(**CONNECTION_PARAMETERS)
# disable in-band telemetry
conn.telemetry_enabled = False
# disable out-of-band telemetry
TelemetryService.get_instance().disable()

@dbold
Author

dbold commented Apr 3, 2024

This does not seem to work:

with snowflake.connector.connect(...) as c:
    c.telemetry_enabled = False

    from snowflake.connector.telemetry_oob import TelemetryService
    # disable out-of-band telemetry
    TelemetryService.get_instance().disable()

as the logs show telemetry is sent:

2024-04-03 08:05:51,308 - MainThread connection.py:734 - close() - INFO - closed
2024-04-03 08:05:51,308 - MainThread telemetry.py:211 - close() - DEBUG - Closing telemetry client.
2024-04-03 08:05:51,309 - MainThread telemetry.py:176 - send_batch() - DEBUG - Sending 1 logs to telemetry. Data is {'logs': [{'message': {'driver_type': 'PythonConnector', 'driver_version': '3.7.1', 'source': 'PythonConnector', 'type': 'client_imported_packages', 'value': "{'ntpath', 'random', 'ctypes', 'itertools', 'quopri', 'asn1crypto', 'opcode', 'builtins', 'hashlib', 'logging', 'certifi', 'platform', 'inspect', 'enum'....

One way to disable telemetry that we explored is simply changing the telemetry URL:

# Try to break the telemetry client with a wrong url. It will auto-disable itself after sending the 1st packet and failing.
snowflake.connector.telemetry.TelemetryClient.SF_PATH_TELEMETRY = "/please-stop/sending"

But this seems to cause some other problems and I'll probably open a separate issue.

@sfc-gh-yixie
Collaborator

@dbold We're reviewing what's next for telemetry. Will update you later.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage labels Apr 3, 2024
@craigls

craigls commented Aug 9, 2024

Is there any update for this issue?

@dougdragon

I kept getting warnings in my log from telemetry: WARNING:snowflake.connector.telemetry:Failed to add log to telemetry. and I couldn't find a way to disable telemetry when instantiating the connector client.

It's a little hacky, but I ended up setting telemetry_enabled to False, overriding the default value in the SnowflakeConnection class's _update_parameters() function. I no longer see the warning/error.
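
For anyone who wants a similar effect without editing the installed package, one option is to wrap the method at runtime. The sketch below is an assumption-laden illustration: `force_attribute_after` is a hypothetical helper, and `StandInConnection` is a stand-in class so the snippet runs on its own. Applying the same wrapper to the real SnowflakeConnection is untested here.

```python
import functools


def force_attribute_after(cls, method_name, attr, value):
    """Wrap cls.method_name so that `attr` is reset to `value` after each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        setattr(self, attr, value)  # undo whatever the original call set
        return result

    setattr(cls, method_name, wrapper)


class StandInConnection:
    """Stand-in for a connection class; NOT the real SnowflakeConnection."""

    def __init__(self):
        self.telemetry_enabled = False

    def _update_parameters(self, parameters):
        # Mimics applying server-provided parameters as-is.
        if "CLIENT_TELEMETRY_ENABLED" in parameters:
            self.telemetry_enabled = parameters["CLIENT_TELEMETRY_ENABLED"]


force_attribute_after(
    StandInConnection, "_update_parameters", "telemetry_enabled", False
)

conn = StandInConnection()
conn._update_parameters({"CLIENT_TELEMETRY_ENABLED": True})
print(conn.telemetry_enabled)  # stays disabled despite the server-style update
```

Patching a private method is fragile across releases, so this is best treated as a stopgap.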

@sfc-gh-aling
Collaborator

We have fixed the bug with disabling telemetry in the latest release, 3.12.1, and added a section on how to disable telemetry to our README: https://github.com/snowflakedb/snowflake-connector-python?tab=readme-ov-file#disable-telemetry.

Please try out the latest version, thanks!

@dbold
Author

dbold commented Aug 21, 2024

For reference, the code changes seem to be in #2013.

We will check the new version.
