SNOW-502111: fetch_pandas_all returns ValueError: Found non-unique column index #970
Comments
Ok, I can actually recreate the problem now that I understand it, see below.
Is there a way we can avoid an error being thrown, as this is in fact valid SQL? For instance, can we just delete duplicated columns and issue a warning that some columns were redundant and couldn't be handled by pyarrow?
While the SQL is valid, I'd be against issuing warnings and de-duping columns, as this could lead to some users losing information (unknowingly, if they have warnings disabled).
The user wouldn't lose any information in that case; it would just throw a runtime error and fail. What about enhancing the error so that there is more context? i.e. catching
Hi, is anyone working on this issue? I would like to discuss it further and contribute too.
This issue comes up when you
@shacker I had the same error with
Any plans to solve this issue in the future? It's really annoying to deal with when working with tables that have a lot of columns.
hey folks, thanks for your feedback. I'm thinking of two things:
I think we need more information on whether it will lead to real data loss. Do you have samples to share with us showing how your table is defined and what detailed select statement you write? On the other hand, the first item is definitely a low-hanging fruit we can get done. Are any of you interested in enhancing the error information?
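On that first item (richer error context), a minimal sketch of what such an enhancement could look like as a client-side wrapper. It assumes a DB-API-style cursor exposing `description` and `fetch_pandas_all`, as the Snowflake connector does; `fetch_pandas_all_with_context` is a hypothetical helper name, not part of the connector:

```python
def fetch_pandas_all_with_context(cursor):
    """Call cursor.fetch_pandas_all(); if it fails with a ValueError,
    re-raise with the duplicated result-column names included."""
    # cursor.description is the standard DB-API sequence of
    # (name, type, ...) tuples describing the result columns.
    names = [col[0] for col in cursor.description]
    dupes = sorted({n for n in names if names.count(n) > 1})
    try:
        return cursor.fetch_pandas_all()
    except ValueError as exc:
        if dupes:
            # Surface which columns clashed instead of the bare
            # "Found non-unique column index" message.
            raise ValueError(
                f"{exc} (duplicate result columns: {dupes}; "
                "alias them in the SELECT to make names unique)"
            ) from exc
        raise
```

The wrapper changes nothing on the happy path; it only enriches the failure message, which keeps the "do one thing well" behavior discussed below.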
To provide some context, since it's been a long time since I submitted this issue: I was encountering this prior to Snowpark being released. Now with Snowpark, it seems this is either much less likely to come up just from how the API works, or it helpfully warns me if there is a conflict. My vote would be not to make any changes unless we are 100% sure this wouldn't lead to data loss, and instead let the connector do one thing and do that one thing really well.
Based on @mbkupfer's comments, I am closing this issue for now. If there are more concerns in the future, please consider the above comments by @sfc-gh-aling and @sfc-gh-mkeller. Happy to hear any other ideas/suggestions on how to proceed with this issue.
I just hit this issue. @mbkupfer, how does Snowpark mitigate it? My preference would be a new optional flag that resolves column name conflicts by appending '_1', '_2', etc. Then there's no data loss. In many cases, applications are just showing the results to a user who can deal with the ambiguity.
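In the meantime, that suffixing scheme can be applied client-side by fetching rows through the plain DB-API path and renaming columns before building the DataFrame. A sketch, assuming pandas is installed and that `cursor.description` holds the result column names (standard DB-API); `dedupe_columns` and `fetch_pandas_deduped` are illustrative helpers, not connector APIs:

```python
import pandas as pd

def dedupe_columns(names):
    """Make column names unique by appending _1, _2, ... to repeats.
    Note: does not guard against a pre-existing 'NAME_1' column
    colliding with a generated suffix."""
    counts = {}
    unique = []
    for name in names:
        n = counts.get(name, 0)
        unique.append(name if n == 0 else f"{name}_{n}")
        counts[name] = n + 1
    return unique

def fetch_pandas_deduped(cursor):
    # Bypass fetch_pandas_all() entirely: plain fetchall() imposes no
    # column-uniqueness requirement, so duplicates never raise. This
    # loses the Arrow fast path, so it is slower on large results.
    columns = dedupe_columns([col[0] for col in cursor.description])
    return pd.DataFrame(cursor.fetchall(), columns=columns)
```

This keeps all columns (no data loss) at the cost of the Arrow-based bulk transfer that `fetch_pandas_all` normally provides.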
I'd have to see your use case to understand/advise. Happy to take this offline from this thread if you want to just DM me personally.
Was there any resolution here to @breckognize's proposal to add That strikes me as the most logical solution, and it also matches the behavior of the SDKs of other competitors like BQ.
Please answer these questions before submitting your issue. Thanks!
Python 3.8.10
Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.29
Output of pip freeze:
anyio==3.3.4
argon2-cffi==21.1.0
asn1crypto==1.4.0
astroid==2.8.0
attrs==19.3.0
Automat==0.8.0
azure-common==1.1.27
azure-core==1.15.0
azure-storage-blob==12.8.1
Babel==2.9.1
backcall==0.2.0
black==21.9b0
bleach==4.0.0
blinker==1.4
boto3==1.17.98
botocore==1.20.98
certifi==2021.5.30
cffi==1.14.5
chardet==4.0.0
charset-normalizer==2.0.7
click==8.0.1
cloud-init==21.3
cloudpickle==2.0.0
colorama==0.4.3
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==3.3.2
cycler==0.10.0
dask==2021.10.0
dbus-python==1.2.16
debugpy==1.5.1
decorator==5.0.9
defusedxml==0.7.1
distributed==2021.10.0
distro==1.4.0
distro-info===0.23ubuntu1
entrypoints==0.3
et-xmlfile==1.1.0
fsspec==2021.10.1
greenlet==1.1.1
HeapDict==1.0.1
httplib2==0.14.0
hyperlink==19.0.0
idna==2.10
importlib-metadata==1.5.0
incremental==16.10.1
ipykernel==6.4.2
ipython==7.26.0
ipython-genutils==0.2.0
isodate==0.6.0
isort==5.9.3
jedi==0.18.0
Jinja2==2.10.1
jmespath==0.10.0
joblib==1.1.0
json5==0.9.6
jsonpatch==1.22
jsonpointer==2.0
jsonschema==3.2.0
jupyter-client==7.0.0
jupyter-core==4.7.1
jupyter-server==1.11.1
jupyterlab==3.2.1
jupyterlab-pygments==0.1.2
jupyterlab-server==2.8.2
keyring==18.0.1
kiwisolver==1.3.2
language-selector==0.1
launchpadlib==1.10.13
lazr.restfulclient==0.14.2
lazr.uri==1.0.3
lazy-object-proxy==1.6.0
locket==0.2.1
MarkupSafe==1.1.0
matplotlib==3.4.3
matplotlib-inline==0.1.2
mccabe==0.6.1
mistune==0.8.4
more-itertools==4.2.0
msgpack==1.0.2
msrest==0.6.21
mypy-extensions==0.4.3
nbclassic==0.3.3
nbclient==0.5.4
nbconvert==6.1.0
nbformat==5.1.3
nest-asyncio==1.5.1
netifaces==0.10.4
notebook==6.4.5
numpy==1.21.1
oauthlib==3.1.1
openpyxl==3.0.7
oscrypto==1.2.1
packaging==21.0
pandas==1.3.1
pandocfilters==1.4.3
parso==0.8.2
partd==1.2.0
pathspec==0.9.0
pexpect==4.6.0
pickleshare==0.7.5
Pillow==8.4.0
platformdirs==2.3.0
prometheus-client==0.11.0
prompt-toolkit==3.0.19
psutil==5.8.0
ptyprocess==0.7.0
pyarrow==5.0.0
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycparser==2.20
pycryptodomex==3.10.1
Pygments==2.9.0
PyGObject==3.36.0
PyHamcrest==1.9.0
PyJWT==2.1.0
pylint==2.11.1
pylint-venv==2.1.1
pymacaroons==0.13.0
PyNaCl==1.3.0
pyOpenSSL==21.0.0
pyparsing==2.4.7
pyrsistent==0.15.5
pyserial==3.4
python-apt==2.0.0+ubuntu0.20.4.6
python-dateutil==2.8.1
python-debian===0.1.36ubuntu1
pytz==2021.1
PyYAML==5.3.1
pyzmq==22.2.1
regex==2021.8.28
requests==2.25.1
requests-oauthlib==1.3.0
requests-unixsocket==0.2.0
s3transfer==0.4.2
scikit-learn==1.0.1
scipy==1.7.1
seaborn==0.11.2
SecretStorage==2.3.1
Send2Trash==1.8.0
service-identity==18.1.0
simplejson==3.16.0
six==1.16.0
sniffio==1.2.0
snowflake-connector-python==2.7.0
snowflake-sqlalchemy==1.3.2
sortedcontainers==2.4.0
sos==4.1
SQLAlchemy==1.4.22
ssh-import-id==5.10
systemd-python==234
tblib==1.7.0
terminado==0.12.1
testpath==0.5.0
threadpoolctl==3.0.0
toml==0.10.2
tomli==1.2.1
toolz==0.11.1
tornado==6.1
traitlets==5.0.5
Twisted==18.9.0
typing-extensions==3.10.0.2
ubuntu-advantage-tools==27.2
ufw==0.36
unattended-upgrades==0.1
urllib3==1.26.5
wadllib==1.3.3
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.2.1
wrapt==1.12.1
zict==2.0.0
zipp==1.0.0
zope.interface==4.7.1
So sorry, but I'm working with proprietary data and really have no idea how to recreate using dummy data.
Should have gotten a dataframe
I tried this and it looked like a lot of identifiable information was in there. Happy to DM, but I don't want to make those logs public.