-
Notifications
You must be signed in to change notification settings - Fork 6.5k
Trino data reader #20127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Trino data reader #20127
Conversation
AstraBert
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments, but in order to get this PR approved you should change the location of the scripts as detailed in the first comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want the package to be published correctly, the scripts should be moved under llama_index/readers/trino
| "schema": schema, | ||
| } | ||
|
|
||
| def configureConnection(self) -> Tuple[trino.dbapi.Connection, trino.dbapi.Cursor]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, better to use snake_case
| cur.execute(query) | ||
|
|
||
| rows = cur.fetchall() | ||
| return [rows, cur.description] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we use the cursor and connection available within the class?
| # CRITICAL: Add the native Trino Python client as a dependency | ||
| # Note: We use the llama-index-core version from the source template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to commit these comments? 😅
| exclude = ["_static", "build", "examples", "notebooks", "venv"] | ||
| ignore_missing_imports = true | ||
| python_version = "3.8" | ||
| python_version = "3.9" # Updated to align with your project's min Python version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be using 3.10 since 3.9 has reached its EOF at the beginning of October... Also, I would get rid of the comment :)
AstraBert
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change requested as detailed below 👇
llama-index-integrations/readers/llama-index-readers-trino/llama_index/readers/trino/base.py
Outdated
Show resolved
Hide resolved
AstraBert
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should change the test import pattern, otherwise tests will fail
| from unittest import mock | ||
| from llama_index.core.readers.base import BaseReader | ||
| import inspect | ||
| from llama_index.base import TrinoReader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have overseen this, but this module does not exist. If you want to import TrinoReader, you should do: from llama_index.readers.trino.base import TrinoReader. This is causing test failures
| from llama_index.base import TrinoReader | |
| from llama_index.readers.trino.base import TrinoReader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi thanks for that. fixed it and looked over the feature to make sure everything else made sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be __init__.py. This is breaking package building
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Sorry about that fixed it!
Description
This PR introduces a new TrinoReader (Data Loader) for the LlamaINdex ecosystem, built around the native trino-python-client. This implementation is designed. to solve common production instability issues when querying Trino's distributed sql engine
Fixes #20126 20126
New Package?
Yes
Did I fill in the
tool.llamahubsection in thepyproject.tomland provide a detailed README.md for my new integration or package?Version Bump?
Did I bump the version in the
pyproject.tomlfile of the package I am updating? (Except for thellama-index-corepackage)Type of Change
Please delete options that are not relevant.
New feature (non-breaking change which adds functionality)
This change requires a documentation update
How Has This Been Tested?
Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.
Suggested Checklist:
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added Google Colab support for the newly added notebooks.
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran
uv run make format; uv run make lintto appease the lint gods