Skip to content

Conversation

amotl
Copy link
Member

@amotl amotl commented Nov 2, 2024

About

A followup on GH-691.

  • Adding software tests and a CI configuration, to verify the example program does not break.
  • Nightly runs will verify that the integration will always work, or signal when it breaks.
  • The outcome will be reflected on the Build Status page.
  • Dependabot will drive the project dependencies.

References

@amotl amotl requested a review from simonprickett November 2, 2024 15:55
@cla-bot cla-bot bot added the cla-signed label Nov 2, 2024
Comment on lines -75 to +84
CRATEDB_URL="crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true"
CRATEDB_SQLALCHEMY_URL="crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRATEDB_SQLALCHEMY_URL is the designated environment variable name for the SQLAlchemy connection string, contrary to CRATEDB_HTTP_URL, which is suitable for the Python DB API or crash, for example.

In order to not use different environment variables on our tutorials and educational material, let's adjust this to adhere to that convention.

Comment on lines +1 to +23
CREATE TABLE IF NOT EXISTS time_series_data (
timestamp TIMESTAMP,
value DOUBLE,
location STRING,
sensor_id INT
);

INSERT INTO time_series_data (timestamp, value, location, sensor_id)
VALUES
('2023-09-14T00:00:00', 10.5, 'Sensor A', 1),
('2023-09-14T01:00:00', 15.2, 'Sensor A', 1),
('2023-09-14T02:00:00', 18.9, 'Sensor A', 1),
('2023-09-14T03:00:00', 12.7, 'Sensor B', 2),
('2023-09-14T04:00:00', 17.3, 'Sensor B', 2),
('2023-09-14T05:00:00', 20.1, 'Sensor B', 2),
('2023-09-14T06:00:00', 22.5, 'Sensor A', 1),
('2023-09-14T07:00:00', 18.3, 'Sensor A', 1),
('2023-09-14T08:00:00', 16.8, 'Sensor A', 1),
('2023-09-14T09:00:00', 14.6, 'Sensor B', 2),
('2023-09-14T10:00:00', 13.2, 'Sensor B', 2),
('2023-09-14T11:00:00', 11.7, 'Sensor B', 2);

REFRESH TABLE time_series_data;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new init.sql file could also be used within the README to be provisioned by using an incantation to crash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this here as it lines up better with the format of the tutorial on Discourse.

Comment on lines 1 to 7
langchain-openai<0.3
llama-index<0.12
llama-index-embeddings-langchain<0.3
llama-index-embeddings-openai<0.3
llama-index-llms-azure-openai<0.3
llama-index-llms-openai<0.3
sqlalchemy-cratedb
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After cleaning up dependencies, those are apparently all we need to define.

Comment on lines 5 to 22
from cratedb_toolkit.io.sql import DatabaseAdapter

HERE = Path(__file__).parent


@pytest.fixture()
def cratedb() -> DatabaseAdapter:
return DatabaseAdapter(dburi="crate://crate@localhost:4200")


@pytest.fixture(scope="function", autouse=True)
def init_database(cratedb):
"""
Initialize database.
"""
cratedb.run_sql("DROP TABLE IF EXISTS time_series_data;")
cratedb.run_sql((HERE / "init.sql").read_text())
Copy link
Member Author

@amotl amotl Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonprickett: You have been looking at more high-level database interfaces to CrateDB, beyond what standard interfaces (DB API, xDBC, CLI) provide, right?

At least on the Python side, there are a few SDK-like layers in CTK, to be used in occasions like those, or others. They are intended to minimize lines of code needed to conduct certain repetitive procedures, by adding a few bells and whistles here and there. Up until now, the procedures are both originating from, and focusing on support for writing notebooks, demo snippets, and software tests.

Do you like that style of providing value-added details on top of the driver layer, i.e. does it resonate with your ideas in any way?

Copy link
Member Author

@amotl amotl Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from cratedb_toolkit.io.sql import DatabaseAdapter

CTK's DatabaseAdapter is effectively a little wrapper around a few fundamentals, also using SQLAlchemy, in this spirit very similar to the SQLDatabase wrappers of LangChain and LlamaIndex.

Copy link
Member Author

@amotl amotl Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that LangChains also bears another layer on top of its SQLDatabase wrapper, the SQLDatabaseLoader.

That one is also pretty generic, so it can be counted in on the enumeration of re-usable high-level components to interact with CrateDB, mostly powered by SQLAlchemy in one way or another.

It has been coming from those contributions, broken out of crate-workbench/langchain#1:

@amotl
Copy link
Member Author

amotl commented Nov 5, 2024

Do you agree with the changes here, @simonprickett, so we can toggle the badge on the Build Status page into green?

Copy link
Contributor

@simonprickett simonprickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this - the tutorial on Discourse will need updating once this has been merged. I'll take that on.

@simonprickett simonprickett merged commit 7c163f2 into main Nov 6, 2024
3 checks passed
@simonprickett simonprickett deleted the ci-llama-index branch November 6, 2024 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants