Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CI for agent example within AI cookbook #28

Merged
merged 7 commits into from
Sep 26, 2024
Merged

Conversation

smurching
Copy link
Collaborator

@smurching smurching commented Sep 26, 2024

What changes are proposed in this pull request?

Adds GitHub actions with CI (which runs unit tests) for the AI cookbook, plus a few example unit tests

How is this PR tested?

  • Manually ran unit tests
  • Via CI on this PR

I'll send follow-up PRs after this to apply lint to cookbook code and also add more unit tests to the file_loading module touched in this PR

Signed-off-by: Sid Murching <sid.murching@databricks.com>
Signed-off-by: Sid Murching <sid.murching@databricks.com>
Signed-off-by: Sid Murching <sid.murching@databricks.com>
Signed-off-by: Sid Murching <sid.murching@databricks.com>
@@ -42,10 +42,6 @@

# COMMAND ----------

# MAGIC %run ./utils/install_aptget_package
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This notebook doesn't exist but also seems unnecessary to run

@@ -1,19 +1,3 @@
# Databricks notebook source
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't import from notebooks - need to make this a normal python file by deleting the "Databricks notebook source" header here



def parse_and_extract(
def _parse_and_extract(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making stuff private if it's not used outside this file


def load_uc_volume_to_delta_table(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This util is hard to test because it does so much, it's easier to break this down into dataframe operations (load files into DF, apply parsing, etc) and test those

@@ -3,11 +3,11 @@
To start working on this book:
- clone the repo; `cd cookbook`
- use your preferred approach to starting a new python environment
- in that environment, `pip install jupyter-book`
- in that environment, `pip install -r dev/dev_requirements.txt`
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding dev requirements needed to run tests, lint, etc

Signed-off-by: Sid Murching <sid.murching@databricks.com>
Signed-off-by: Sid Murching <sid.murching@databricks.com>
Signed-off-by: Sid Murching <sid.murching@databricks.com>
@@ -127,7 +121,7 @@ def load_uc_volume_to_delta_table(
display_markdown(
f"### {num_errors} documents had parse errors. Please review.", raw=True
)
display(errors_df)
Copy link
Collaborator Author

@smurching smurching Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

display() doesn't work in unit tests, asked folks about it in Slack

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats unfortunate. if we're not writing tests that actually look at what is displayed, can we just monkeypatch this out?

@smurching smurching changed the title [WIP] Add CI for agent example within AI cookbook Add CI for agent example within AI cookbook Sep 26, 2024
Copy link
Collaborator

@prithvikannan prithvikannan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Will followup on slack about the util notebook -> util python code change.

return (
pyspark.sql.SparkSession.builder
.master("local[1]")
# Uncomment the following line for testing on Apple silicon locally
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we create some control flag such that we can run pytest locally (on apple silicon) and get this to work and then CI can have the flag the other way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - actually, looks like CI seems to work with this stuff enabled by default too? So will just remove the comment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to go in this direction, should we make this change for all the util and validator notebooks?

Copy link
Collaborator Author

@smurching smurching Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we should - to your point let me also follow up with some links to docs (e.g. https://docs.databricks.com/en/files/workspace-modules.html#work-with-python-and-r-modules) on how to iterate on these modules etc

@@ -127,7 +121,7 @@ def load_uc_volume_to_delta_table(
display_markdown(
f"### {num_errors} documents had parse errors. Please review.", raw=True
)
display(errors_df)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats unfortunate. if we're not writing tests that actually look at what is displayed, can we just monkeypatch this out?

@smurching smurching merged commit 9b85862 into main Sep 26, 2024
2 checks passed
@smurching smurching deleted the smurching/add-ci branch October 8, 2024 02:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants