4 changes: 2 additions & 2 deletions Dockerfile
@@ -37,12 +37,12 @@ RUN mkdir ~/.aws ~/.gen3 /root/studies

RUN git clone https://github.com/bmeg/iceberg.git && \
cd iceberg && \
- git checkout feature/FHIR-resource-type
+ git checkout 7f6cfdb558d05370fc645b5ab894b98b38a01e1b

Why not just use main or the most recent commit in iceberg? If we tend to avoid main, then we should have a development branch (or something similar) in iceberg to pin it to.

Collaborator Author

Because the server is set up to use a very specific version of iceberg. See gen3-helm/helm/grip/templates/post-something.yaml.

I want to match what I know the server is using.

ok
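
For illustration, the difference between the two `git checkout` targets in this hunk (a branch name versus a commit hash) can be checked mechanically. The sketch below assumes that "pinned" means a full 40-character SHA-1 commit hash; the helper name `is_pinned_commit` is hypothetical and not part of this repository.

```python
import re

# A full Git SHA-1 commit hash is exactly 40 hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_commit(ref: str) -> bool:
    """Return True if `ref` looks like a full commit SHA rather than a branch or tag name."""
    return _FULL_SHA.fullmatch(ref.lower()) is not None


# The two refs from the Dockerfile hunk above.
print(is_pinned_commit("7f6cfdb558d05370fc645b5ab894b98b38a01e1b"))
print(is_pinned_commit("feature/FHIR-resource-type"))
```

A check like this only says that a ref is immutable; it says nothing about whether the commit is the one the server actually uses.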


COPY . /root

#Add jsonschemagraph exe to image
- RUN wget https://github.com/bmeg/jsonschemagraph/releases/download/v0.0.2/jsonschemagraph-linux.amd64 -P /usr/local/bin/
+ RUN wget https://github.com/bmeg/jsonschemagraph/releases/download/v0.0.3/jsonschemagraph-linux.amd64 -P /usr/local/bin/
RUN mv /usr/local/bin/jsonschemagraph-linux.amd64 /usr/local/bin/jsonschemagraph
RUN chmod +x /usr/local/bin/jsonschemagraph
ENV PATH="/usr/local/bin:$PATH"
29 changes: 23 additions & 6 deletions bundle_service/processing/process_bundle.py
@@ -73,7 +73,7 @@ async def _can_create(access_token: str, project_id: str) -> bool | str | int:
return True, f"HAS SERVICE create on resource {required_service}", None


- async def process(rows: List[dict], project_id: str, access_token: str) -> list[str]:
+ async def process(rows: List[dict], project_id: str, access_token: str) -> List[str] | None:
"""Processes a bundle into a temp directory of NDJSON files
that are compatible with existing loading functions

@@ -109,14 +109,31 @@ async def process(rows: List[dict], project_id: str, access_token: str) -> list[
temp_file.close()

if files_written:
- subprocess.run(["jsonschemagraph", "gen-dir", "iceberg/schemas/graph", f"{temp_dir}", f"{temp_dir}/OUT", "--project_id", f"{project_id}", "--gzip_files"])
- res = bulk_load(await _get_grip_service_name(), await _get_grip_graph_name(), project_id, f"{temp_dir}/OUT", logs, access_token)
- if int(res[0]["status"]) != 200:
- server_errors.append(res[0]["message"])
+ program, project = project_id.split("-")
+ project_str_dict = f'{{"auth_resource_path":"/programs/{program}/projects/{project}"}}'
+ print(f"Using project: {project_str_dict}")
+ result = subprocess.run(
+ ["jsonschemagraph", "gen-dir", "iceberg/schemas/graph", f"{temp_dir}", f"{temp_dir}/OUT", "--extraArgs", project_str_dict, "--gzip_files"],

I wanted to build jsonschemagraph from the main branch, but building it from source fails with:

graphql/griptographql.go:29:8: v.Gid undefined (type *gripql.Vertex has no field or method Gid)

The feature/update-grip-structs branch at least compiled and had the --extraArgs flag. How should I proceed?

Collaborator Author

@matthewpeterkort May 7, 2025

Yeah, that's because grip was using a shadow branch. See bmeg/jsonschemagraph#16.

Try with:

go install github.com/bmeg/jsonschemagraph@b88905d8d65d858be9b52e04632dde309f033a3e

which references the version from bmeg/jsonschemagraph@b88905d.

For this PR, though, I believe it's using an executable from the release https://github.com/bmeg/jsonschemagraph/releases/tag/v0.0.3, if you check the Dockerfile.

Okay, so to confirm: all Dockerfile dependencies point to merged branches for iceberg and jsonschemagraph?

Collaborator Author

Forgot about this repo. I had to open a new PR now that you mention it: bmeg/jsonschemagraph#16. You can review it if you want, but since it's working, I'm not going to make any changes to it that would impact this release.

@quinnwai May 7, 2025

Seems like there's at least one other dependency, github.com/bmeg/grip v0.0.0-20250421161012-b9b392fc8721, that comes from an open PR.

Since it's statically compiled in Go, I'm less worried about updates to that PR causing breaking changes, but I want us to be more stringent in future releases: every dependency should point to merged code if it's a feature in the release.

I'm good to proceed if we start enforcing this rule in future releases. How do I test these changes?

Collaborator Author

@matthewpeterkort May 7, 2025

grip uses this jsonschemagraph directly as a package. If jsonschemagraph doesn't work, you will know it from the downstream code repositories, and ultimately from the g3t end-to-end test.

Kyle had approved the PR in grip that requires this dependency. You test these changes using the g3t end-to-end test, as specified in the release notes.
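
As a possible aid for the "point to merged code" rule discussed above, a Go pseudo-version such as `v0.0.0-20250421161012-b9b392fc8721` can at least be flagged automatically. This is a minimal sketch, not an existing tool: `find_pseudo_versions` is a hypothetical helper, `example.com/other` is a made-up module, and a pseudo-version only shows that a dependency is pinned to a commit instead of a tagged release. It cannot tell whether that commit has been merged.

```python
import re

# Go pseudo-versions end in a 14-digit UTC timestamp and a 12-character
# commit hash prefix, e.g. v0.0.0-20250421161012-b9b392fc8721.
_PSEUDO = re.compile(
    r"^v\d+\.\d+\.\d+-(?:[0-9A-Za-z.]+\.)?\d{14}-[0-9a-f]{12}(?:\+incompatible)?$"
)


def find_pseudo_versions(go_mod_text: str) -> list[tuple[str, str]]:
    """Return (module, version) pairs from go.mod require lines that use a pseudo-version."""
    found = []
    for line in go_mod_text.splitlines():
        # Drop trailing comments such as "// indirect", then tokenize.
        parts = line.split("//")[0].split()
        # Handle the single-line form "require <module> <version>".
        if parts and parts[0] == "require":
            parts = parts[1:]
        if len(parts) == 2 and _PSEUDO.match(parts[1]):
            found.append((parts[0], parts[1]))
    return found


sample = """
require (
    github.com/bmeg/grip v0.0.0-20250421161012-b9b392fc8721
    example.com/other v1.2.3 // indirect
)
"""
print(find_pseudo_versions(sample))
```

Each flagged module would still need a manual check of whether its commit lives on a merged branch.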

+ capture_output=True,
+ text=True,
+ check=False
+ )
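
A side note on the `--extraArgs` payload built in this hunk: it is a small JSON object, so it could also be produced with `json.dumps`. The sketch below is one possible variant, not the PR's code. It assumes project IDs have the form `<program>-<project>` and that only the first hyphen separates the two (so a project name may itself contain hyphens); `build_extra_args` and the sample IDs are hypothetical.

```python
import json


def build_extra_args(project_id: str) -> str:
    """Build the JSON string for --extraArgs, assuming '<program>-<project>' IDs."""
    # Split on the first hyphen only, so hyphens in the project name are kept.
    program, sep, project = project_id.partition("-")
    if not sep or not program or not project:
        raise ValueError(f"expected '<program>-<project>', got {project_id!r}")
    # Compact separators give the same whitespace-free form as the f-string in the diff.
    return json.dumps(
        {"auth_resource_path": f"/programs/{program}/projects/{project}"},
        separators=(",", ":"),
    )


print(build_extra_args("aced-demo"))
```

Unlike `project_id.split("-")` with two-name unpacking, this does not raise on IDs with more than one hyphen; whether such IDs should be accepted is a policy question for the maintainers.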
+ if result.returncode == 0:
+ print("jsonschemagraph ran successfully.")
+ res = bulk_load(await _get_grip_service_name(), await _get_grip_graph_name(), project_id, f"{temp_dir}/OUT", logs, access_token)
+ if int(res[0]["status"]) != 200:
+ server_errors.append(res[0]["message"])
+ else:
+ print(f"jsonschemagraph failed with exit code: {result.returncode}")
+ print("Stdout:")

Just confirming: the stdout gets logged to the FHIR server pod logs?

Collaborator Author

Any error logs will come back in the HTTPS response message. More in-depth logs can be found in the FHIR server logs in k8s.
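
To make the split described above concrete (a short error for the response, full output for the pod logs), here is a minimal sketch. It swaps the `print` calls for Python's standard `logging` module, which is a substitution and not what the PR does, and it assumes the container's stderr is what Kubernetes collects as pod logs. `run_and_collect` and the logger name are hypothetical.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("bundle_service")  # hypothetical logger name


def run_and_collect(cmd: list[str], errors: list[str]) -> bool:
    """Run `cmd`; on failure, log full output and append a short message to `errors`."""
    # check=False: a non-zero exit code is handled below instead of raising.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if result.returncode == 0:
        return True
    # Full detail goes to the process log (stderr by default for errors).
    logger.error("%s exited with %d\nstdout:\n%s\nstderr:\n%s",
                 cmd[0], result.returncode, result.stdout, result.stderr)
    # Only the trimmed stderr is kept for the caller-facing error list.
    errors.append(f"{cmd[0]} failed: {result.stderr.strip()}")
    return False


# Demo with a child Python process that fails on purpose.
errors: list[str] = []
ok = run_and_collect(
    [sys.executable, "-c", "import sys; sys.stderr.write('boom\\n'); sys.exit(3)"],
    errors,
)
print(ok, errors)
```

The caller decides what to do with `errors`, for example returning them in the response body as the PR does with `server_errors`.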

+ print(result.stdout)
+ print("Stderr:")
+ print(result.stderr)
+ server_errors.append(f"jsonschemagraph failed: {result.stderr.strip()}")

try:
db = LocalFHIRDatabase(db_name=f"{temp_dir}/local_fhir.db")
- db.bulk_insert_data(resources=get_project_data(await _get_grip_service_name(), await _get_grip_graph_name(), project_id, logs, access_token))
+ db.bulk_insert_data(resources=get_project_data(await _get_grip_service_name(), await _get_grip_graph_name(), project_id, logs, access_token, 1024*1024))

index_generator_dict = {
'researchsubject': db.flattened_research_subjects,
4 changes: 2 additions & 2 deletions requirements.txt
@@ -35,7 +35,7 @@ fastapi
uvicorn[standard]

#aced submission
- aced-submission==0.0.9rc37
+ aced-submission==0.0.10rc11

#gen3 tracker
- gen3-tracker==0.0.5rc11
+ gen3-tracker==0.0.7rc13
4 changes: 2 additions & 2 deletions tests/server/test_bundle_service.py
@@ -227,13 +227,13 @@ def test_write_bundle_simple_ok(valid_bundle, valid_patient):
vertex_id = request_bundle["entry"][0]["resource"]["id"]
project_id = request_bundle["identifier"]["value"]
endpoint = endpoint_from_token(ACCESS_TOKEN)
- result = requests.get(f"{endpoint}/grip/writer/graphql/CALIPER/get-vertex/{vertex_id}/{project_id}",
+ result = requests.get(f"{endpoint}/grip/writer/CALIPER/get-vertex/{vertex_id}/{project_id}",
headers=HEADERS
).json()

print("RESULT: ", result)
print("ENTRY: ", request_bundle["entry"][0]["resource"])
- assert result['data']['gid'] == vertex_id
+ assert result['data']['id'] == vertex_id


def test_write_bundle_missing_type(valid_bundle, valid_patient):