chore: multiple python version support with latest pyspark and hail #974

Merged
merged 47 commits into from
Jan 28, 2025
90e6028
chore(pyspark): update to 3.5.X
SzymonSzyszkowski Jan 17, 2025
630c0c9
chore: fix doctest syntax
SzymonSzyszkowski Jan 17, 2025
1dbe1b0
chore: bump temurin version to 11
SzymonSzyszkowski Jan 17, 2025
bcf0b9a
feat: allow multiple python versions
Jan 21, 2025
4d3380a
feat: python matrix for gha
Jan 21, 2025
28b3e2c
chore: pre-commit auto fixes [...]
pre-commit-ci[bot] Jan 21, 2025
9cc2c78
Merge branch 'dev' into pyspark-bump
project-defiant Jan 21, 2025
fbaa8d9
chore: typos
Jan 21, 2025
7f416ed
chore: fix python version in setup dev script
Jan 21, 2025
5a9cd8f
fix: attempt to fix the 3.11 tests
Jan 21, 2025
c46cdab
fix: set the session correctly in variant_index_config
Jan 21, 2025
18c66b1
Revert "chore: fix doctest syntax"
Jan 21, 2025
def0fbb
chore: update dependencies
Jan 21, 2025
4eabc51
Revert "Revert "chore: fix doctest syntax""
Jan 21, 2025
c350211
chore: bump image to 2.2
Jan 21, 2025
8c2fa2b
chore: update lock files
Jan 21, 2025
1719b5c
build: poetry cleanup
Jan 22, 2025
c45ac1c
build: uv checks droped
Jan 22, 2025
1fccce6
chore: fix dockerfile and install test deps
Jan 22, 2025
4c1efbd
build(uv): add all dependencies to run tests
Jan 22, 2025
1e913a8
Merge branch 'dev' into pyspark-bump
project-defiant Jan 22, 2025
08e03e3
chore: fix test issue with rounding error
Jan 22, 2025
f9fc356
chore: fix dependency version lower bounds
Jan 22, 2025
f1ff1f9
chore: add .python-version file to ignored
Jan 22, 2025
98d464d
build: new setup
Jan 23, 2025
aa64db9
build: new setup
Jan 23, 2025
570e33e
build: new setup
Jan 23, 2025
89a9c34
build: new setup
Jan 23, 2025
2db0610
build: new setup
Jan 23, 2025
b392368
revert: bring back initialization actions
Jan 23, 2025
2329a8a
chore: align variable name
Jan 23, 2025
04c2ed2
chore: update pre-commit python version
Jan 23, 2025
979325d
chore: docs update
Jan 23, 2025
3eb7d55
Merge branch 'dev' into pyspark-bump
project-defiant Jan 24, 2025
79022b9
feat: more complex uv installation
SzymonSzyszkowski Jan 27, 2025
9adc76a
feat: notify to source shellrc file when installing uv
SzymonSzyszkowski Jan 27, 2025
7d10b63
fix: checks
SzymonSzyszkowski Jan 27, 2025
a17290d
chore: debug gha
SzymonSzyszkowski Jan 27, 2025
79de16b
chore: debug gha
SzymonSzyszkowski Jan 27, 2025
e7d5cd8
feat: debug gha
SzymonSzyszkowski Jan 27, 2025
f4ab0d0
feat: debug gha
SzymonSzyszkowski Jan 27, 2025
3bc1318
feat: debug gha
SzymonSzyszkowski Jan 27, 2025
a82b86c
feat: force user shell
SzymonSzyszkowski Jan 27, 2025
09b83a8
feat: gha debug
SzymonSzyszkowski Jan 27, 2025
640f493
feat: gha debug
SzymonSzyszkowski Jan 27, 2025
4a2018a
feat: gha debug
SzymonSzyszkowski Jan 27, 2025
e3ea829
feat: gha debug
SzymonSzyszkowski Jan 27, 2025
10 changes: 5 additions & 5 deletions .github/labeler.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,26 +2,26 @@ version: 1
labels:
- label: "size-XS"
size:
exclude-files: ["poetry.lock"]
exclude-files: ["uv.lock"]
below: 10
- label: "size-S"
size:
exclude-files: ["poetry.lock"]
exclude-files: ["uv.lock"]
above: 9
below: 100
- label: "size-M"
size:
exclude-files: ["poetry.lock"]
exclude-files: ["uv.lock"]
above: 100
below: 500
- label: "size-L"
size:
exclude-files: ["poetry.lock"]
exclude-files: ["uv.lock"]
above: 499
below: 1000
- label: "size-XL"
size:
exclude-files: ["poetry.lock"]
exclude-files: ["uv.lock"]
above: 999
- label: "airflow"
files:
Expand Down
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
Expand Up @@ -25,4 +25,4 @@ add diagrams or images if necessary. It'll help the reviewer_ -->
- [ ] Did you make sure the branch is up-to-date with the `dev` branch?
- [ ] Did you write any new necessary tests?
- [ ] Did you make sure the changes pass local tests (`make test`)?
- [ ] Did you make sure the changes pass pre-commit rules (e.g `poetry run pre-commit run --all-files`)?
- [ ] Did you make sure the changes pass pre-commit rules (e.g `uv run pre-commit run --all-files`)?
15 changes: 6 additions & 9 deletions .github/workflows/artifact.yml
@@ -1,6 +1,7 @@
name: Build and Push to Artifact Registry

"on":
workflow_dispatch:
push:
branches:
- "*"
Expand All @@ -12,7 +13,7 @@ env:
REGION: europe-west1
GAR_LOCATION: europe-west1-docker.pkg.dev/open-targets-genetics-dev
REPOSITORY: gentropy-app
PYTHON_VERSION_DEFAULT: "3.10.8"
PYTHON_VERSION_DEFAULT: "3.11.11"

jobs:
build-push-artifact:
Expand Down Expand Up @@ -54,6 +55,7 @@ jobs:

# skip the `v` at the beginning of the tag for docker image tags
- name: Create a docker tag
if: github.ref == 'refs/heads/dev' || startsWith(github.ref, 'refs/tags/v')
id: docker-tag
shell: bash
env:
Expand Down Expand Up @@ -86,13 +88,8 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION_DEFAULT }}
- name: Install and configure Poetry
uses: snok/install-poetry@v1
with:
virtualenvs-create: true
virtualenvs-in-project: true
installer-parallel: true
- name: Install uv
uses: astral-sh/setup-uv@v5

- name: Build and push spark cluster dependencies
run: |
make build
run: make build
34 changes: 15 additions & 19 deletions .github/workflows/pr.yaml
Expand Up @@ -4,47 +4,43 @@ name: Checks
pull_request:

env:
PYTHON_VERSION_DEFAULT: "3.10.8"
PYTHON_VERSION_DEFAULT: "3.11.11"

jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.10", "3.11", "3.12"]
fail-fast: false
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 1
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: 3.10.8
python-version: ${{ matrix.python-version }}
- name: Set up Java
uses: actions/setup-java@v4
with:
java-version: "8"
java-version: "11"
distribution: "temurin"
- name: Install and configure Poetry
uses: snok/install-poetry@v1
with:
virtualenvs-create: true
virtualenvs-in-project: true
installer-parallel: true
- name: Install uv
uses: astral-sh/setup-uv@v5
- name: Load cached venv
id: cached-poetry-dependencies
id: cached-uv-dependencies
uses: actions/cache@v4
with:
path: .venv
key: venv-${{ runner.os }}-${{ env.PYTHON_VERSION_DEFAULT }}-${{ hashFiles('**/poetry.lock') }}
- name: Validate project dependencies
run: poetry check
key: venv-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('**/uv.lock') }}
- name: Install dependencies
if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
run: poetry install --no-interaction --no-root
- name: Install library
run: poetry install --no-interaction
if: steps.cached-uv-dependencies.outputs.cache-hit != 'true'
run: uv sync --all-groups
- name: Check dependencies
run: poetry run deptry .
run: uv run deptry .
- name: Run tests
run: poetry run pytest
run: uv run pytest
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v5
with:
Expand Down
28 changes: 10 additions & 18 deletions .github/workflows/release.yaml
Expand Up @@ -2,14 +2,14 @@ name: Release

"on":
push:
branches: ["main", "release/**"]
branches: ["main", "release/**", "dev"]

concurrency:
group: deploy
cancel-in-progress: false # prevent hickups with semantic-release

env:
PYTHON_VERSION_DEFAULT: "3.10.8"
PYTHON_VERSION_DEFAULT: "3.11.11"

jobs:
release:
Expand Down Expand Up @@ -40,9 +40,7 @@ jobs:

- name: Python Semantic Release
id: semrelease
# v9.6.0 is required due to the python v3.12 in the newer version of semantic release action which
# breaks the poetry build command.
uses: python-semantic-release/python-semantic-release@v9.6.0
uses: python-semantic-release/python-semantic-release@v9.16.1
with:
github_token: ${{ steps.trigger-token.outputs.token }}

Expand Down Expand Up @@ -121,25 +119,19 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION_DEFAULT }}
- name: Install and configure Poetry
uses: snok/install-poetry@v1
with:
virtualenvs-create: true
virtualenvs-in-project: true
installer-parallel: true
- name: Install uv
uses: astral-sh/setup-uv@v5
- name: Load cached venv
id: cached-poetry-dependencies
id: cached-dependencies
uses: actions/cache@v4
with:
path: .venv
key: |
venv-${{ runner.os }}-\
${{ env.PYTHON_VERSION_DEFAULT }}-\
${{ hashFiles('**/poetry.lock') }}
${{ hashFiles('**/uv.lock') }}
- name: Install dependencies
if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
run: poetry install --no-interaction --no-root
- name: Install library
run: poetry install --without tests --no-interaction
if: steps.cached-dependencies.outputs.cache-hit != 'true'
run: uv sync --group docs
- name: Publish docs
run: poetry run mkdocs gh-deploy --force
run: uv run mkdocs gh-deploy --force
2 changes: 2 additions & 0 deletions .gitignore
Expand Up @@ -14,3 +14,5 @@ site/
.coverage*
wandb/
hail*.log
.python-version
.idea
11 changes: 4 additions & 7 deletions .pre-commit-config.yaml
@@ -1,9 +1,8 @@
default_language_version:
python: python3.10
python: python3.11
ci:
autoupdate_commit_msg: "chore: pre-commit autoupdate"
autofix_commit_msg: "chore: pre-commit auto fixes [...]"
skip: [poetry-lock]
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.7.4
Expand Down Expand Up @@ -101,9 +100,7 @@ repos:
rev: 0.5.9
hooks:
- id: pydoclint

- repo: https://github.com/python-poetry/poetry
rev: "2.0.0"
- repo: https://github.com/astral-sh/uv-pre-commit
rev: 0.5.22
hooks:
- id: poetry-check
- id: poetry-lock
- id: uv-lock
1 change: 0 additions & 1 deletion .python-version

This file was deleted.

25 changes: 9 additions & 16 deletions Dockerfile
@@ -1,27 +1,20 @@
FROM python:3.10-bullseye

RUN apt-get update \
&& apt-get clean \
&& apt-get install -y openjdk-11-jdk \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update && \
apt-get clean && \
apt-get install -y openjdk-11-jdk && \
rm -rf /var/lib/apt/lists/*

ENV POETRY_NO_INTERACTION=1 \
POETRY_VIRTUALENVS_IN_PROJECT=1 \
POETRY_VIRTUALENVS_CREATE=1 \
POETRY_CACHE_DIR=/tmp/poetry_cache \
JAVA_HOME=/usr
ENV JAVA_HOME=/usr

RUN pip install poetry>=2.0.0
RUN pip install uv
WORKDIR /app

COPY pyproject.toml poetry.lock ./
COPY pyproject.toml uv.lock ./
RUN touch README.md

RUN poetry config installer.max-workers 10
RUN poetry install --without dev,docs,tests --no-root --no-interaction --no-ansi -vvv && rm -rf $POETRY_CACHE_DIR
RUN uv sync

COPY src ./src

RUN poetry install --without dev,docs,tests

ENTRYPOINT ["poetry", "run", "gentropy"]
ENTRYPOINT ["uv", "run", "gentropy"]
55 changes: 31 additions & 24 deletions Makefile
@@ -1,7 +1,8 @@
SHELL := /bin/bash
PROJECT_ID ?= open-targets-genetics-dev
REGION ?= europe-west1
APP_NAME ?= $$(cat pyproject.toml | grep -m 1 "name" | cut -d" " -f3 | sed 's/"//g')
PACKAGE_VERSION ?= $$(poetry version --short)
PACKAGE_VERSION ?= $(shell grep -m 1 'version = ' pyproject.toml | sed 's/version = "\(.*\)"/\1/')
# NOTE: git rev-parse will always return the HEAD if it sits in the tag,
# this way we can distinguish the tag vs branch name
ifeq ($(shell git rev-parse --abbrev-ref HEAD),HEAD)
Expand All @@ -11,7 +12,7 @@ else
endif

CLEAN_PACKAGE_VERSION := $(shell echo "$(PACKAGE_VERSION)" | tr -cd '[:alnum:]')
BUCKET_NAME=gs://genetics_etl_python_playground/initialisation/${APP_NAME}/${REF}
BUCKET_NAME=gs://genetics_etl_python_playground/initialisation

.PHONY: $(shell sed -n -e '/^$$/ { n ; /^[^ .\#][^ ]*:/ { s/:.*$$// ; p ; } ; }' $(MAKEFILE_LIST))

Expand All @@ -23,43 +24,55 @@ help: ## This is help
clean: ## Clean up prior to building
@rm -Rf ./dist

setup-dev: SHELL:=/bin/bash
setup-dev: ## Setup development environment
setup-dev: SHELL := $(shell echo $${SHELL})
setup-dev: ## Setup development environment
@. utils/install_dependencies.sh
@echo "Run . ${HOME}/.$(notdir $(SHELL))rc to finish setup"

check: ## Lint and format code
@echo "Linting API..."
@poetry run ruff check src/gentropy .
@uv run ruff check src/gentropy .
@echo "Linting docstrings..."
@poetry run pydoclint --config=pyproject.toml src
@poetry run pydoclint --config=pyproject.toml --skip-checking-short-docstrings=true tests
@uv run pydoclint --config=pyproject.toml src
@uv run pydoclint --config=pyproject.toml --skip-checking-short-docstrings=true tests

test: ## Run tests
@echo "Running Tests..."
@poetry run pytest
@uv run pytest
Contributor

Please test this yourself, but the current command goes idle without running any tests.

Suggested change
@uv run pytest
@uv run pytest .

Contributor Author

@project-defiant project-defiant Jan 27, 2025

The tests still run fine on my side, although the lookup does seem to take a long time. I'm not sure why the dot helps here: since we already add the testpath(s) to the pytest options, the dot just overrides them. While testing your solution I got the following cache-related errors:

import file mismatch:
imported module 'a_creating_spark_session' has this __file__ attribute:
  /home/mindos/Projects/OpenTargets/gentropy/docs/src_snippets/howto/python_api/a_creating_spark_session.py
which is not the same as the test file we want to collect:
  /home/mindos/Projects/OpenTargets/gentropy/site/src_snippets/howto/python_api/a_creating_spark_session.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules

This seems to be because I had previously generated the docs, which contain duplicates of some test files.

On the note of test collection speeds:

~/Projects/OpenTargets/gentropy (pyspark-bump) $ time uv run pytest --collect-only . 1> /dev/null
uv run pytest --collect-only . > /dev/null  11,81s user 2,56s system 115% cpu 12,390 total
~/Projects/OpenTargets/gentropy (pyspark-bump) $ time uv run pytest --collect-only 1> /dev/null
uv run pytest --collect-only > /dev/null  11,70s user 2,62s system 116% cpu 12,269 total

In the first run, the site dir was removed.
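
The `import file mismatch` error quoted above involves two files that share the basename `a_creating_spark_session.py`, one under `docs/` and one under the generated `site/` directory. As a minimal sketch (the helper name `find_duplicate_basenames` is hypothetical and not part of gentropy or pytest), the script below lists Python files whose basenames collide within a directory tree, which may help spot such duplicates before running the tests:

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_basenames(root: str) -> dict[str, list[Path]]:
    """Map each Python file basename found more than once under root to its paths.

    Hypothetical helper for illustration only; not part of gentropy or pytest.
    """
    by_name: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*.py")):
        by_name[path.name].append(path)
    # Keep only basenames that occur in more than one location.
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


if __name__ == "__main__":
    for name, paths in find_duplicate_basenames(".").items():
        print(name)
        for path in paths:
            print(f"  {path}")
```

Run from the repository root, this would also report common names such as `__init__.py`, so the output needs filtering by eye. Removing the generated `site/` directory, as was done for the first timing run, should remove the collision reported in the error.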


build-documentation: ## Create local server with documentation
@echo "Building Documentation..."
@poetry run mkdocs serve
@uv run mkdocs serve

create-dev-cluster: build ## Spin up a simple dataproc cluster with all dependencies for development purposes
sync-cluster-init-script: ## Synchronize the cluster inicialisation actions script to google cloud
@echo "Syncing install_dependencies_on_cluster.sh to ${BUCKET_NAME}"
@gcloud storage cp utils/install_dependencies_on_cluster.sh ${BUCKET_NAME}/install_dependencies_on_cluster.sh

sync-gentropy-cli-script: ## Synchronize the gentropy cli script
@echo "Syncing gentropy cli script to ${BUCKET_NAME}"
@gcloud storage cp src/gentropy/cli.py ${BUCKET_NAME}/cli.py

create-dev-cluster: sync-cluster-init-script sync-gentropy-cli-script ## Spin up a simple dataproc cluster with all dependencies for development purposes
@echo "Making sure the branch is in sync with remote, so cluster can install gentropy dev version..."
@./utils/clean_status.sh || (echo "ERROR: Commit and push or stash local changes, to have up to date cluster"; exit 1)
@echo "Creating Dataproc Dev Cluster"
@gcloud config set project ${PROJECT_ID}
@gcloud dataproc clusters create "ot-genetics-dev-${CLEAN_PACKAGE_VERSION}-$(USER)" \
--image-version 2.1 \
gcloud config set project ${PROJECT_ID}
gcloud dataproc clusters create "ot-genetics-dev-${CLEAN_PACKAGE_VERSION}-$(USER)" \
--image-version 2.2 \
--region ${REGION} \
--master-machine-type n1-standard-16 \
--initialization-actions=$(BUCKET_NAME)/install_dependencies_on_cluster.sh \
--metadata="PACKAGE=$(BUCKET_NAME)/${APP_NAME}-${PACKAGE_VERSION}-py3-none-any.whl" \
--master-machine-type n1-standard-2 \
--metadata="GENTROPY_REF=${REF}" \
--initialization-actions=${BUCKET_NAME}/install_dependencies_on_cluster.sh \
--secondary-worker-type spot \
--worker-machine-type n1-standard-4 \
--public-ip-address \
--worker-boot-disk-size 500 \
--autoscaling-policy="projects/${PROJECT_ID}/regions/${REGION}/autoscalingPolicies/otg-etl" \
--optional-components=JUPYTER \
--enable-component-gateway \
--max-idle=60m

make update-dev-cluster: build ## Reinstalls the package on the dev-cluster
update-dev-cluster: build ## Reinstalls the package on the dev-cluster
@echo "Updating Dataproc Dev Cluster"
@gcloud config set project ${PROJECT_ID}
gcloud dataproc jobs submit pig --cluster="ot-genetics-dev-${CLEAN_PACKAGE_VERSION}" \
Expand All @@ -68,10 +81,4 @@ make update-dev-cluster: build ## Reinstalls the package on the dev-cluster
-e='sh chmod 750 $${PWD}/install_dependencies_on_cluster.sh; sh $${PWD}/install_dependencies_on_cluster.sh'

build: clean ## Build Python package with dependencies
@gcloud config set project ${PROJECT_ID}
@echo "Packaging Code and Dependencies for ${APP_NAME}-${PACKAGE_VERSION}"
@poetry build
@echo "Uploading to ${BUCKET_NAME}"
@gsutil cp src/${APP_NAME}/cli.py ${BUCKET_NAME}/
@gsutil cp ./dist/${APP_NAME}-${PACKAGE_VERSION}-py3-none-any.whl ${BUCKET_NAME}/
@gsutil cp ./utils/install_dependencies_on_cluster.sh ${BUCKET_NAME}/
@uv build
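
The `PACKAGE_VERSION` line in the Makefile diff above replaces `poetry version --short` with a `grep`/`sed` pipeline over `pyproject.toml`. A minimal Python sketch of the same extraction follows, assuming the first line containing `version = ` is the package version. The function name `first_version` and the sample TOML content are hypothetical and only serve as illustration:

```python
import re
from typing import Optional


def first_version(pyproject_text: str) -> Optional[str]:
    """Return the quoted value from the first line containing 'version = '."""
    for line in pyproject_text.splitlines():
        if "version = " in line:
            # Same substitution as the sed expression: keep only the quoted value.
            return re.sub(r'version = "(.*)"', r"\1", line)
    return None


if __name__ == "__main__":
    # Hypothetical file content, not taken from the repository.
    sample = '[project]\nname = "example"\nversion = "1.2.3"\n'
    print(first_version(sample))
```

Like the shell pipeline, this matches any line containing `version = `, so it relies on the package version being the first such line in the file.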