Merged
30 commits
214b141
Update contributing guidelines for CLI and Helm
lbeckman314 Sep 12, 2025
3f1db20
Apply suggestion from @Copilot
lbeckman314 Sep 12, 2025
c1cc3cd
Apply suggestion from @Copilot
lbeckman314 Sep 12, 2025
4744e0b
docs: Add deployment Helm url to README.md
lbeckman314 Oct 1, 2025
8473b3c
fix: Remove `limit` parameter from Grip functions
lbeckman314 Oct 1, 2025
483ee12
chore: Update gripql to 0.8.0
lbeckman314 Oct 2, 2025
4cddc79
feat: Support env variables for S3 credentials
lbeckman314 Oct 3, 2025
c650674
fix: Remove unused PGPASSWORD flag
lbeckman314 Oct 3, 2025
53e2c7f
fix: Remove PGPASSWORD parameters in favor of env vars
lbeckman314 Oct 3, 2025
c9581a3
fix: update `pg_dump` env vars
lbeckman314 Oct 3, 2025
35c70b4
feat: Update call to GRIP edges
lbeckman314 Oct 7, 2025
4e81901
fix: Update Dockerfile to include `pg_config` before Python build
lbeckman314 Oct 7, 2025
5c0eebc
fix: Add `gcc` dependency to Dockerfile for psycopg2 build
lbeckman314 Oct 7, 2025
2daf67f
fix: Update Dockerfile
lbeckman314 Oct 7, 2025
8a9f668
chore: Re-enable Postgres + S3 operations
lbeckman314 Oct 7, 2025
2b98f35
fix: Update tests + re-add `--vertex` flag to GRIP command
lbeckman314 Oct 8, 2025
0b197fe
fix: Downgrade PostgreSQL in Backup Service to match Server version
lbeckman314 Oct 17, 2025
581e39f
fix: Update Dockerfile to install postgresql-client-14
lbeckman314 Oct 17, 2025
0cbdd80
fix: Update call to get edges to match latest syntax (`G.V().outE()`)
lbeckman314 Oct 20, 2025
6db260b
feat: Re-add support for ElasticSearch backups
lbeckman314 Oct 21, 2025
a7d2e67
feat: Move CLI functions to respective modules
lbeckman314 Oct 21, 2025
de54c80
chore: Move CLI options to respective subcommand modules
lbeckman314 Oct 21, 2025
9fcfe05
tests: Add initial module test files
lbeckman314 Oct 21, 2025
4f2949d
feat: Add custom Elasticsearch Docker image with S3 Plugin installed
lbeckman314 Oct 24, 2025
abeed25
feat: Add working ES snapshot repo initialization
lbeckman314 Oct 25, 2025
5ae148f
feat: Add initial support for Elasticsearch snapshots
lbeckman314 Oct 28, 2025
645142d
feat: Add initial support for Elastic Snapshot Restore
lbeckman314 Oct 28, 2025
9edb44c
feat: Update entrypoint.sh
lbeckman314 Jan 5, 2026
7c027bd
fix: Replace psycopg2-binary with psycopg2 to fix build errors
lbeckman314 Jan 5, 2026
887a79c
fix: Remove aced-submission dependency
lbeckman314 Jan 5, 2026
149 changes: 149 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,149 @@
# Contributing

# CLI

## Install

```sh
➜ git clone git@github.com:calypr/backup-service

➜ cd backup-service

➜ python3 -m venv venv && source venv/bin/activate

➜ pip install -r requirements.txt

➜ pip install -e .

➜ which bak
./venv/bin/bak

➜ bak --help
Usage: bak [OPTIONS] COMMAND [ARGS]...

Options:
--version Show the version and exit.
-v, --verbose, --debug Enable debug logging.
--help Show this message and exit.

Commands:
grip (gp) Commands for GRIP backups.
pg (pg) Commands for Postgres backups.
s3 Commands for S3.
```

## PostgreSQL

| Command | Example |
|-------------|------------------|
| List Tables | `bak pg ls` |
| Backup | `bak pg dump` |
| Restore | `bak pg restore` |

## GRIP

| Command | Example |
|-------------|--------------------|
| List Graphs | `bak grip ls` |
| Backup | `bak grip backup` |
| Restore | `bak grip restore` |

## S3

| Command | Example |
|--------------|-------------------|
| List backups | `bak s3 ls` |
| Upload | `bak s3 upload` |
| Download | `bak s3 download` |

# Helm

```sh
➜ helm repo add ohsu https://ohsu-comp-bio.github.io/helm-charts
"ohsu" has been added to your repositories

➜ helm repo update ohsu
Update Complete. ⎈Happy Helming!⎈

➜ helm search repo ohsu
NAME CHART VERSION APP VERSION DESCRIPTION
ohsu/backups 0.2.5 1.13.0 A Helm chart for Kubernetes

➜ kubectl config current-context
kind-dev

➜ kubectl create secret generic postgres-credentials --from-literal=postgres-password=<PGPASSWORD> --namespace backups

➜ kubectl create secret generic s3-credentials --from-literal=AWS_ACCESS_KEY=<KEY> --from-literal=AWS_SECRET_KEY=<SECRET> --namespace backups

➜ helm upgrade --install backups ohsu/backups --create-namespace --namespace backups
Release "backups" has been upgraded. Happy Helming!

➜ kubectl create job example-job --from=cronjob/backup-service-cronjob --namespace backups
job.batch/example-job created

➜ kubectl get jobs -n backups
NAME STATUS COMPLETIONS DURATION
example-job Complete 1/1 11s

➜ mc ls cbds/calypr-backups/calypr-dev
2025-09-12T23:10:29/

➜ mc ls cbds/calypr-backups/calypr-dev/2025-09-12T23:10:29/
CALYPR.edges
CALYPR.vertices
CALYPR__schema__.edges
CALYPR__schema__.vertices
arborist_local.sql
fence_local.sql
gecko_local.sql
indexd_local.sql
postgres.sql
requestor_local.sql
```
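The `s3-credentials` secret above exposes two keys, `AWS_ACCESS_KEY` and `AWS_SECRET_KEY`. As a minimal sketch of how a process might read them from its environment: the helper `load_s3_credentials` and the `S3Credentials` class are hypothetical, and whether the service reads these exact variable names is an assumption taken from the secret's key names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class S3Credentials:
    """Hypothetical container for the two values stored in the secret."""
    access_key: str
    secret_key: str


def load_s3_credentials(env=None):
    """Read S3 credentials from environment variables.

    Assumption: the variable names mirror the keys of the
    `s3-credentials` secret (AWS_ACCESS_KEY, AWS_SECRET_KEY).
    """
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY", "AWS_SECRET_KEY")
    # Fail early with a readable message instead of a KeyError later on.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return S3Credentials(env["AWS_ACCESS_KEY"], env["AWS_SECRET_KEY"])
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment.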

* Steps to confirm backups in the S3 bucket with `mc`:

```sh
➜ brew install minio-mc

➜ which mc
/opt/homebrew/bin/mc

➜ mc alias set cbds https://aced-storage.ohsu.edu
Enter Access Key: <KEY>
Enter Secret Key:
Added `cbds` successfully.

➜ mc alias ls cbds
cbds
URL : https://aced-storage.ohsu.edu
AccessKey : <KEY>
SecretKey : <SECRET>
API : s3v4
Path : auto
Src : $HOME/.mc/config.json

➜ mc ls cbds/calypr-backups/calypr-dev/
...
2025-09-12T02:00:01/ <---- Last timestamped backup

➜ mc ls cbds/calypr-backups/calypr-dev/2025-09-12T02:00:01/
160MiB CALYPR.edges <---- CALYPR edges
1.8GiB CALYPR.vertices <---- CALYPR vertices
0B CALYPR__schema__.edges <---- Schema edges
1.4MiB CALYPR__schema__.vertices <---- Schema vertices
107KiB arborist_local.sql <---- Arborist
234KiB fence_local.sql <---- Fence
6.0KiB gecko_local.sql <---- Gecko
21MiB indexd_local.sql <---- Indexd
9.6KiB metadata_local.sql <---- Metadata
2.9KiB postgres.sql <---- Postgres
64KiB requestor_local.sql <---- Requestor
8.0KiB wts_local.sql <---- Workspace Token Service
```
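The manual check above can also be scripted. The sketch below compares a list of file names (for example, copied from the `mc ls` output) against a set of expected artifacts. The `REQUIRED` set is an assumption built from the listing above, and `missing_artifacts` is a hypothetical helper, not part of the `bak` CLI.

```python
# Assumption: these names are taken from the example listing above;
# adjust the set to match the databases and graphs in your deployment.
REQUIRED = {
    "CALYPR.edges",
    "CALYPR.vertices",
    "arborist_local.sql",
    "fence_local.sql",
    "gecko_local.sql",
    "indexd_local.sql",
    "postgres.sql",
    "requestor_local.sql",
}


def missing_artifacts(listed, required=REQUIRED):
    """Return the expected backup files that are absent from `listed`."""
    return sorted(set(required) - set(listed))


if __name__ == "__main__":
    listed = ["CALYPR.edges", "CALYPR.vertices", "postgres.sql"]
    for name in missing_artifacts(listed):
        print("missing:", name)
```

An empty result means every expected file is present; it says nothing about whether the file contents are valid.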

# Known Limitations (Next Steps) ⚠️

- [ ] No clear, human-readable output of the path of the backup in S3 after a successful run
- [ ] Always uploads to calypr-dev even when using local k8s cluster
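One possible direction for both items, as a rough sketch only: build the destination from an explicit environment name and print it after the upload. The function `backup_prefix` and its parameters are hypothetical; the timestamp layout follows the directory names shown in the `mc ls` output above.

```python
from datetime import datetime


def backup_prefix(bucket, environment, started_at):
    """Build a human-readable S3 location for one backup run.

    Assumption: backups live under <bucket>/<environment>/<timestamp>/,
    as in the `mc ls` examples (e.g. 2025-09-12T23:10:29/).
    """
    stamp = started_at.strftime("%Y-%m-%dT%H:%M:%S")
    return f"s3://{bucket}/{environment}/{stamp}/"


if __name__ == "__main__":
    # The environment is passed in rather than hard-coded to calypr-dev.
    print("Backup written to", backup_prefix("calypr-backups", "kind-dev", datetime.now()))
```

Taking the environment as a parameter keeps a local cluster from writing into the `calypr-dev` prefix by default.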
45 changes: 20 additions & 25 deletions Dockerfile
@@ -1,27 +1,27 @@
# GRIP build
# Ref: https://github.com/bmeg/grip/blob/develop/Dockerfile
FROM golang:1.17.2-alpine AS grip

RUN apk add --no-cache make git bash build-base

ENV GOPATH=/go
ENV PATH="/go/bin:${PATH}"

WORKDIR /go/src/github.com/bmeg

RUN git clone https://github.com/bmeg/grip
# Backup build
FROM python:slim

WORKDIR /go/src/github.com/bmeg/grip
RUN apt-get update && apt-get install -y \
build-essential \
gcc \
libpq-dev

# Checkout latest GRIP tag. Example:
# $ git describe --tags --abbrev=0
# v1.9.0
RUN git checkout $(git describe --tags --abbrev=0)
RUN apt-get install -y postgresql-common

RUN make install
RUN YES=true /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh

# Backup build
FROM python:slim
# Note: We're using Postgres 14 to match the version set in Gen3-Helm:
#
# Gen3-Helm Chart: https://github.com/calypr/gen3-helm/blob/v1.0.0/helm/gen3/Chart.yaml#L92-L94
#
# Postgres Chart: https://github.com/bitnami/charts/blob/postgresql/11.9.13/bitnami/postgresql/Chart.yaml#L4
#
# ```
# ➜ kubectl exec --stdin --tty StatefulSets/cbds-postgresql -- /bin/bash
# $ psql --version
# psql (PostgreSQL) 14.5
# ```
RUN apt-get update && apt-get install -y postgresql-client-14

WORKDIR /app

@@ -39,9 +39,4 @@ RUN mkdir -p /backups
COPY entrypoint.sh ./entrypoint.sh
RUN chmod +x ./entrypoint.sh

RUN apt-get update && apt-get install -y --no-install-recommends postgresql-client

# Copy GRIP binary from build stage
COPY --from=grip /go/bin/grip /usr/local/bin/grip

ENTRYPOINT ["./entrypoint.sh"]
13 changes: 13 additions & 0 deletions Dockerfile.es
@@ -0,0 +1,13 @@
# Creating a custom Docker image to include the s3 snapshot repository plugin
# Ref: https://github.com/elastic/helm-charts/blob/v7.17.3/elasticsearch/README.md#how-to-install-plugins

# Manual build command:
# docker buildx build --platform=linux/arm64,linux/amd64 -t quay.io/ohsu-comp-bio/elasticsearch-s3:7.17.3 -f Dockerfile.es . --push
# TODO: Add this to GitHub Actions for automatic builds

# Start from the official Elasticsearch image you are currently using
FROM docker.elastic.co/elasticsearch/elasticsearch:7.17.3

# Install the S3 repository plugin
# The 'install' command runs at build time, and is baked into the final image
RUN bin/elasticsearch-plugin install --batch repository-s3
103 changes: 55 additions & 48 deletions README.md
@@ -25,7 +25,26 @@ Data backup and recovery service for the CALYPR systems 🔄

# 2. Quick Start ⚡

> [!TIP]
> The recommended way to run the backup-service is to deploy it to a K8s cluster for automated daily backups.

```sh
➜ helm repo add ohsu https://ohsu-comp-bio.github.io/helm-charts

➜ helm upgrade --install backups ohsu/backups
```

# 3. CLI

> [!TIP]
> Manual backups (and restores) can be run through the CLI.

```sh
➜ git clone git@github.com:calypr/backup-service.git
Cloning into 'backup-service'...

➜ cd backup-service

➜ python3 -m venv venv && source venv/bin/activate

➜ pip install -r requirements.txt
@@ -49,40 +68,7 @@ Commands:
upload local ➜ S3
```

# 3. Design + Examples 📐

```mermaid
sequenceDiagram
participant Backup as Backup Service
participant Database
participant S3 as S3 Bucket

Title: Gen3 Backups

Backup-->>Database: Database Credentials

Note over Database: `pg_dump`

Database-->>Backup: Database Backup

Backup-->>S3: Database Backup
```

| Service | Postgres Database | Database Backup Name | Description |
| ---------------------- | ------------------- | ----------------------------- | ------------------------------------------------ |
| [Arborist][arborist] | `arborist-EXAMPLE` | `arborist-EXAMPLE-TIMESTAMP` | Gen3 policy engine |
| [Fence][fence] | `fence-EXAMPLE` | `fence-EXAMPLE-TIMESTAMP` | AuthN/AuthZ OIDC service |
| [Gecko][gecko] | `gecko-EXAMPLE` | `gecko-EXAMPLE-TIMESTAMP` | Frontend configurations for dynamic data loading |
| [Indexd][indexd] | `indexd-EXAMPLE` | `indexd-EXAMPLE-TIMESTAMP` | Data indexing and tracking service |
| [Requestor][requestor] | `requestor-EXAMPLE` | `requestor-EXAMPLE-TIMESTAMP` | Data access manager |

[arborist]: https://github.com/uc-cdis/arborist
[fence]: https://github.com/uc-cdis/fence
[gecko]: https://github.com/aced-idp/gecko
[indexd]: https://github.com/uc-cdis/indexd
[requestor]: https://github.com/uc-cdis/requestor

## Backup ⬆️
## Backup ⬆

### Postgres Dump:

@@ -95,12 +81,6 @@ sequenceDiagram
--dir DIR
```

## ElasticSearch Backup:

```
➜ bak es backup
```

## GRIP Backup:

```sh
@@ -118,7 +98,7 @@ sequenceDiagram
--secret SECRET
```

## Restore ⬇
## Restore ⬇

### Postgres Restore:

@@ -131,12 +111,6 @@ sequenceDiagram
--dir DIR
```

## ElasticSearch Restore:

```
➜ bak es restore
```

## GRIP Restore:

```sh
@@ -154,7 +128,40 @@ sequenceDiagram
--secret SECRET
```

# 4. Alternatives 📖
# 4. Design 📐

```mermaid
sequenceDiagram
participant Backup as Backup Service
participant Database
participant S3 as S3 Bucket

Title: Gen3 Backups

Backup-->>Database: Database Credentials

Note over Database: `pg_dump`

Database-->>Backup: Database Backup

Backup-->>S3: Database Backup
```

| Service | Postgres Database | Database Backup Name | Description |
| ---------------------- | ------------------- | ----------------------------- | ------------------------------------------------ |
| [Arborist][arborist] | `arborist-EXAMPLE` | `arborist-EXAMPLE-TIMESTAMP` | Gen3 policy engine |
| [Fence][fence] | `fence-EXAMPLE` | `fence-EXAMPLE-TIMESTAMP` | AuthN/AuthZ OIDC service |
| [Gecko][gecko] | `gecko-EXAMPLE` | `gecko-EXAMPLE-TIMESTAMP` | Frontend configurations for dynamic data loading |
| [Indexd][indexd] | `indexd-EXAMPLE` | `indexd-EXAMPLE-TIMESTAMP` | Data indexing and tracking service |
| [Requestor][requestor] | `requestor-EXAMPLE` | `requestor-EXAMPLE-TIMESTAMP` | Data access manager |

[arborist]: https://github.com/uc-cdis/arborist
[fence]: https://github.com/uc-cdis/fence
[gecko]: https://github.com/aced-idp/gecko
[indexd]: https://github.com/uc-cdis/indexd
[requestor]: https://github.com/uc-cdis/requestor
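The dump step in the diagram above can be sketched as follows. This is an illustration under stated assumptions, not the service's actual implementation: `pg_dump_command` and `pg_dump_env` are hypothetical helpers, and the `<database>.sql` output name is borrowed from the example S3 listings. The `pg_dump` flags and the `PGPASSWORD` variable are standard PostgreSQL client features.

```python
import os
import posixpath


def pg_dump_command(host, user, database, out_dir):
    """Build the argv for dumping one database to <out_dir>/<database>.sql."""
    out_file = posixpath.join(out_dir, f"{database}.sql")
    return [
        "pg_dump",
        "--host", host,
        "--username", user,
        "--dbname", database,
        "--file", out_file,
    ]


def pg_dump_env(password, base_env=None):
    """Return a copy of the environment with PGPASSWORD set.

    Passing the password through the environment keeps it out of the
    command line, where it would be visible in process listings.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PGPASSWORD"] = password
    return env


# Usage (requires pg_dump on PATH and a reachable server):
#   import subprocess
#   cmd = pg_dump_command("localhost", "postgres", "fence_local", "/backups")
#   subprocess.run(cmd, env=pg_dump_env("<PGPASSWORD>"), check=True)
```

Keeping command construction separate from execution makes the arguments easy to inspect or log before anything touches the database.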

# 5. Alternatives 📖

> [!TIP]
> The alternative options below work on the K8s resources themselves (e.g. PVC/PV) as opposed to database resources (e.g. Postgres tables, Elasticsearch indices)