[BUG] - Deploy keys selected in CLI, but platform services cannot use SSH #17

@GediminasKr

🐛 Bug Description

When the user selects the deploy_keys repository access method in the CLI, the deployed platform services (e.g., Data Orchestrator/Airflow, CICD runner, Argo Workflows tasks, git-sync components) are not configured to authenticate over SSH, so they fail to pull repositories that require SSH deploy keys. The CLI collects and generates the deploy keys, but the runtime services either never receive them or do not use them correctly, and neither known_hosts nor host key verification is configured.

References in code (for context):

  • CLI selection and normalization of repo access method happens in cli.py within _collect_secrets_parameters and _collect_repository_parameters (e.g., secrets_repo_access_method, validate_and_normalize_repo_url). The platform later assumes access, but Helm charts/workloads are not wired for SSH.
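
For illustration, the kind of normalization that deploy_keys implies could look like the sketch below. This is not the project's validate_and_normalize_repo_url; the helper name to_ssh_url and its regular expressions are assumptions made for this example, and it deliberately rejects URLs with embedded credentials or custom ports rather than guessing.

```python
import re

# Hypothetical helper for this issue; NOT the implementation in cli.py.
_SCP_RE = re.compile(r"^git@(?P<host>[^:/@]+):(?P<path>.+?)(?:\.git)?$")
_SSH_RE = re.compile(r"^ssh://git@(?P<host>[^/:@]+)/(?P<path>.+?)(?:\.git)?/?$")
_HTTPS_RE = re.compile(r"^https?://(?P<host>[^/:@]+)/(?P<path>.+?)(?:\.git)?/?$")


def to_ssh_url(url: str) -> str:
    """Return the scp-style SSH form (git@host:owner/repo.git) of a repo URL."""
    url = url.strip()
    for pattern in (_SCP_RE, _SSH_RE, _HTTPS_RE):
        match = pattern.match(url)
        if match:
            return f"git@{match['host']}:{match['path']}.git"
    # Credentials, ports, and unknown schemes are rejected instead of guessed.
    raise ValueError(f"cannot convert to SSH form: {url!r}")
```

With a helper like this, an HTTPS URL entered in Phase 2 would reach the charts as git@github.com:owner/repo.git, which is the form the services need once SSH is wired in.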

🔍 Steps to Reproduce

  1. Run the CLI and complete Phase 1 (infrastructure) and Phase 2 (secrets). In Phase 2 choose:
    • Git provider: GitHub/GitLab
    • Repository access method: deploy_keys
    • Provide DAG/data repo URLs in SSH form (e.g., git@github.com:owner/repo.git).
  2. Proceed with Phase 3 (repositories) and Phase 4/5 (infrastructure/data services), deploying services such as Data Orchestrator and the CICD components.
  3. Observe platform services attempting to fetch repos (DAGs, data model) post-deploy.

✅ Expected Behavior

  • Platform services that need to pull repositories (Airflow DAG fetcher/git-sync, CICD runner jobs, data orchestration/modeling steps) should:
    • Have SSH private keys (deploy keys) mounted/available securely.
    • Use SSH-based repo URLs correctly.
    • Have known_hosts populated for the git host (e.g., github.com, gitlab.com).
    • Successfully clone/pull repos without manual intervention.
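
One way to check the expected behavior from inside a pod is a small probe like the one below. It is a sketch only: the function names are assumptions, and the key and known_hosts paths are whatever the chart ends up mounting. It relies on git honoring GIT_SSH_COMMAND and on the standard OpenSSH options IdentitiesOnly, StrictHostKeyChecking and UserKnownHostsFile.

```python
import os
import shlex
import subprocess


def build_git_env(key_path: str, known_hosts_path: str) -> dict:
    """Copy the current environment and pin git's SSH command to one key."""
    ssh_cmd = " ".join([
        "ssh",
        "-i", shlex.quote(key_path),
        "-o", "IdentitiesOnly=yes",          # offer only the deploy key
        "-o", "StrictHostKeyChecking=yes",   # fail if the host key is unknown
        "-o", shlex.quote(f"UserKnownHostsFile={known_hosts_path}"),
    ])
    env = dict(os.environ)
    env["GIT_SSH_COMMAND"] = ssh_cmd
    return env


def can_read_repo(repo_url: str, key_path: str, known_hosts_path: str,
                  timeout: int = 30) -> bool:
    """Return True if `git ls-remote` can reach the repo with the given key."""
    result = subprocess.run(
        ["git", "ls-remote", "--exit-code", repo_url, "HEAD"],
        env=build_git_env(key_path, known_hosts_path),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

Running can_read_repo against each configured repository after deployment would show whether the key and known_hosts file are actually usable by the workload.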

❌ Actual Behavior

  • Services start but fail to clone/pull over SSH with errors such as:
    • Permission denied (publickey).
    • Host key verification failed.
    • git-sync and initialization containers restart in a backoff loop because SSH credentials or known_hosts are missing.
  • Deploy keys are not wired into workloads consistently: neither an SSH agent nor GIT_SSH_COMMAND is configured, and known_hosts is not provisioned.
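
The two error messages point at different missing pieces, which a triage helper could separate. The sketch below is illustrative only; the function name and the returned labels are assumptions, not part of Fast.BI.

```python
def diagnose_git_ssh_error(stderr: str) -> str:
    """Map git/ssh stderr output to the piece of SSH setup that is missing."""
    text = stderr.lower()
    if "host key verification failed" in text:
        # ssh could not find the git host in known_hosts.
        return "known_hosts"
    if "permission denied (publickey)" in text:
        # The host was trusted, but no accepted private key was offered.
        return "deploy_key"
    return "unknown"
```

Host key verification happens before authentication, so a pod can report "Host key verification failed" first and only show "Permission denied (publickey)" once known_hosts is fixed.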

🖥️ Environment

  • OS: macOS 14.x / Ubuntu 22.04 (reproducible)
  • Python Version: 3.10+
  • Fast.BI Version: main branch
  • Deployment Type: GCP (also applicable to On-Prem/AWS/Azure)
  • CLI Version: main branch

📋 Additional Information

  • The CLI states that deploy keys are required for Data Orchestrator, but the deployed manifests neither propagate the keys into the relevant pods/containers nor configure SSH.
  • Affects multiple services that interact with Git over SSH.

Logs

  • Airflow/git-sync examples: Permission denied (publickey) / Host key verification failed during repository fetch.

Configuration

  • repo_access_method: deploy_keys
  • Repo URLs in SSH form: git@<git-host>:owner/repo.git
  • Deploy keys generated and stored (local vault) but not mounted/used by services.
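
An early check over a configuration like the one above might look as follows. Only repo_access_method comes from this issue; the other parameter names (repo_urls, ssh_private_key_secret_name, known_hosts_configmap_name) are hypothetical placeholders for however the CLI passes values to the charts.

```python
import re

# Accepts scp-style (git@host:path) and ssh:// URLs.
SSH_URL_RE = re.compile(r"^(git@[^:/\s]+:|ssh://)\S+$")


def find_deploy_key_problems(params: dict) -> list:
    """Return human-readable problems with a deploy_keys configuration."""
    if params.get("repo_access_method") != "deploy_keys":
        return []  # nothing to check for other access methods
    problems = []
    for name, url in params.get("repo_urls", {}).items():
        if not SSH_URL_RE.match(url):
            problems.append(f"{name}: URL is not in SSH form: {url}")
    if not params.get("ssh_private_key_secret_name"):
        problems.append("no SSH private key secret is referenced")
    if not params.get("known_hosts_configmap_name"):
        problems.append("no known_hosts ConfigMap is referenced")
    return problems
```

The CLI could print these problems and stop before Phase 4/5, rather than letting the services fail after deployment.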

🔧 Possible Solution

  • End-to-end SSH support for deploy keys across services:
    • Provision secrets: store private key(s) and public key(s) under predictable secret names, e.g., repo-deploy-key-orchestrator, repo-deploy-key-data in the appropriate namespaces.
    • Mount keys into workloads:
      • Airflow/data orchestration pods (e.g., git-sync sidecars) mount private key at /home/airflow/.ssh/id_rsa (0600) and set GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=yes".
      • CICD runner jobs/Argo Workflow templates mount SSH key via volume/secret and export GIT_SSH_COMMAND or use an SSH agent init.
    • Manage known_hosts:
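      • Add an initContainer that writes the host keys for github.com/gitlab.com to /home/<user>/.ssh/known_hosts, or inject them via a ConfigMap. Alternatively, run ssh-keyscan during init, keeping in mind that keys fetched this way are not verified against the provider's published fingerprints.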
    • URL normalization:
      • Ensure when deploy_keys is selected, repo URLs are normalized to SSH form and passed through values/templates for services that clone.
    • Helm/chart updates:
      • Extend values for affected charts to accept ssh.enabled, ssh.privateKeySecretName, ssh.knownHostsConfigMapName.
      • Update templates to mount secrets/config and set env vars for git clients.
    • Validation:
      • In CLI, when deploy_keys is selected, validate that SSH secrets are created and referenced in service params; warn/error early if misconfigured.
    • Documentation:
      • Add a section describing deploy key flow, secret names, and required git host entries.
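
The secret and mount steps proposed above can be sketched as plain manifest dictionaries. Everything here is an assumption for discussion rather than the charts' actual layout: the function names, the volume names, and the choice of the built-in kubernetes.io/ssh-auth Secret type (which expects the key under ssh-privatekey). The 0600 mode and the id_rsa file name follow the proposal above.

```python
import base64


def build_ssh_secret(name: str, namespace: str, private_key: str) -> dict:
    """Secret manifest holding one deploy key (values are base64 in `data`)."""
    encoded = base64.b64encode(private_key.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/ssh-auth",
        "data": {"ssh-privatekey": encoded},
    }


def build_known_hosts_configmap(name: str, namespace: str, known_hosts: str) -> dict:
    """ConfigMap manifest carrying pre-verified host keys for the git host."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {"known_hosts": known_hosts},
    }


def build_ssh_volumes(secret_name: str, configmap_name: str) -> list:
    """Pod volumes exposing the key as id_rsa (mode 0600) and known_hosts."""
    return [
        {
            "name": "ssh-key",  # hypothetical volume name
            "secret": {
                "secretName": secret_name,
                "defaultMode": 0o600,
                "items": [{"key": "ssh-privatekey", "path": "id_rsa"}],
            },
        },
        {
            "name": "ssh-known-hosts",  # hypothetical volume name
            "configMap": {"name": configmap_name},
        },
    ]
```

A chart template would render the same structures from ssh.privateKeySecretName and ssh.knownHostsConfigMapName; the dictionaries only make the intended shape explicit.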

📚 Related Documentation

  • CLI selection points: cli.py functions _collect_secrets_parameters, _collect_repository_parameters, validate_and_normalize_repo_url.
  • Services likely impacted: Data Orchestration, CICD Workload Runner, any git-sync usage in charts.

Note: Please ensure you're using the latest version of Fast.BI and have checked existing issues for duplicates.
