discussion/Custom-Transfer-Agent-vs-Batch-API-Server #212

@bwalsh

ADR: Git LFS Integration Strategy — Custom Transfer Agent vs Batch API Server

Status

Discussion


Background

We are integrating Git LFS with a DRS-backed storage system that supports:

  • Client-managed buckets
  • Authorization bindings (IAM role, workload identity, broker reference)
  • Short-lived credential minting at access time

Git LFS exposes two primary integration surfaces:

  1. Client-side: Custom Transfer Agent
  2. Server-side: LFS Batch API

These are different architectural patterns, and each supports a different set of use cases.

Both can interact with DRS-backed storage, but they differ significantly in:

  • Interoperability
  • Federation support
  • Security guarantees
  • Governance capabilities
  • SaaS compatibility
  • Scientific reproducibility

Decision Drivers

  • Interoperability with stock Git clients
  • Federation across organizations
  • Credential isolation and security
  • Compatibility with hosted Git platforms
  • Multi-user collaboration
  • Operational simplicity
  • Alignment with GA4GH / DRS design principles

This ADR documents the architectural differences, supported use cases, trade-offs,
and the “Add URL” (external object registration) workflow.


Architecture Overview

Custom Transfer Agent

flowchart LR
    A[Git Client] --> B[Custom Transfer Agent]
    B --> C[Object Store]
    B --> D[DRS APIs]

Option 1 — Git LFS Custom Transfer Agent (Client-Side)

Description

A custom transfer agent is configured in the Git client via:

git config lfs.customtransfer.<name>.path <binary>

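When the agent is also set as the standalone transfer agent (git config lfs.standalonetransferagent <name>), the Git LFS client does not contact an LFS server. Instead it:

  • Invokes the custom agent binary
  • Sends object metadata (OID, size) as line-delimited JSON on the agent's stdin
  • Delegates upload/download directly to the agent

The agent is responsible for:

  • Authentication
  • Storage interaction
  • Credential resolution
  • Progress reporting

No LFS Batch API server is involved in this standalone configuration. Without lfs.standalonetransferagent, Git LFS still calls the Batch API first and uses the agent only if the server selects that transfer type.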
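
The agent side of this exchange can be sketched as a small loop over line-delimited JSON. This is a minimal sketch rather than a complete agent: the `transfer` callback is a hypothetical stand-in for whatever DRS or object-store logic the agent would run, and the error code `1` is an arbitrary placeholder.

```python
import json
import sys
from typing import Callable, Iterator, Optional


def handle_event(event: dict,
                 transfer: Callable[[dict], Optional[str]]) -> Iterator[dict]:
    """Yield the replies for one request line received from Git LFS."""
    kind = event.get("event")
    if kind == "init":
        # An empty JSON object tells Git LFS that initialisation succeeded.
        yield {}
    elif kind in ("upload", "download"):
        oid, size = event["oid"], event["size"]
        try:
            # Hypothetical callback: moves the bytes and, for downloads,
            # returns the local path where the object was written.
            path = transfer(event)
        except Exception as exc:
            yield {"event": "complete", "oid": oid,
                   "error": {"code": 1, "message": str(exc)}}
            return
        yield {"event": "progress", "oid": oid,
               "bytesSoFar": size, "bytesSinceLast": size}
        done = {"event": "complete", "oid": oid}
        if kind == "download":
            done["path"] = path
        yield done
    # "terminate" (and anything unknown) gets no reply.


def main(transfer: Callable[[dict], Optional[str]]) -> None:
    """Read requests from stdin and write one JSON reply per line to stdout."""
    for line in sys.stdin:
        if not line.strip():
            continue
        event = json.loads(line)
        for reply in handle_event(event, transfer):
            sys.stdout.write(json.dumps(reply) + "\n")
            sys.stdout.flush()
        if event.get("event") == "terminate":
            break
```

A real agent would call `main(...)` from its entry point and report progress incrementally while bytes move, instead of once at the end.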

Architecture

Git Client
   │
   │ invokes
   ▼
Custom Transfer Agent
   │
   │ interacts with
   ▼
Object Store / DRS

Supported Use Cases

| Use Case | Supported |
| --- | --- |
| Single-user research workflows | Yes |
| Direct upload to S3/GCS via DRS | Yes |
| Air-gapped or private deployments | Yes |
| Power users with controlled environments | Yes |
| Experimental storage backends | Yes |
| Bypassing Git hosting provider | Yes |

Characteristics

  • No server required
  • Full control over upload semantics
  • Easy to embed DRS-specific logic
  • Zero dependency on Git hosting platform support
  • Tight coupling between client and storage logic

Limitations

Not Supported or Problematic

| Use Case | Limitation |
| --- | --- |
| Multi-user collaboration | ❌ Every user must install an identical agent |
| Public Git hosting integration | ❌ GitHub/GitLab do not invoke custom agents |
| Transparent user experience | ❌ Requires client config |
| Centralized authorization policy | ❌ Logic is pushed to clients |
| Server-side audit of batch requests | ❌ Harder to enforce uniformly |
| Fine-grained repo policy enforcement | ❌ Distributed across clients |
| Web-based uploads | ❌ No web compatibility |

Option 2: Git LFS Batch API Server (Server-Side)

Description

A standard Git LFS server implements:

POST /info/lfs/objects/batch

The client sends:

{
  "operation": "upload",
  "objects": [
    { "oid": "...", "size": 123 }
  ]
}

The server responds with per-object actions:

{
  "objects": [
    {
      "oid": "...",
      "actions": {
        "upload": {
          "href": "https://signed-url",
          "header": { ... }
        }
      }
    }
  ]
}

The client then uploads directly to object storage using returned URLs.
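
On the server side, building that response could look roughly like the sketch below. It assumes a hypothetical `sign_url(operation, oid)` callback that wraps the DRS credential minting and returns a short-lived URL; the 900-second default lifetime is likewise an illustrative value, not something this ADR specifies.

```python
from typing import Callable


def build_batch_response(request: dict,
                         sign_url: Callable[[str, str], str],
                         expires_in: int = 900) -> dict:
    """Turn a Batch API request into a response with one action per object."""
    operation = request["operation"]  # "upload" or "download"
    objects = []
    for obj in request["objects"]:
        objects.append({
            "oid": obj["oid"],
            "size": obj["size"],
            "authenticated": True,
            "actions": {
                operation: {
                    # Hypothetical helper: mints a short-lived signed URL.
                    "href": sign_url(operation, obj["oid"]),
                    "expires_in": expires_in,
                }
            },
        })
    # "basic" is the default transfer adapter built into stock Git LFS clients.
    return {"transfer": "basic", "objects": objects}
```

The returned dictionary would be serialized as JSON and sent with the `application/vnd.git-lfs+json` content type.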


Architecture

Git Client
   │
   │ HTTP
   ▼
LFS Batch API Server
   │
   │ resolves auth binding
   ▼
DRS Control Plane
   │
   ▼
Object Store

Supported Use Cases

| Use Case | Supported |
| --- | --- |
| Multi-user collaboration | Yes |
| Hosted Git platforms | Yes |
| Enterprise SSO / OAuth | Yes |
| Centralized policy enforcement | Yes |
| Auditable batch requests | Yes |
| Transparent client UX | Yes |
| Repo-scoped storage policies | Yes |
| SaaS deployment model | Yes |

Characteristics

  • Fully compatible with stock Git LFS clients
  • No client customization required
  • Centralized policy and credential resolution
  • Aligns with DRS access-time credential minting
  • Enables fine-grained repo-level isolation

Comparison

| Dimension | Custom Transfer Agent | Batch API Server |
| --- | --- | --- |
| Requires LFS server | No | Yes |
| Requires client modification | Yes | No |
| Works with GitHub/GitLab | No | Yes |
| Centralized authorization | No | Yes |
| Repo-scoped isolation | Weak | Strong |
| Multi-tenant SaaS | No | Yes |
| Air-gapped research | Yes | Possible |
| Operational simplicity | Client-heavy | Server-heavy |
| Auditable batch control | Limited | Strong |
| GA4GH alignment | Medium | High |

Add URL / External Object Registration Semantics

Overview

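The “Add URL” workflow allows registration of a pre-existing external object
without re-uploading it. The git lfs add-url command used in the diagrams of this section is part of this proposal, not a stock Git LFS command.

The workflow supports: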

  • Large datasets already stored in cloud buckets
  • Cross-project reuse
  • Federated research workflows

Definitions

No User Upload

The client does not upload object bytes.

No Transfer

No object bytes are transferred by either the client or the server, because the server already knows the object's sha256 and size.


Mode A — URL + sha256 + size

sequenceDiagram
    participant U as User
    participant C as Git Client
    participant S as LFS/DRS Server
    participant O as Object Store

    U->>C: git lfs add-url <url> --sha256 --size
    C->>O: HEAD <url>
    C->>S: Verify sha256 exists
    S-->>C: Exists
    C->>C: Write pointer file

Transfer semantics:

  • No user upload
  • No transfer (if already indexed)
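
The final "Write pointer file" step in Mode A needs only the sha256 and size. A minimal sketch of that step follows; the function name `make_pointer` is illustrative, while the output follows the standard Git LFS pointer layout.

```python
import re

POINTER_VERSION = "https://git-lfs.github.com/spec/v1"


def make_pointer(sha256_hex: str, size: int) -> str:
    """Return the text of a Git LFS pointer file for an already-known object."""
    if not re.fullmatch(r"[0-9a-f]{64}", sha256_hex):
        raise ValueError("sha256 must be 64 lowercase hex characters")
    if size < 0:
        raise ValueError("size must be non-negative")
    # Pointer files are UTF-8 text: a version line, then keys in sorted order.
    return (
        f"version {POINTER_VERSION}\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size}\n"
    )
```

Writing this text to the tracked path and committing it is all Git itself ever sees; the object bytes stay where they already are.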

Mode B — URL Only

sequenceDiagram
    participant U as User
    participant C as Git Client
    participant S as LFS/DRS Server
    participant O as Object Store

    U->>C: git lfs add-url <url>
    C->>O: HEAD <url>
    C->>S: Resolve sha256
    alt Known
        S-->>C: sha256 + size
    else Unknown
        S->>O: GET object (ingest)
        S-->>C: sha256 computed
    end
    C->>C: Write pointer file

Transfer semantics:

  • No user upload
  • Server-side transfer may occur
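
The Known/Unknown branch in Mode B might be sketched as below. The in-memory `index` dictionary and the `fetch_chunks` callback are hypothetical placeholders for the server's object index and its streaming GET against the object store.

```python
import hashlib
from typing import Callable, Dict, Iterable, Tuple


def resolve_sha256(url: str,
                   index: Dict[str, Tuple[str, int]],
                   fetch_chunks: Callable[[str], Iterable[bytes]]) -> Tuple[str, int]:
    """Return (sha256, size) for url, ingesting the object only if unknown."""
    if url in index:
        # Known: no bytes move anywhere.
        return index[url]
    # Unknown: the server streams the object once and hashes it as it goes.
    digest = hashlib.sha256()
    size = 0
    for chunk in fetch_chunks(url):
        digest.update(chunk)
        size += len(chunk)
    index[url] = (digest.hexdigest(), size)
    return index[url]
```

Hashing chunk by chunk keeps server memory use flat regardless of object size, which matters for the large datasets this workflow targets.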

Error Cases

| Condition | Error |
| --- | --- |
| Size mismatch | SIZE_MISMATCH |
| Checksum mismatch | CHECKSUM_MISMATCH |
| Unstable URL | UNSTABLE_OBJECT_SOURCE |
| Object modified after registration | IMMUTABILITY_VIOLATION |
| Source not accessible | SOURCE_NOT_ACCESSIBLE |
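
One way these conditions could map onto checks is sketched below. The `Probe` shape, the order of the checks, and the reading of "unstable URL" as a boolean reported by the probe are all assumptions made for illustration; the ADR does not define them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    """What the server observed about the source URL (hypothetical shape)."""
    accessible: bool
    size: Optional[int] = None
    sha256: Optional[str] = None
    stable: bool = True  # assumption: e.g. the object did not change between probes


def check_registration(claimed_size: Optional[int],
                       claimed_sha256: Optional[str],
                       probe: Probe) -> Optional[str]:
    """Return an error code from the table above, or None if registration may proceed."""
    if not probe.accessible:
        return "SOURCE_NOT_ACCESSIBLE"
    if not probe.stable:
        return "UNSTABLE_OBJECT_SOURCE"
    if claimed_size is not None and probe.size is not None and claimed_size != probe.size:
        return "SIZE_MISMATCH"
    if (claimed_sha256 is not None and probe.sha256 is not None
            and claimed_sha256 != probe.sha256):
        return "CHECKSUM_MISMATCH"
    return None


def check_immutability(registered_sha256: str, current_sha256: str) -> Optional[str]:
    """Post-registration check: the stored object must still match its recorded hash."""
    return None if registered_sha256 == current_sha256 else "IMMUTABILITY_VIOLATION"
```

Claimed values of `None` correspond to Mode B, where the user supplies only the URL and there is nothing to compare against.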

Architectural Comparison

| Dimension | Custom Agent | Batch API Server |
| --- | --- | --- |
| Hosted Git Compatible | No | Yes |
| Centralized Policy | No | Yes |
| Multi-tenant SaaS | No | Yes |
| Auditability | No | Yes |
| Safe Federated Add-URL | No | Yes |
| Immutability Enforcement | Weak | Strong |

Critical Gap: What Is NOT Supported If Only Custom Transfer Agent Is Used

If we rely solely on a custom transfer agent:

  1. Hosted Git integration is not possible

    • GitHub and GitLab do not execute custom agents; an agent runs only on the client machine.
    • Users cannot push from standard environments.

  2. Multi-user standardization is fragile

    • Every collaborator must install and configure the agent.
    • Version drift between installed agents causes inconsistent behavior.

  3. No centralized policy enforcement

    • Bucket and authorization logic lives on client machines.
    • Repo-specific storage controls are hard to enforce.

  4. No transparent federation

    • External collaborators cannot push without installing software.
    • This violates the "it just works with Git" principle.

  5. Reduced security posture

    • More credential resolution logic is distributed to clients.
    • Access is harder to audit centrally.

  6. Web and CI integration is limited

    • CI/CD systems must install and configure the custom agent before they can transfer objects.
    • Web-based file operations cannot use it.

When Custom Transfer Agent Is Appropriate

  • Research-only deployments
  • Internal platform experiments
  • Developer tooling
  • Transitional architecture
  • Air-gapped or highly controlled environments

When Batch API Server Is Required

  • Production multi-tenant environments
  • Public Git hosting compatibility
  • Enterprise SSO integration
  • GA4GH federated ecosystems
  • DRS-backed storage federation at scale

Discussion

For a federated, multi-tenant, GA4GH-aligned system:

A Git LFS Batch API server is required.

The custom transfer agent may remain as a complementary tool but cannot be the sole integration surface.
