Coming Soon: Open Source Visual Interface for Apache Spark Declarative Pipelines
Get Notified About the Release • Documentation
Ilum is open-sourcing the Spark Declarative Pipelines UI!
We're excited to announce that we'll be releasing a powerful, open-source visual interface for building and managing Apache Spark Declarative Pipelines (SDP). This tool will make it easier than ever to design, visualize, and deploy declarative data pipelines on Spark 4.1+.
Want to be the first to know when we launch? Register here to get notified about the official release and early access opportunities.
Spark Declarative Pipelines (SDP) is a revolutionary framework for building reliable, maintainable, and testable data pipelines on Apache Spark. Donated by Databricks to the Apache Spark open-source project in June 2025, it represents the evolution of Delta Live Tables into a vendor-neutral, community-driven standard.
Traditional Imperative Approach:

```python
from pyspark.sql.functions import sum

df_sales = spark.read.format("csv").load("s3://raw-data/sales.csv")
df_products = spark.read.format("json").load("s3://raw-data/products.json")
df_joined = df_sales.join(df_products, "product_id")
df_aggregated = df_joined.groupBy("product_category").agg(sum("amount").alias("total_sales"))
df_aggregated.write.format("delta").mode("overwrite").save("s3://curated-data/product_sales_summary")
```

Declarative Approach with SDP:
```python
from pyspark import pipelines as dp
from pyspark.sql.functions import sum

# SDP infers the dependency on the "sales" and "products" datasets from the
# table references below; no explicit orchestration code is needed.
@dp.materialized_view
def product_sales_summary():
    sales = spark.table("sales")
    products = spark.table("products")
    return (sales.join(products, "product_id")
                 .groupBy("product_category")
                 .agg(sum("amount").alias("total_sales")))
```

SDP automatically analyzes dependencies between datasets and orchestrates the execution order with maximum parallelism. No manual DAG definition is required.
- Faster Development: Build pipelines up to 90% faster by eliminating boilerplate code for checkpoint management, incremental processing, and error handling.
- Unified Batch and Streaming: A single API for both batch and streaming workloads; toggle between processing modes with minimal code changes (see the sketch after this list).
- Built-in Fault Tolerance: Automatic checkpointing, state management, and multi-level retry logic (task → flow → pipeline) for transient failures.
- Incremental Processing: Only new or changed data is processed, avoiding expensive full table scans.
- Declarative by Design: Define what your pipeline should produce, not how to execute it; Spark handles orchestration, dependency management, and optimization automatically.
- No External Orchestrator: Unlike traditional workflows that require Apache Airflow or similar tools, SDP manages task dependencies internally.
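To illustrate the unified batch and streaming point, here is a minimal sketch of a streaming table feeding a downstream materialized view through the same decorator-based API. It assumes the Spark 4.1 `pyspark.pipelines` module exposes an `@dp.table` decorator for streaming tables (only `@dp.materialized_view` is confirmed by the snippet above), that `spark` is the session available to pipeline source files, and that the path and column names are placeholders.

```python
from pyspark import pipelines as dp
from pyspark.sql.functions import sum as sum_

# Streaming table: incrementally ingests new event files as they arrive.
# (@dp.table as the streaming-table decorator is an assumption about the
#  Spark 4.1 API; the path and schema are illustrative placeholders.)
@dp.table
def raw_events():
    return (spark.readStream
                 .format("json")
                 .load("s3://raw-data/events/"))

# Materialized view: a batch-style aggregate over the streaming table above.
# SDP sees the table reference, orders the two datasets correctly, and can
# refresh the aggregate incrementally instead of recomputing from scratch.
@dp.materialized_view
def revenue_per_category():
    return (spark.table("raw_events")
                 .groupBy("category")
                 .agg(sum_("amount").alias("total_revenue")))
```

The same definitions are meant to serve both processing modes, which is what "toggle between processing modes with minimal code changes" refers to above.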
The Spark Declarative Pipelines UI will bring visual development and management capabilities to SDP, making it accessible to a broader audience. Expected features include:
- Visual Pipeline Designer: Drag-and-drop interface for building data pipelines
- Dependency Graph Visualization: Interactive DAG view showing data flow and dependencies
- Code Generation: Automatically generate Python and SQL pipeline definitions
- Pipeline Configuration: Visual editor for `pipeline.yml` specifications
- Real-time Monitoring: Track pipeline execution, flow status, and performance metrics
- Debug Tools: Inspect checkpoints, view logs, and troubleshoot issues
- Template Library: Pre-built pipeline templates for common use cases
- Access Control: Manage permissions and collaboration features
The Spark Declarative Pipelines UI is designed for:
- Data Engineers building production ETL workflows on Apache Spark
- Data Analysts who want to create data pipelines without deep Spark expertise
- Platform Teams providing self-service data infrastructure
- Organizations migrating from proprietary platforms to open-source solutions
- Teams seeking vendor-neutral alternatives to cloud-specific tools
Core Concepts:

Flows: the foundational data processing unit, supporting both streaming and batch semantics. Flows read data from sources, apply transformations, and write results to target datasets. The datasets a pipeline defines come in three forms:
- Streaming Tables: Incremental processing of streaming data (Kafka, Kinesis, cloud storage)
- Materialized Views: Precomputed batch tables with incremental refresh capabilities
- Temporary Views: Scoped to pipeline execution, useful for reusable transformation logic
Pipelines: the primary unit of development, containing one or more flows, streaming tables, and materialized views. SDP automatically analyzes the dependencies between them and orchestrates execution.

Dependency Graph: the automatically constructed graph of data dependencies, enabling optimization, parallelism, fault tolerance, and transparency.
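As a concrete illustration of these concepts, the sketch below defines a temporary view holding reusable cleansing logic and a materialized view that consumes it. The `@dp.temporary_view` decorator name is an assumption (only `@dp.materialized_view` is confirmed by the snippet earlier in this document), and the table and column names are illustrative.

```python
from pyspark import pipelines as dp
from pyspark.sql.functions import col

# Temporary view: scoped to the pipeline run; keeps cleansing logic reusable
# without persisting an intermediate table.
# (@dp.temporary_view is an assumed decorator name; orders/order_id/amount
#  are placeholder names.)
@dp.temporary_view
def cleaned_orders():
    return (spark.table("orders")
                 .where(col("amount") > 0)
                 .dropDuplicates(["order_id"]))

# Materialized view consuming the temporary view. The reference to
# "cleaned_orders" is what adds an edge to the dependency graph, so SDP
# schedules the cleansing step first without any manual orchestration.
@dp.materialized_view
def daily_order_totals():
    return (spark.table("cleaned_orders")
                 .groupBy("order_date")
                 .agg({"amount": "sum"}))
```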
- Apache Spark 4.1+: Built on Spark Declarative Pipelines framework
- Spark Connect: Leverages the Spark Connect protocol for remote execution (see the connection sketch after this list)
- Python & SQL: Full support for both Python and SQL pipeline definitions
- Open Source: Fully open-source with community-driven development
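Because the UI builds on Spark Connect, a client only needs a reachable Spark Connect endpoint. The sketch below shows the standard PySpark way to attach to one; the host name is a placeholder and 15002 is the default Spark Connect port.

```python
from pyspark.sql import SparkSession

# Attach to a remote Spark cluster over the Spark Connect protocol.
# Replace the endpoint with your own Spark Connect server.
spark = (SparkSession.builder
         .remote("sc://spark-connect.example.com:15002")
         .getOrCreate())

# Any DataFrame work issued through this session executes on the remote
# cluster rather than in the local client process.
spark.range(5).selectExpr("id", "id * 10 AS scaled").show()
```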
- Get Notified About the Release - Register for launch updates
- Ilum Documentation - Learn more about our platform
- Apache Spark - Official Spark documentation
- Spark Declarative Pipelines Guide - SDP documentation (coming with the Spark 4.1 release)
This project will be open source and we welcome contributions! Once released, you'll be able to:
- Report bugs and request features
- Submit pull requests with improvements
- Help with documentation
- Share pipeline templates and examples
Register for updates to be notified when contribution guidelines are published.
The Spark Declarative Pipelines UI is currently under active development as a beta feature within Ilum Enterprise Edition. We're working towards an initial release that will include:
- Visual pipeline designer (complete)
- DAG visualization (complete)
- Code generation for Python and SQL (complete)
- Pipeline execution monitoring (complete)
- Data lineage (in progress)
- Docker packaging and Helm chart (in progress)
- Advanced debugging tools (in progress)
- Template library (planned)
- Multi-user collaboration features (planned)
Stay informed about our progress - register to receive updates on development milestones and release dates.
At Ilum, we believe in the power of open-source collaboration. By open-sourcing the Spark Declarative Pipelines UI, we aim to:
- Accelerate Adoption: Make declarative pipelines accessible to everyone
- Foster Innovation: Enable community-driven feature development
- Ensure Vendor Neutrality: Provide a truly open alternative to proprietary tools
- Build Together: Create the best possible tool through collective expertise
Example Use Cases:

- Real-Time Analytics: Build streaming pipelines that ingest data from Kafka, enrich it with dimension tables, and produce real-time aggregates (see the sketch after this list).
- Batch ETL: Create batch pipelines that extract data from multiple sources, transform it through layered bronze/silver/gold stages, and load the results into data warehouses.
- Slowly Changing Dimensions: Implement SCD Type 2 with automatic change tracking and historical versioning.
- Data Quality: Define data quality expectations and automatically track violations without failing entire pipelines.
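As a sketch of the real-time analytics use case above, a streaming table ingests events from Kafka and a materialized view enriches them with a product dimension table. The `@dp.table` decorator for streaming tables, the broker address, the topic name, and the event schema are all assumptions used purely for illustration.

```python
from pyspark import pipelines as dp
from pyspark.sql.functions import col, from_json, sum as sum_
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Illustrative event schema; adjust to match your actual payloads.
event_schema = StructType([
    StructField("product_id", StringType()),
    StructField("amount", DoubleType()),
])

# Streaming table fed from Kafka. Broker and topic names are placeholders.
@dp.table
def kafka_sales_events():
    raw = (spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker-1:9092")
                .option("subscribe", "sales-events")
                .load())
    return (raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
               .select("e.*"))

# Materialized view joining the event stream with a dimension table and
# producing per-category totals that the pipeline keeps up to date.
@dp.materialized_view
def sales_by_category():
    events = spark.table("kafka_sales_events")
    products = spark.table("products")
    return (events.join(products, "product_id")
                  .groupBy("product_category")
                  .agg(sum_("amount").alias("total_sales")))
```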
- Website: ilum.cloud
- Documentation: ilum.cloud/docs
- Get Access: ilum.cloud/get-access
This project will be released under an open-source license. Specific license details will be announced with the initial release.
Don't miss the launch!
Register now to be notified when we release.
Made with ❤️ by the Ilum team