Abstract
As we have discussed the roadmap on the mailing list and in Google Docs, I would like to copy the details to GitHub Discussions to provide an overview of the roadmap.
Motivation
As Apache Cloudberry™️ (Incubating) starts its incubation journey under the Apache umbrella, we would like to submit a proposal for the Apache Cloudberry roadmap. This roadmap outlines the key milestones and paths of the Apache Cloudberry project and aims to illustrate what Apache Cloudberry will be in the short, medium, and long term. It is hard to cover everything, but we hope community members will review this document and leave comments or feedback. We can also review these items quarterly or half-yearly at the community meetings to track progress and decide whether they need to be updated.
Your feedback is welcome and will help shape the future of Apache Cloudberry together.
Implementation
Overview
Before going into details, here is the landscape we want Cloudberry to focus on:
Start the incubation journey and graduate from the Apache Incubator to become an Apache Top-Level Project within a few months to about one year.
Cherry-pick commits from Greenplum into Cloudberry to catch up with the latest open-source Greenplum codebase.
Upgrade the PostgreSQL kernel yearly so that users can use the features and enhancements introduced by newer PostgreSQL versions.
Strengthen stability by introducing more test frameworks, optimize performance and high availability, introduce binary compatibility tests between minor versions, and more.
Improve usability and user experience to lower the bar for installation, deployment, operation, management, observability, etc.
Provide more modern solutions around Apache Cloudberry, including Streaming/Real-time, Lakehouse, and AI/ML.
Build and grow the ecosystem, including tools and integrations with other ASF projects and with upstream or downstream projects.
Details
Community Meetings
We would like to start and help coordinate regular monthly community meetings (biweekly if development becomes more active). The initial meetings will discuss the key features and plans of the project and involve developers and users from the Americas, EMEA, and Asia-Pacific. However, finding an ideal time for all regions may be challenging.
It would be great if community members would like to help organize additional focus meetings, such as a marketing focus meeting or a CI/CD focus meeting, to allow deeper and more streamlined discussions. These meetings will be open to all community members, not only PPMC members.
Time: TBD (option: the last Friday of every month at 8:00 UTC / 11:00 UTC+3 / 16:00 UTC+8 / 1:00 UTC-7).
The meeting will take about 45 minutes. Anyone can volunteer to host, helping to run the meeting efficiently and take meeting notes. English is the preferred meeting language.
Meeting agendas should be prepared in advance and documented in the cwiki or Google Docs (preferred, because everyone can comment). The sessions will also be recorded for easy follow-up and reference.
We will evaluate meeting tools such as Zoom, Jitsi Meet, Google Meet, and others.
(We can start a new mailing list thread to discuss this further.)
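To make the proposed slot concrete, here is a minimal Python sketch that finds the last Friday of a month and converts the 8:00 UTC start time to the other offsets listed above. The function names are hypothetical, and the fixed UTC offsets are an assumption taken from the option as written; real regions with daylight saving time would need named time zones instead.

```python
import calendar
from datetime import date, datetime, time, timedelta, timezone


def last_friday(year: int, month: int) -> date:
    """Return the date of the last Friday of the given month."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # weekday() counts Monday as 0, so Friday is calendar.FRIDAY (4).
    days_back = (last_day.weekday() - calendar.FRIDAY) % 7
    return last_day - timedelta(days=days_back)


def meeting_times(year: int, month: int, utc_hour: int = 8, offsets=(3, 8, -7)):
    """Map each fixed UTC offset to the local start time of the meeting."""
    start = datetime.combine(last_friday(year, month), time(utc_hour), tzinfo=timezone.utc)
    return {
        f"UTC{hours:+d}": start.astimezone(timezone(timedelta(hours=hours)))
        for hours in offsets
    }


if __name__ == "__main__":
    for label, local in meeting_times(2024, 11).items():
        print(label, local.strftime("%Y-%m-%d %H:%M"))
```

For November 2024 the meeting would fall on the 29th, and the three local start times are 11:00, 16:00, and 1:00 on that same day.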
Apache Incubation and Graduation
Going forward, we will develop Cloudberry and build the community following the Apache Way. It may take a few months to about one year for Cloudberry to graduate from the Incubator and become a Top-Level Project. There is a lot for us to do.
Cherry-pick from Greenplum to Cloudberry (Highest Priority)
As you know, Cloudberry is based on Greenplum 7.0.0-beta.3 with a newer PostgreSQL kernel. First, we plan to cherry-pick commits from the archived open-source Greenplum into Cloudberry to catch up with Greenplum's latest code (see #675).
PostgreSQL Kernel Upgrade (TBD)
Next, we would upgrade the built-in PostgreSQL kernel annually so that Cloudberry users can use the features and enhancements introduced by newer stable versions of PostgreSQL. The target version will be two major versions behind the latest PostgreSQL released that year. For example, in 2024 the latest PostgreSQL version is 17, so we would upgrade the PostgreSQL kernel to PG 15.x (17 - 2).
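The version rule above is simple arithmetic; a small sketch makes it explicit. The function name and the `lag` parameter are hypothetical and only illustrate the "two versions behind" policy described here.

```python
def target_pg_major(latest_major: int, lag: int = 2) -> int:
    """Return the PostgreSQL major version to target, `lag` versions behind the latest."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    target = latest_major - lag
    if target < 1:
        raise ValueError("lag is larger than the latest major version allows")
    return target


if __name__ == "__main__":
    # The 2024 example from the text: the latest release is 17, so the target is 17 - 2.
    print(target_pg_major(17))  # 15
```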
As a long-term strategy, we want to decouple features from the PostgreSQL kernel to make Cloudberry components more pluggable. The interconnect is already pluggable; the dispatcher, optimizer, planner, and transaction management (two-phase commit) still need to be done. Contributions in these areas are welcome.
Performance and Usability
Support hybrid row-column storage, inspired by Partition Attributes Across (PAX, https://www.vldb.org/conf/2001/P169.pdf), aiming for the same write performance as AO tables and the same read performance as AOCS tables. We will also integrate the latest compression and encoding algorithms (such as dictionary encoding) into it.
Support a vectorized execution engine to optimize query performance.
Refactor the dispatch logic for improved efficiency.
Refactor materialized views and queries for external tables.
Support parallel execution in ORCA.
Optimize parallel query to support more SQL operators.
Projection support (materialized view for AO tables).
Support more than 10 tables in ORCA and improve search space exploration.
ORCA time limits (based on the number of permutations or optimization time).
…
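As an illustration of the hybrid row-column idea in the first item of this list, the following toy Python sketch shows a PAX-style layout: rows are grouped into fixed-size pages, and inside each page the values are stored column by column. This is an in-memory teaching model under assumed names (`PaxPage`, `PaxTable`, `page_capacity`), not Cloudberry's storage format, and it says nothing about real performance.

```python
class PaxPage:
    """One page: holds up to `capacity` rows, stored as one list per column."""

    def __init__(self, columns, capacity):
        self.columns = list(columns)
        self.capacity = capacity
        self.minipages = {name: [] for name in self.columns}
        self.row_count = 0

    def is_full(self):
        return self.row_count >= self.capacity

    def append(self, row):
        # A row is written once, into the single page that owns it.
        for name in self.columns:
            self.minipages[name].append(row[name])
        self.row_count += 1

    def row(self, index):
        # Rebuilding a full row only touches this page.
        return {name: self.minipages[name][index] for name in self.columns}


class PaxTable:
    """Append-only table made of PAX-style pages."""

    def __init__(self, columns, page_capacity=4):
        self.columns = list(columns)
        self.page_capacity = page_capacity
        self.pages = []

    def insert(self, row):
        if not self.pages or self.pages[-1].is_full():
            self.pages.append(PaxPage(self.columns, self.page_capacity))
        self.pages[-1].append(row)

    def scan_column(self, name):
        # A column scan reads only that column's values from each page.
        for page in self.pages:
            yield from page.minipages[name]


if __name__ == "__main__":
    table = PaxTable(["id", "price"], page_capacity=2)
    for i in range(3):
        table.insert({"id": i, "price": 10.0 * i})
    print(list(table.scan_column("price")))
```

The design point is that inserts stay page-local, as in a row store, while a scan of one column skips the other columns' values within each page, as in a column store.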
Availability improvements
Support cold standby.
Support hot (read-only) standby.
Support ephemeral temporary objects.
Support a cluster write barrier, which allows taking a consistent snapshot of all disks on all cluster nodes.
Support more than one segment mirror.
Support mirrors in different availability zones (AZs).
Support quorum replication, with Jepsen tests to check that we do not lose data.
Graceful segment shutdown.
Robust resource group isolation: IO/CPU/memory/network.
Quality Assurance
Introduce more open-source testing frameworks and methodologies, for example automatic SQL generation testing, SQLancer, and chaos testing for system robustness.
Refactor the current ICW cases to reduce running time.
Binary swap tests between minor versions.
Usability
Disaster recovery: provide point-in-time recovery (PITR) so that a Cloudberry cluster can be restored to a given restore point after a disaster.
Cloudberry Central Console (like GPCC).
Support tools for in-place upgrades.
K8s deployment support for Cloudberry Database.
Migration tools from Oracle/MSSQL databases.
Rename gp* or greenplum* commands and keywords to cb* or cloudberry* for better compliance, while creating aliases that bridge the old and new names so that users have a smooth transition.
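The renaming item above can be pictured as a lookup table in which both the old and the new name resolve to the new canonical name. The sketch below is only an assumption about how such a bridge might work; the `cb*` names it produces are hypothetical outputs of the prefix rule, not commands that are known to exist.

```python
# Longer prefix first, so "greenplum..." is not handled by the "gp" rule.
PREFIX_RULES = (("greenplum", "cloudberry"), ("gp", "cb"))


def new_name(old: str) -> str:
    """Apply the gp* -> cb* / greenplum* -> cloudberry* prefix rule."""
    for old_prefix, new_prefix in PREFIX_RULES:
        if old.startswith(old_prefix):
            return new_prefix + old[len(old_prefix):]
    return old


def build_alias_table(old_commands):
    """Map every old name and every new name to the canonical new name."""
    table = {}
    for old in old_commands:
        canonical = new_name(old)
        table[canonical] = canonical
        table[old] = canonical
    return table


def resolve(command: str, table):
    """Return (canonical name, whether a deprecated alias was used)."""
    canonical = table[command]
    return canonical, command != canonical


if __name__ == "__main__":
    table = build_alias_table(["gpstart", "gpstop", "greenplum_path.sh"])
    print(resolve("gpstop", table))
    print(resolve("cbstart", table))
```

Returning a "deprecated alias" flag lets a wrapper print a notice when the old name is used, while both names keep working during the transition.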
Streaming / Real-time
Implement the kafka_fdw extension to enable streaming data from Kafka into Cloudberry.
Integrate with Flink CDC / Kafka connectors to support near-real-time data integration.
Support Dynamic Tables.
Lakehouse
Integrate with various data lakes (including Iceberg, Hudi, Delta Lake, and more) as plugins. For the integration with Apache Iceberg, see the discussion "Cloudberry Database Roadmap 2024" (#369).
AI/ML
Integrate with Ray (https://www.ray.io/) to support AI/ML workloads. (High priority)
Work with the Apache MADlib community to support Cloudberry natively in the MADlib upstream codebase.
Support graph queries: AI applications are increasingly using graphs to fetch knowledge for better outcomes.
Utilities and Ecosystem
We aim for Cloudberry to be supported as a first-class citizen in the ecosystem, rather than relying on minor updates to the Greenplum support in upstream tools.
Cherry-pick the latest commits from the original Greenplum projects into their Cloudberry counterparts, including cloudberry-pxf, cloudberry-gpbackup, cloudberry-gpbackup-s3-plugin, and cloudberry-go-libs.
Support PGRX so that UDFs can be written in Rust for Cloudberry.
DBeaver for Cloudberry
JDBC/ODBC for Cloudberry
Container Service for Cloudberry Database
Cloudberry command center for database, query, and resource management
Note:
The original GitHub organization github.com/cloudberrydb will be renamed to github.com/cloudberry-contrib. It includes some Cloudberry developers but is not officially maintained by the Cloudberry PPMC. The organization will host non-Apache-licensed extensions and projects for Cloudberry, such as pgvector, PL/Java, PL/R, and other tools.
Upgrade pgvector from 0.5.x to 0.8.x
Upgrade PostGIS from 2.5.x to 3.3.2
Release Management
We will establish a predictable and sustainable release process to provide stable software to our users while maintaining quality:
Release Cadence
Major releases quarterly (x.y.0)
Minor releases (x.y.z) as needed for critical fixes
First Apache release targeted for [DATE]
Release candidates to undergo a minimum of two weeks of community testing
Version Management
Follow semantic versioning (MAJOR.MINOR.PATCH)
Major version: Incompatible API/ABI changes
Minor version: New functionality in a backward-compatible manner
Patch version: Backward-compatible bug fixes
Pre-release versions marked with -alpha, -beta, -rc suffixes
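The versioning rules above can be sketched as a small parser with an ordering and a bump function. This is a minimal illustration, assuming that pre-release suffixes may carry an optional number (for example `-rc.1`) and that only the three suffixes named above are accepted; full semantic versioning allows more forms than this sketch handles.

```python
import re
from typing import NamedTuple, Optional

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-(alpha|beta|rc)(?:\.(\d+))?)?")
# A final release (no suffix) sorts after all of its pre-releases.
_STAGE_RANK = {"alpha": 0, "beta": 1, "rc": 2, None: 3}


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    stage: Optional[str] = None
    stage_number: int = 0


def parse(text: str) -> Version:
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognized version: {text!r}")
    major, minor, patch, stage, number = match.groups()
    return Version(int(major), int(minor), int(patch), stage, int(number or 0))


def sort_key(v: Version):
    return (v.major, v.minor, v.patch, _STAGE_RANK[v.stage], v.stage_number)


def bump(v: Version, part: str) -> Version:
    """Return the next final version for an incompatible, feature, or fix change."""
    if part == "major":   # incompatible API/ABI changes
        return Version(v.major + 1, 0, 0)
    if part == "minor":   # backward-compatible new functionality
        return Version(v.major, v.minor + 1, 0)
    if part == "patch":   # backward-compatible bug fixes
        return Version(v.major, v.minor, v.patch + 1)
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    names = ["1.0.0", "1.0.0-rc.1", "1.0.0-beta", "1.0.0-alpha", "1.0.1"]
    print(sorted(names, key=lambda s: sort_key(parse(s))))
```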
Release Process
Documented release procedures following Apache guidelines
Automated release preparation and verification tools
Release notes and migration guides for each version
Security vulnerability handling process
Release & Pipelines
Our goal is to make Cloudberry's CI/CD workflow more flexible, robust, and automated, so that community users and developers can also reuse it in their own environments.
Introduce new build, test, and deployment workflows for Cloudberry based on GitHub Actions and Docker.
Support more operating systems in the build matrix, with artifacts and Docker images for Rocky Linux, Debian, and Ubuntu.
Support more CPU architectures, including x86_64, ARM, RISC-V, and LoongArch.
Support skipping the CI/CD workflow for pull requests that only touch specified file types or directories in the main repo, such as *.txt, *.md, *.mdx, *.png, and the /doc directory, to save test resources.
Add comment commands for pull request review, such as /build, /rebase, and /ok-to-test, which PR authors and project committers can use to trigger actions; this can also help reduce cost. Reference: http://prow.k8s.io/command-help.
Add a git pre-commit workflow to check commit message conventions.
Ansible playbooks for cloud providers.
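Two of the items above, skipping CI for documentation-only changes and triggering actions from review comments, come down to small decision rules. The sketch below shows one possible shape for them in Python; the function names, the `doc/*` pattern for the /doc directory, and the rule that only the PR author or a committer may issue commands are assumptions drawn from the list, not an existing Cloudberry workflow.

```python
from fnmatch import fnmatchcase

# Patterns taken from the list above; "doc/*" stands in for the /doc directory.
SKIP_PATTERNS = ("*.txt", "*.md", "*.mdx", "*.png", "doc/*")
KNOWN_COMMANDS = ("/build", "/rebase", "/ok-to-test")


def should_skip_ci(changed_files) -> bool:
    """Skip only when every changed file matches at least one skip pattern."""
    files = list(changed_files)
    if not files:
        return False
    return all(
        any(fnmatchcase(path, pattern) for pattern in SKIP_PATTERNS)
        for path in files
    )


def parse_commands(comment_body: str):
    """Return known slash commands that appear alone on a line of a comment."""
    commands = []
    for line in comment_body.splitlines():
        word = line.strip()
        if word in KNOWN_COMMANDS:
            commands.append(word)
    return commands


def is_allowed(user: str, pr_author: str, committers) -> bool:
    """Only the PR author or a project committer may trigger commands."""
    return user == pr_author or user in set(committers)


if __name__ == "__main__":
    print(should_skip_ci(["README.md", "doc/guide/intro.sgml"]))
    print(should_skip_ci(["README.md", "src/backend/main.c"]))
    print(parse_commands("Thanks!\n/build\n/deploy"))
```

Requiring every changed file to match is the conservative choice: a pull request that mixes documentation and source changes still runs the full workflow.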
Website, Documents and Marketing
This part covers short-term and mid-term items for the website, documents, and marketing.
Website:
Clean up the website source code.
Update the disclaimer and check it against the ASF brand policy and the Podling website guide.
Optimize the website style and redesign some pages like the homepage or blog index page.
Documents:
Restructure the existing documents to make them more organized.
Cherry-pick the doc updates from Greenplum and PostgreSQL.
Write new documents to align with the project features.
Work with developers to create the development guide.
Marketing:
Adopt the ASF social media guidelines for Cloudberry and create a workstream for social media platforms.
Proposal Status: Completed
For incubation and graduation, we will use the Incubator website (https://incubator.apache.org/), the Apache Project Maturity Model (https://community.apache.org/apache-way/apache-project-maturity-model.html), and other ASF policies and guides as references, and proceed with help from our mentors.