QHub Roadmap Discussion #907

iameskild · 2021-11-08T21:32:03Z

iameskild
Nov 8, 2021
Collaborator

I'm hoping this GH Discussion will act as a place for us to flesh out a roadmap and some milestones for the QHub project (perhaps as part of the v0.4.0 release?). At the moment, there are many interesting and useful components of QHub that are being worked on or that have been recently merged. All of this is super exciting and I'm ready to see QHub take the next leap.

Here are a few potential items that we can use to initiate further discussion, in no particular order:

Integrate newer version of conda-store
- conda-store integration into QHub #798
Build backup and restore capabilities (currently exploring Velero)
- Backup and Restore Implementation #743
Harden CI-CD by improving or automating:
- Unit tests
  - No tracking issue yet
- Kubernetes test + Weekly end-to-end integration tests
  - [bug] Cypress Tests are failing #895
- QHub version release testing
  - No tracking issue yet
- DevSecOps
  - [enhancement] - add DEvSecOps to QHub CI #891
Build qhub upgrade for ease of upgrading between available QHub versions
- See qhub upgrade PR here
Split infrastructure into components (no longer relying on --target)
- Split infrastructure into components #847
Add more robust qhub-config.yaml validation (during each step: init, render, and deploy) to avoid failed deployments caused by "bad" (and avoidable) user inputs
- No tracking issue yet
- [enhancement] qhub init : forbid project names with forbidden character for the given cloud provider #867
Fold current "helm extensions" (Prefect, ClearML, etc) into the new extensions mechanism
- [EPIC] Nebari Extension Mechanism #865
Improve documentation
- No tracking issue yet
Improve QHub sign-in page
- [enhancement] Display nebari's internal parameters : version number, project name, ... #869
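
To make the config-validation item above more concrete, here is a minimal sketch of the kind of early check `qhub init` could run on a project name. It is only an illustration, not QHub's actual implementation: the `project_name` and `provider` keys are assumed to match qhub-config.yaml, while `validate_project_name`, the `example-cloud` provider, and both naming rules are hypothetical placeholders (real limits differ per cloud provider).

```python
import re

# Illustrative rules only (assumption): real naming limits vary by cloud
# provider and are not taken from QHub's code base.
HYPOTHETICAL_NAME_RULES = {
    "example-cloud": re.compile(r"[a-z][a-z0-9-]{1,29}"),
}
# Fallback rule for providers without a specific entry (also an assumption).
DEFAULT_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,62}")


def validate_project_name(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    name = config.get("project_name")
    provider = config.get("provider")
    if not isinstance(name, str) or not name:
        errors.append("project_name is missing or empty")
        return errors
    rule = HYPOTHETICAL_NAME_RULES.get(provider, DEFAULT_RULE)
    if not rule.fullmatch(name):
        errors.append(
            f"project_name {name!r} is not valid for provider {provider!r}"
        )
    return errors
```

Returning a list of messages rather than raising on the first problem would let the same function be reused at init, render, and deploy time, and lets the user see every problem in one pass.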

@pierrotsmnrd has also recently been performing interviews with QHub users and there are probably a lot more items from those discussions that we can add to this list. I'm confident that plenty of you have your own ideas on how to improve QHub and those should probably also be added to this list.

I look forward to future discussions on how we can keep improving QHub. Thank you for your contributions :)

rsignell-usgs · 2022-01-03T12:41:34Z

rsignell-usgs
Jan 3, 2022

As an ocean modeler, I've been struggling to use Dask to rechunk and cloud-optimize massive (10-100 TB) datasets.

I would like to try xarray-beam, since its two example use cases are rechunking and creating climatologies from the 25 TB ERA5 model (https://github.com/google/xarray-beam/tree/main/examples#readme).

Perhaps this would be the stack to run it on QHub:

Kubernetes => Apache Flink => Apache Beam => xarray-beam

Here's the doc on installing Flink on Kubernetes

What do you QHub devs think?


dharhas · 2022-01-05T15:55:15Z

dharhas
Jan 5, 2022
Maintainer

First I've heard of Flink or Beam. Neither is currently on our radar.


magsol · 2022-01-05T20:19:06Z

magsol
Jan 5, 2022

A little background, in case anyone is not familiar with Flink or Beam--

Apache Flink (née Stratosphere, prior to its folding into the Apache Software Foundation circa 2014) can--with all apologies to the Flink team--largely be described as Apache Spark, but different. Like Spark, it's built with a Java/Scala core and offers Java/Scala/Python APIs. It supports UDFs for massively parallel computation, and has a very general DAG-based compute engine with functional primitives that programmers use to build their distributed applications. In the mid 2010s it was included in nearly every conversation involving Spark, and for good reason--my research lab even did some work evaluating Spark vs Flink in different settings, and found they were both quite performant (there were some slight differences, but for our purposes they were pretty much equivalent). Since then, both projects have picked up corporate backing (Spark -> Databricks, Flink -> data Artisans) and both have their own regular gatherings (Spark -> AI Summit, Flink -> Flink Forward), though Flink has specialized a bit more in the "stream processing" side of things, whereas Spark has remained a very general, all-purpose compute engine.

Apache Beam is designed as an abstraction layer between the functional primitives exposed by Spark/Flink (map, flatMap, reduce, reduceByKey, etc) and the high-level tasks you're performing, particularly when it comes to streaming data. Beam has a handful of supported "runners" (distributed backends, like Spark or Flink) and exposes a DSL interface for consuming data from a source, performing some high-level operations, and outputting it to a sink.
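
For anyone who hasn't worked with these primitives, here is a rough single-machine sketch in plain Python of the source -> transform -> sink shape described above. To be clear, this is not the Beam, Spark, or Flink API: `flat_map`, `reduce_by_key`, and `word_count` are hypothetical helpers written only to show what the primitives do, and a real runner would distribute each step across a cluster.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


def flat_map(fn, items):
    """Apply fn to each item and flatten the resulting iterables."""
    return chain.from_iterable(fn(item) for item in items)


def reduce_by_key(fn, pairs):
    """Group (key, value) pairs by key, then fold each group's values with fn."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}


def word_count(lines):
    """Classic word count expressed with flatMap -> map -> reduceByKey."""
    words = flat_map(str.split, lines)            # one line -> many words
    pairs = map(lambda w: (w.lower(), 1), words)  # word -> (word, 1)
    return reduce_by_key(lambda a, b: a + b, pairs)


# "Source": an in-memory list standing in for files or a stream.
source = ["Flink and Spark", "Beam runs on Flink"]
# "Sink": a plain dict standing in for an output table or file.
sink = word_count(source)
```

The point of an abstraction layer like Beam is that the pipeline is written once in terms like these, and the choice of runner decides where and how each step actually executes.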

