Replies: 3 comments
As an ocean modeler, I've been struggling to use Dask to rechunk and cloud-optimize massive (10-100 TB) datasets. I would like to try xarray-beam, since its two example use cases are rechunking and creating climatologies from the 25 TB ERA5 reanalysis (https://github.com/google/xarray-beam/tree/main/examples#readme). Perhaps this would be the stack to run this on QHub:
Here's the doc on installing Flink on Kubernetes. What do you QHub devs think?
First I've heard of Flink or Beam. Neither is currently on our radar.
A little background, in case anyone is not familiar with Flink or Beam.

Apache Flink (née Stratosphere, before it joined the Apache Software Foundation circa 2014) can, with all apologies to the Flink team, largely be described as Apache Spark, but different. Like Spark, it has a Java/Scala core and exposes Java, Scala, and Python APIs. It supports UDFs for massively parallel computation and has a very general DAG-based compute engine with functional primitives that programmers use to build their distributed applications.

In the mid-2010s it came up in nearly every conversation involving Spark, and for good reason: my research lab even did some work evaluating Spark vs. Flink in different settings and found both to be quite performant (there were some slight differences, but for our purposes they were pretty much equivalent). Since then, both projects have picked up corporate backing (Spark -> Databricks, Flink -> data Artisans) and both have their own regular gatherings (Spark -> AI Summit, Flink -> Flink Forward), though Flink has specialized a bit more in the "stream processing" side of things, whereas Spark has remained a very general, all-purpose compute engine.

Apache Beam is designed as an abstraction layer between the functional primitives exposed by Spark/Flink (
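For anyone who wants the "functional primitives" idea made concrete, here is a minimal pure-Python sketch of the map / group-by-key / combine pattern that engines like Spark and Flink distribute across workers. It is not Flink or Beam code: the helper names (`map_stage`, `group_by_key`, `combine_per_key`) and the toy temperature records are made up for illustration, and everything runs in a single process. It computes a tiny monthly climatology (mean value per month), which is the same shape of computation as the xarray-beam example mentioned earlier in the thread.

```python
from collections import defaultdict
from functools import reduce

# Toy records: (month, temperature). The values are invented for illustration.
records = [
    (1, 10.0), (1, 12.0),
    (2, 11.0), (2, 13.0), (2, 15.0),
    (3, 14.0),
]


def map_stage(rows, fn):
    """Apply a user-defined function (UDF) to every record independently."""
    return [fn(row) for row in rows]


def group_by_key(pairs):
    """Collect all values that share a key (the 'shuffle' step in a real engine)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def combine_per_key(groups, combiner, init):
    """Fold each key's values with an associative combiner."""
    return {key: reduce(combiner, values, init) for key, values in groups.items()}


# Map: emit (month, (sum, count)) so partial results can be merged in any order.
pairs = map_stage(records, lambda r: (r[0], (r[1], 1)))
# Group, then combine the partial (sum, count) pairs per month.
totals = combine_per_key(
    group_by_key(pairs),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
    (0.0, 0),
)
# Final map: turn (sum, count) into a mean.
climatology = {month: s / n for month, (s, n) in totals.items()}
print(climatology)
```

Carrying `(sum, count)` rather than a running mean keeps the combiner associative, which is the property that lets a distributed engine merge partial results from different workers in any order.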
I'm hoping this GH Discussion will act as a place for us to flesh out a roadmap and some milestones for the QHub project (perhaps as part of the v0.4.0 release?). At the moment, there are many interesting and useful components of QHub that are being worked on or that have recently been merged. All of this is super exciting and I'm ready to see QHub take the next leap.

Here are a few potential items that we can use to initiate further discussion, in no particular order:

- `qhub upgrade` for ease of upgrading between available QHub versions--target )
- `qhub-config.yaml` validation (during each step: `init`, `render` and `deploy`) to avoid failed deployments caused by "bad" (and avoidable) user inputs
- `qhub init`: forbid project names with forbidden characters for the given cloud provider #867

@pierrotsmnrd has also recently been performing interviews with QHub users, and there are probably a lot more items from those discussions that we can add to this list. I'm confident that plenty of you have your own ideas on how to improve QHub, and those should probably be added to this list as well.

I look forward to future discussions on how we can keep improving QHub. Thank you for your contributions :)
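To illustrate the project-name validation item from the list above, here is a rough sketch of what an early check could look like. This is not QHub's actual code. The function `validate_project_name`, the provider keys (`provider-a`, `provider-b`) and the regex rules are all hypothetical placeholders; the real per-provider naming constraints are not spelled out in this thread and would need to come from each cloud provider's documentation.

```python
import re

# HYPOTHETICAL rules for illustration only. These are NOT the real naming
# constraints of any cloud provider; substitute the documented ones.
HYPOTHETICAL_NAME_RULES = {
    # lowercase letter first, then 2-29 lowercase letters, digits, or hyphens
    "provider-a": re.compile(r"[a-z][a-z0-9-]{2,29}"),
    # 3-24 lowercase letters or digits, no hyphens
    "provider-b": re.compile(r"[a-z0-9]{3,24}"),
}


def validate_project_name(name, provider):
    """Return a list of error messages; an empty list means the name is accepted."""
    pattern = HYPOTHETICAL_NAME_RULES.get(provider)
    if pattern is None:
        return [f"unknown provider: {provider!r}"]
    if pattern.fullmatch(name) is None:
        return [
            f"project name {name!r} does not match {pattern.pattern!r} "
            f"required for {provider!r}"
        ]
    return []


if __name__ == "__main__":
    for candidate, provider in [("my-project", "provider-a"), ("my-project", "provider-b")]:
        errors = validate_project_name(candidate, provider)
        print(candidate, provider, "OK" if not errors else errors)
```

Returning a list of messages instead of raising on the first problem would let the same check run at `init`, `render` and `deploy` and report every issue at once, before anything is sent to the cloud provider.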