Replies: 9 comments 2 replies
-
My personal views:
The solution for Optimizer on K8s: I tend to favor AMS directly managing pods. What is currently used more in the Flink ecosystem is viewing logs and monitoring tasks through the Dashboard UI, but I value the resources saved by directly managing pods. I do not exclude the Flink native Kubernetes mode, but I recommend pods as the best solution on K8s.
The OptimizerContainer classification issue: I tend to classify it as follows:
Standalone: a subprocess launched by AMS, which is the current local optimizer.
Flink: an optimizer running as a container with Flink as the engine, including Flink on Yarn and Flink native K8s, which are really just a parameter of Flink itself.
Kubernetes: an optimizer launched as pods managed by AMS.
Spark: if Spark support is needed in the future, then similar to Flink, the engine is used as the scheduling container, and On Yarn or On K8s is a property inside the container.
-
In my opinion: the solution for Optimizer on K8s, and the OptimizerContainer classification issue.
The choice depends on whether users care more about HOW the Optimizer runs or WHERE it runs when selecting its deployment.
-
The optimizer's logic would become inconsistent if it needed to distinguish between Flink and Spark; it should instead be developed against their respective parallel operators.
-
Differentiate according to task mode (my point).
These two types need development interface specifications. For example, for Spark I only need to define its Yarn or K8s configuration information, while Flink requires not only configuration but also management of running status. K8s or Yarn is a lower-level abstraction, for example for the long-running Optimizer. The interface defines a
In addition, the deployment method can be divided into 1. k8s
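The configuration-only versus configuration-plus-status split suggested above could be sketched as two interfaces. All names here (OptimizerDeployment, ManagedOptimizerDeployment, and the example classes) are illustrative stand-ins under that assumption, not Amoro's actual API:

```java
import java.util.Map;

// Hypothetical sketch: every container supplies deployment configuration,
// but only some engines (e.g. Flink) also need their running status managed.
interface OptimizerDeployment {
    Map<String, String> deployConfig();   // yarn / k8s configuration only
}

interface ManagedOptimizerDeployment extends OptimizerDeployment {
    String runningStatus();               // engines like Flink also expose status
}

class SparkDeployment implements OptimizerDeployment {
    public Map<String, String> deployConfig() {
        return Map.of("deploy.target", "k8s");   // placeholder config key
    }
}

class FlinkDeployment implements ManagedOptimizerDeployment {
    public Map<String, String> deployConfig() {
        return Map.of("deploy.target", "yarn");  // placeholder config key
    }
    public String runningStatus() {
        return "RUNNING";                        // would be queried from the engine
    }
}
```

Under this split, a Spark container only ever hands AMS static configuration, while the Flink container additionally participates in status management.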
-
In my opinion: How to support Optimizer on Kubernetes.
For mode 2, the JM's resource usage is trivial compared to the whole optimizer, and Flink brings many capabilities: UI, logs, HA, executors running on different machines, and so on. On the OptimizerContainer classification issue: +1 for basing it on WHERE the optimizer runs. It is relatively easy to integrate the deployment modes supported by the existing engines (Flink, Spark).
-
How to support Optimizer on Kubernetes: As far as I'm concerned, the concept of having AMS manage parallel optimizer pods is indeed commendable! This approach enhances AMS's flexibility, as there is no need to maintain additional frameworks. However, full management of the pod optimizer by AMS does introduce an additional layer of complexity. Therefore, it is crucial to thoroughly consider the implementation of an optimizer resource scheduling system, watchers, fault-tolerance measures, and more.
The classification of OptimizerContainer: As @zhoujinsong pointed out, there are two approaches to extending the ResourceContainer: based on HOW and WHERE.
-
The solution for Optimizer on K8s; the classification of OptimizerContainer. I tend to extend the ResourceContainer based on HOW the optimizer runs, considering two aspects:
So from the perspective of code organization, extending it according to the engine would make it clearer.
-
Thank you very much for your replies to this discussion. Although I would still like to hear more voices on this topic, in order to advance the follow-up work, I will summarize the existing discussion. Regarding the first question, everyone agreed that the execution mode of the Optimizer in a K8s environment should not be limited. In version 0.6.0, we will implement both the Flink Optimizer and the Java Pod Optimizer to run in the K8s environment, and we will continue to search for best practices there. Regarding the second question, it seems that WHERE and HOW the Optimizer runs are both important when choosing an Optimizer Container; at this stage, we should not limit the management of Optimizer Containers to a single dimension. In version 0.6.0, we plan to make the following changes to Optimizer Containers:
If you have any other opinions on this issue, please feel free to continue to leave a comment below. We will continue to discuss this issue.
-
Regarding the first topic: How to support Optimizer on Kubernetes. In my opinion, the first mode is a good choice, because most users of automatic optimization tasks do not need to pay much attention to the Flink layer underneath; we provide users with a service perspective and try to shield the underlying optimization logic and related matters. The second method can be supported, but not enabled by default. I would like to make an additional point here: if there were submission interfaces that could hand jobs to software such as StreamPark to host the Flink jobs, that would also be a good choice. Regarding the second topic: OptimizerContainer classification. I have read @zhoujinsong's solution; it may be more appropriate to choose based on the Optimizer's operating mode (through Flink or Spark).
-
The community has been promoting integration with Kubernetes recently, but there are some disagreements on how to integrate with K8S and the concept of OptimizerContainer. Therefore, we would like to have a discussion to hear everyone's thoughts.
There are two main topics to discuss:
Regarding the first topic: How to support Optimizer on Kubernetes.
There are currently two deployment modes discussed within the community:
AMS schedules Pods through the K8s API, and each Pod contains the Optimizer Java process. Some members are already promoting this method; see PR [AMORO-1958][WIP] Support Kubernetes Optimizer #1927.
AMS deploys and starts the Optimizer through the Flink shell using Flink native Kubernetes. This method only requires modifying the startup command in the current FlinkOptimizerContainer.
Compared with method 2, method 1 is more resource-saving: each Optimizer can save the resources of one JM. However, it cannot enjoy the Flink Dashboard UI and the rest of the Flink ecosystem. Method 2 is friendlier for users familiar with Flink.
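As an illustration of method 2, a Flink native Kubernetes submission in application mode typically looks like the command below. The cluster id, image name, and jar path are placeholders for illustration, not Amoro's actual startup command:

```shell
# Submit the optimizer job to Kubernetes via Flink's native integration.
# Flink itself creates the JobManager/TaskManager pods for this cluster id.
./bin/flink run-application \
  --target kubernetes-application \
  -Dkubernetes.cluster-id=amoro-optimizer-1 \
  -Dkubernetes.container.image=<optimizer-image> \
  local:///opt/flink/usrlib/optimizer-job.jar
```

This is why only the startup command in FlinkOptimizerContainer needs to change: the deployment target is selected by the `--target` option and `-D` properties rather than by new container code.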
Regarding the second topic: The classification of OptimizerContainer.
Currently, there are two types of OptimizerContainer: local and flink.
Local corresponds to the subprocess started by AMS, which runs in the same environment as AMS.
Flink corresponds to a Flink task and relies on Flink to deploy the Optimizer on Yarn.
One view is that an OptimizerContainer corresponds to a cluster type, such as Yarn, K8s, or Local. Another view is that the container corresponds to the base on which the Optimizer job runs and should be distinguished as Jvm or Flink. It can also be considered the intersection of the two, classified as Local, FlinkOnYarn, FlinkOnK8s.
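The intersection view can be sketched as a small hierarchy with one axis for HOW (the engine) and one for WHERE (the cluster). All names here (ResourceContainer, LocalContainer, FlinkOnYarnContainer, FlinkOnK8sContainer) are illustrative stand-ins, not Amoro's actual classes:

```java
// Illustrative only: each concrete container sits at the intersection of
// an engine (HOW) and a cluster type (WHERE).
interface ResourceContainer {
    String engine();       // HOW the optimizer runs: "jvm", "flink", ...
    String clusterType();  // WHERE it runs: "local", "yarn", "kubernetes", ...
}

class LocalContainer implements ResourceContainer {
    public String engine()      { return "jvm"; }
    public String clusterType() { return "local"; }
}

class FlinkOnYarnContainer implements ResourceContainer {
    public String engine()      { return "flink"; }
    public String clusterType() { return "yarn"; }
}

class FlinkOnK8sContainer implements ResourceContainer {
    public String engine()      { return "flink"; }
    public String clusterType() { return "kubernetes"; }
}
```

Classifying by only one axis would collapse one of these methods into the other; keeping both axes lets AMS distinguish FlinkOnYarn from FlinkOnK8s without duplicating engine logic.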
What are your views on these two issues?