V2 Flink:
The Flink QuickStart is not written clearly; following it step by step, you cannot get through it: https://github.com/InterestingLab/waterdrop-docs/blob/master/zh-cn/v2/flink/quick-start.md For example: the Waterdrop download link points to only one fixed version, which is a problem; the fake source is already started, so why also run nc? Where do you view the Web UI? Does a Flink cluster need to be started in advance (the guide should be split into local trial / submit to Flink standalone / submit to YARN)? Flink does not appear to support being started in a "local mode" the way the guide implies (Flink calls this Local Setup; the official link can be pasted directly: https://ci.apache.org/projects/flink/flink-docs-stable/getting-started/tutorials/local_setup.html).
Document how Waterdrop Flink jobs are deployed, e.g. how to submit to YARN (a submission sketch follows this list).
In the Source docs, format has been changed to format.type, and the avro format has been added.
Can bin/start-waterdrop-flink.sh accept all of the parameters of Flink's bin/flink CLI? The start-waterdrop-flink.sh docs should also link the official CLI documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/cli.html
Add a new chapter on managing Waterdrop Flink jobs (submit, start, stop), following the official docs: https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/cli.html (see the CLI sketch after this list).
Plugin development documentation.
The Flink configuration docs lack a complete configuration example like the one for Spark (a sketch follows this list).
Add a unified introduction to Format (the data source formats) across the Flink docs.
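As a starting point for the missing example, here is a minimal sketch of a complete Flink streaming configuration in the Spark-example style. It assumes the v2 section names (env / source / transform / sink) and the SocketStream, Split and ConsoleSink plugin names with the option keys shown; all of these would need to be verified against the actual v2 Flink plugin docs before being published:

    env {
      # Flink job-level settings
      execution.parallelism = 1
    }

    source {
      # Reads lines from a socket; this is what `nc -lk 9999` in the quick start feeds
      SocketStream {
        result_table_name = "raw_lines"
        field_name = "info"
      }
    }

    transform {
      # Split each incoming line into named fields
      Split {
        separator = "#"
        fields = ["name", "age"]
      }
    }

    sink {
      # Print the resulting rows to stdout / the TaskManager log
      ConsoleSink {}
    }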
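For the deployment and job-management chapters, a rough sketch of the commands involved. The Flink CLI commands (list, cancel, stop, -m yarn-cluster) come from the 1.10 CLI docs linked above; the assumption that start-waterdrop-flink.sh takes a --config option and forwards any extra arguments to bin/flink run is exactly the open question raised above and still needs to be confirmed:

    # Local trial: start a local Flink cluster first (Flink's "Local Setup", script from the Flink distribution)
    ./bin/start-cluster.sh

    # Submit the Waterdrop job (assumed entry point and flag)
    ./bin/start-waterdrop-flink.sh --config ./config/flink.streaming.conf.template

    # Submit to YARN instead, assuming extra arguments are forwarded to `flink run`
    ./bin/start-waterdrop-flink.sh -m yarn-cluster --config ./config/flink.streaming.conf.template

    # Manage running jobs with the standard Flink CLI
    ./bin/flink list              # list scheduled and running jobs
    ./bin/flink cancel <jobId>    # cancel a running job
    ./bin/flink stop <jobId>      # stop a streaming job with a savepoint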
For Waterdrop's Elasticsearch input plugin (https://interestinglab.github.io/waterdrop-docs/#/zh-cn/v1/configuration/input-plugins/Elasticsearch), add an explanation of the elasticsearch-hadoop configuration parameters (https://www.elastic.co/guide/en/elasticsearch/hadoop/6.2/configuration.html) so that reads from ES can be tuned for maximum efficiency. The key parameter is es.input.max.docs.per.partition, which determines the partition count as: number of partitions = total number of documents / es.input.max.docs.per.partition. Letting users pick a suitable partition count addresses the problem of slow transfer between ES and other components caused by overly long shuffles when the data volume is large; controlling the number of partitions read from ES speeds up the shuffle, so documenting it makes the plugin easier and more efficient to use. This differs somewhat from the official ES advice on how many CPU cores (threads) to use per shard read, so the right value has to be found in practice; in our tests, setting this parameter sensibly improved ES read throughput by 3-10x.
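A sketch of how this could look in the v1 Elasticsearch input docs. It assumes the plugin passes es.* options straight through to elasticsearch-hadoop (the hosts/index keys are from the linked input-plugin page; the index name and the numbers are purely illustrative):

    input {
      elasticsearch {
        hosts = ["localhost:9200"]
        index = "access_log-*"
        # elasticsearch-hadoop pass-through option (assumed to be supported via es.*):
        # partitions read from ES = total documents / es.input.max.docs.per.partition,
        # e.g. 100,000,000 docs / 5,000,000 = 20 partitions
        es.input.max.docs.per.partition = 5000000
      }
    }

Raising the value lowers the partition count (less shuffle overhead), lowering it increases read parallelism; the 3-10x figure above comes from tuning exactly this trade-off.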
V1 Spark:
Supplement the documentation based on the FAQ.
Supplement other documentation: ClickHouse duplication due to Spark retry mechanism apache/seatunnel#414
The split filter's delimiter parameter is a regex, so special characters such as | have to be escaped, as in the example below:
filter {
  split {
    source_field = "raw_message"
    delimiter = "\\|"
    fields = ["field1", "field2"]
  }
}