Documentation Enhancement #7

Open
garyelephant opened this issue Jul 1, 2020 · 2 comments
Labels: documentation (Improvements or additions to documentation)

Comments

garyelephant (Member) commented Jul 1, 2020

V2 Flink:

garyelephant added the documentation label on Jul 1, 2020
pdlovedy (Contributor) commented:

For Waterdrop's Elasticsearch input (ES as an input plugin, https://interestinglab.github.io/waterdrop-docs/#/zh-cn/v1/configuration/input-plugins/Elasticsearch), please add a description of the relevant ES configuration parameters (https://www.elastic.co/guide/en/elasticsearch/hadoop/6.2/configuration.html) so that users can maximize read throughput from ES. The key parameter is es.input.max.docs.per.partition, which determines the partition count as: number of partitions = total number of documents / es.input.max.docs.per.partition. Letting users choose a suitable partition count addresses the case where a large data volume causes an overly long shuffle and slow transfer between ES and other components; controlling the number of partitions used to read ES likewise speeds up the shuffle, so documenting it would make the plugin more convenient and efficient to use. Note that this differs somewhat from the Elasticsearch documentation's advice of sizing reads by the number of CPU cores (threads) per shard, so an appropriate value has to be found through experimentation. In my tests, a well-chosen value improved ES read efficiency by 3-10x.
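For reference, a minimal sketch of what such a documented example might look like, assuming the v1 Elasticsearch input plugin accepts the hosts, index, and result_table_name options from its docs and forwards es.* keys (including es.input.max.docs.per.partition) to elasticsearch-hadoop; the hostname, index name, result table name, and the value 100000 below are illustrative only:

input {
    elasticsearch {
        hosts = ["localhost:9200"]      # illustrative cluster address
        index = "my_index"              # illustrative index name
        result_table_name = "es_source" # illustrative result table name
        # Forwarded to elasticsearch-hadoop: caps the documents per partition,
        # so partition count ≈ total documents / this value.
        es.input.max.docs.per.partition = 100000
    }
}

Lowering the value creates more, smaller partitions for the subsequent shuffle; raising it does the opposite, so the right setting depends on the data volume and cluster size.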

garyelephant changed the title from Documentation Problems to Documentation Enhancement on Sep 28, 2020
garyelephant (Member, Author) commented Oct 26, 2020

V1 Spark:

filter {
    split {
        source_field = "raw_message"
        delimiter = "\\|"
        fields = ["field1", "field2"]
    }
}
  • In this example, triple quotes should be used (a corrected sketch follows below).
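A hedged sketch of the suggested fix, wrapping the delimiter in HOCON triple quotes so the backslashes reach the plugin without config-level escape processing; whether the final regex needs one backslash or two depends on how the split plugin consumes the value, so treat the exact delimiter string as an assumption:

filter {
    split {
        source_field = "raw_message"
        # Triple-quoted string: HOCON performs no escape processing,
        # so the delimiter is passed through exactly as written.
        delimiter = """\\|"""
        fields = ["field1", "field2"]
    }
}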
