| spark.rss.data.io.threads | 8 | Number of threads for a task to push data. |
| spark.rss.push.data.replicate | true | When true, the RSS worker replicates shuffle data to another RSS worker to ensure shuffle data won't be lost after a node failure. |
| spark.rss.stage.end.timeout | 240s | Timeout for StageEnd. |
| spark.rss.shuffle.writer.mode | hash | RSS supports two different shuffle writers. The hash-based shuffle writer works fine when the shuffle partition count is moderate. The sort-based shuffle writer works better when memory pressure is high or the shuffle partition count is huge. |
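As an illustration, these client-side options can be set in `spark-defaults.conf`. The values below are only examples for a job with huge partition counts, not recommendations:

```
# Illustrative client-side RSS settings; tune values to your workload
spark.rss.data.io.threads        8
spark.rss.push.data.replicate    true
# hash is the default; sort helps under memory pressure or huge partition counts
spark.rss.shuffle.writer.mode    sort
```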
### RSS Master Configurations
| Item | Default | Description |
| :---: | :---: | :---: |
| rss.worker.timeout | 120s ||
| rss.application.timeout | 120s ||
| rss.ha.service.id || When this config is empty, the RSS master will refuse to start up. |
| rss.ha.nodes.{serviceId} || List of nodes that deploy RSS masters. `serviceId` is the value of `rss.ha.service.id`. |
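For instance, an HA master deployment might be configured as follows. The service id `rss-cluster-1` and the node list are hypothetical placeholders:

```
# Hypothetical HA setup: the service id and node list are placeholders
rss.ha.service.id rss-cluster-1
rss.ha.nodes.rss-cluster-1 master1,master2,master3
```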
### RSS Worker Configurations
| Item | Default | Description |
| :---: | :---: | :---: |
| rss.worker.base.dirs || Directory list to store shuffle data. For the sake of performance, there should be one directory per HDD and eight per SSD. |
| rss.worker.flush.buffer.size | 256K ||
| rss.worker.flush.queue.capacity | 512 | Size of the buffer queue attached to each storage directory. Each flush buffer queue consumes `rss.worker.flush.buffer.size` * `rss.worker.flush.queue.capacity` (256K * 512 = 128M) of off-heap memory. This config can be used to estimate the RSS worker's off-heap memory demand. |
| rss.worker.fetch.chunk.size | 8m | Max chunk size of a reducer's merged shuffle data. For example, if a reducer's shuffle data is 128 M, it will take 16 chunk fetch requests to fetch. |
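The description of `rss.worker.flush.queue.capacity` above implies a simple way to estimate a worker's flush-buffer memory: buffer size times queue capacity, per storage directory. A minimal Scala sketch of that arithmetic, assuming the default values and a hypothetical worker with 10 directories:

```scala
// Estimate flush-buffer off-heap memory: bufferSize * queueCapacity per directory.
object FlushBufferMemoryEstimate {
  def main(args: Array[String]): Unit = {
    val bufferSizeBytes = 256L * 1024 // rss.worker.flush.buffer.size = 256K
    val queueCapacity   = 512L        // rss.worker.flush.queue.capacity = 512
    val numDirs         = 10L         // hypothetical: one directory per disk
    val perDirBytes     = bufferSizeBytes * queueCapacity // 128 MB per directory
    val totalBytes      = perDirBytes * numDirs           // 1280 MB in total
    println(s"per dir: ${perDirBytes >> 20} MB, total: ${totalBytes >> 20} MB")
  }
}
```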
Assume we have a cluster as described below:
5 RSS Workers with 20 GB off-heap memory and 10 disks.
Since we need to reserve 20% of the off-heap memory for Netty, we can assume 16 GB of off-heap memory can be used for flush buffers.
If `spark.rss.push.data.buffer.size` is 64 KB, we can have up to 1310720 in-flight requests.
If you have 8192 mapper tasks, you could set `spark.rss.push.data.maxReqsInFlight=160` to gain performance improvements.
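The sizing above can be reproduced with a short Scala sketch. The figures are the ones from this example, not recommendations:

```scala
// Reproduce the in-flight request sizing from the example above.
object InFlightSizing {
  def main(args: Array[String]): Unit = {
    val workers         = 5L
    val usablePerWorker = 16L * 1024 * 1024 * 1024 // 16 GB left after the 20% Netty reserve
    val pushBufferBytes = 64L * 1024               // spark.rss.push.data.buffer.size = 64 KB
    val maxInFlight     = workers * usablePerWorker / pushBufferBytes
    val mapperTasks     = 8192L
    println(s"cluster-wide in-flight requests: $maxInFlight")                     // 1310720
    println(s"spark.rss.push.data.maxReqsInFlight: ${maxInFlight / mapperTasks}") // 160
  }
}
```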
| Item | Default | Type | Description |
| :---: | :---: | :---: | :---: |
| `rss.worker.prometheus.metric.port` | 9096 | int ||
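For example, worker metrics can be exposed for Prometheus with settings like the following (9096 is the documented default port):

```
# Enable the metrics system and expose worker metrics on the default port
rss.metrics.system.enabled true
rss.worker.prometheus.metric.port 9096
```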
## Grafana Dashboard
We provide a Grafana dashboard for RSS: [Grafana-Dashboard](assets/grafana/rss-dashboard.json). The dashboard was generated with Grafana 8.5.0.
EXAMPLE: single master cluster
rss.master.address master-host:port
rss.metrics.system.enabled true
rss.worker.flush.buffer.size 256k
rss.worker.flush.queue.capacity 4096
rss.worker.base.dirs /mnt/disk1/,/mnt/disk2
# If your hosts have disk RAID or use LVM, set rss.device.monitor.enabled to false
rss.device.monitor.enabled false
RSS has various metrics. See [METRICS](METRICS.md).
## Contribution
This is an active open-source project. We are always open to developers who want to use the system or contribute to it.
See more details in [Contributing](CONTRIBUTING.md).
## NOTICE
If you need to fully restart an RSS cluster in HA mode, you must clean the Ratis meta storage first, because the Ratis meta storage keeps expired state from the last run of the cluster.
Here are some instructions:
1. Stop all workers.
2. Stop all masters.
3. Clean every master's Ratis meta storage directory (`rss.ha.storage.dir`).