Need help in understanding of limit configuration for memberlist replication setup #5285

gs-mteki · 2023-06-19T15:05:38Z

Hi All,

I configured Mimir to run on EC2 machines in monolithic mode and made it highly available with an AWS Auto Scaling group (ASG) using memberlist replicas. When the ASG observes high inbound and outbound network traffic, it scales out automatically. Everything worked as expected until we hit limit errors for max_label_names_per_series, max_global_series_per_metric, max_global_series_per_user, and ingestion_rate.

I adjusted the values based on the error logs. The current values are:

  max_label_names_per_series: 80
  max_global_series_per_metric: 80000
  max_global_series_per_user: 4000000
  ingestion_rate: 150000

Now I see a similar limit error once we reach the limits above:

user=anonymous: per-metric series limit of 80000 exceeded, please contact administrator to raise it (per-ingester local limit: 30000 )

After debugging, I realized that my understanding of the config.yaml file was wrong. I initially thought the values in config.yaml applied to each Mimir instance individually, but they apply to the whole set of replicas.

If I have 3 Mimir replicas, the values are divided by 3. For example, with max_global_series_per_metric at 80000, 80000/3 means each Mimir instance accepts up to 26666 series per metric.
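A rough sketch of that arithmetic. It assumes the ingester derives its local limit as global limit / number of ingesters × replication factor, which is how the conversion is commonly described, so the replication factor matters as well as the replica count. The function name `local_limit` and the cluster sizes are illustrative assumptions, not values taken from this setup.

```shell
# Hypothetical helper: approximate per-ingester local limit.
# Arguments: <global_limit> <ingester_count> <replication_factor>
# Assumed formula: global_limit * replication_factor / ingester_count
# (integer division, so the result is rounded down).
local_limit() {
  echo $(( $1 * $3 / $2 ))
}

# Plain division as described above: 3 replicas, replication ignored.
local_limit 80000 3 1   # prints 26666

# An assumed larger cluster with replication factor 3 (Mimir's default).
local_limit 80000 8 3   # prints 30000
```

The second call happens to match the `per-ingester local limit: 30000` in the error above, but the 8-ingester cluster size is only a guess.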

The main problem is that I am using an AWS ASG, so I don't know how many instances will be running at any point in time. I would like help with the following points:

1. Is there any configuration I am missing that would make the values in config.yaml apply to each Mimir instance individually?
2. I am also not sure whether my overall Mimir setup is correct.
3. I would also like to know how to achieve auto scaling in Mimir.

Your suggestions and help are much appreciated. Let me know if you need any more information.

Answered by dimitarvdimitrov


dimitarvdimitrov · 2023-06-22T10:16:25Z

Maintainer

You can set the limits very high or disable them altogether. These limits protect the Mimir cluster from being overwhelmed by incoming data and prevent a single tenant from occupying most of the cluster's capacity. Since your Mimir cluster scales according to load, it is probably safe to raise them substantially.

Related to this, there are also per-replica limits, called instance limits. They do not change with the size of the cluster, so you can disable the per-tenant limits and set only instance limits.

distributor:
  instance_limits:
    # (advanced) Max ingestion rate (samples/sec) that this distributor will
    # accept. This limit is per-distributor, not per-tenant. Additional push
    # requests will be rejected. Current ingestion rate is computed as
    # exponentially weighted moving average, updated every second. 0 = unlimited.
    # CLI flag: -distributor.instance-limits.max-ingestion-rate
    [max_ingestion_rate: <float> | default = 0]

    # (advanced) Max inflight push requests that this distributor can handle. This
    # limit is per-distributor, not per-tenant. Additional requests will be
    # rejected. 0 = unlimited.
    # CLI flag: -distributor.instance-limits.max-inflight-push-requests
    [max_inflight_push_requests: <int> | default = 2000]

    # (advanced) The sum of the request sizes in bytes of inflight push requests
    # that this distributor can handle. This limit is per-distributor, not
    # per-tenant. Additional requests will be rejected. 0 = unlimited.
    # CLI flag: -distributor.instance-limits.max-inflight-push-requests-bytes
    [max_inflight_push_requests_bytes: <int> | default = 0]

ingester:
  instance_limits:
    # (advanced) Max ingestion rate (samples/sec) that ingester will accept. This
    # limit is per-ingester, not per-tenant. Additional push requests will be
    # rejected. Current ingestion rate is computed as exponentially weighted
    # moving average, updated every second. 0 = unlimited.
    # CLI flag: -ingester.instance-limits.max-ingestion-rate
    [max_ingestion_rate: <float> | default = 0]

    # (advanced) Max tenants that this ingester can hold. Requests from additional
    # tenants will be rejected. 0 = unlimited.
    # CLI flag: -ingester.instance-limits.max-tenants
    [max_tenants: <int> | default = 0]

    # (advanced) Max series that this ingester can hold (across all tenants).
    # Requests to create additional series will be rejected. 0 = unlimited.
    # CLI flag: -ingester.instance-limits.max-series
    [max_series: <int> | default = 0]

    # (advanced) Max inflight push requests that this ingester can handle (across
    # all tenants). Additional requests will be rejected. 0 = unlimited.
    # CLI flag: -ingester.instance-limits.max-inflight-push-requests
    [max_inflight_push_requests: <int> | default = 30000]
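
A minimal sketch of the approach described above, expressed as CLI flags: the per-tenant series limits are switched off and a per-ingester instance limit is set instead. The instance-limit flag comes from the reference above. The two per-tenant flag names, and the use of 0 to disable them, are assumptions based on the Mimir configuration reference, and the value 1500000 is purely illustrative. The snippet only prints the command so it can be reviewed before use.

```shell
# Sketch: build a Mimir command line that relies on instance limits
# rather than per-tenant series limits. Nothing is started here.

# Illustrative value; size it to what one instance can hold in memory.
MAX_SERIES_PER_INSTANCE=1500000

MIMIR_ARGS="-config.file=/data/mimir/config.yaml"
# Assumed: 0 disables the per-tenant series limits.
MIMIR_ARGS="$MIMIR_ARGS -ingester.max-global-series-per-user=0"
MIMIR_ARGS="$MIMIR_ARGS -ingester.max-global-series-per-metric=0"
# Per-ingester cap across all tenants; it does not change with cluster size.
MIMIR_ARGS="$MIMIR_ARGS -ingester.instance-limits.max-series=$MAX_SERIES_PER_INSTANCE"

# Print the resulting command for review.
echo "/usr/local/bin/mimir $MIMIR_ARGS"
```

Because the instance limit is evaluated on each replica independently, it stays meaningful when an Auto Scaling group adds or removes instances.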

7 replies

gs-mteki Mar 6, 2024
Author

Hi @sid-jar, sure, give me some time. I will share the implementation.

sidkram Mar 10, 2024

Hey @gs-mteki, can you help me with the implementation? I've been facing frequent cluster outages under high load. I believe they could be mitigated by bringing in an autoscaling setup like the one you've mentioned.

gs-mteki Mar 11, 2024
Author

@sid-jar,

Please check the architecture diagram.

I am using a plain Linux machine with no pre-installed tools. Mimir is installed as part of the init script, and the config.yaml file is stored in S3.

#!/bin/bash

echo "----------- Creating mimir user with no login access -----------"
sudo useradd -rs /bin/false mimir


echo "----------- Creating mimir directories -----------"
sudo mkdir -p /data/mimir/
sudo chown -R ec2-user:ec2-user /data/mimir/


echo "----------- Downloading mimir -----------"
cd /tmp/
MIMIR_VERSION="2.1.0"
wget https://github.com/grafana/mimir/releases/download/mimir-${MIMIR_VERSION}/mimir-linux-amd64 -O mimir

sudo mv mimir /usr/local/bin/
sudo chmod +x /usr/local/bin/mimir


echo "----------- Downloading mimir config -----------"
aws s3 cp <S3_LOCATION> /data/mimir/config.yaml
sudo chown mimir:mimir /data/mimir/config.yaml


echo "----------- Creating mimir service file -----------"
MIMIR_SERVICE_FILE_PATH="/etc/systemd/system/mimir.service"
sudo touch "$MIMIR_SERVICE_FILE_PATH"

cat <<EOT > "$MIMIR_SERVICE_FILE_PATH"
[Unit]
Description=mimir service
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/mimir -memberlist.join <NLB_URL>:7946 --config.file /data/mimir/config.yaml

[Install]
WantedBy=multi-user.target
EOT


echo "----------- Starting mimir service -----------"
sudo systemctl daemon-reload
sudo systemctl enable mimir.service
sudo systemctl start mimir.service

Important note:
Try to run the setup in a single Availability Zone, because traffic between zones incurs data transfer costs.

I am not sure how much you will get from the above, but feel free to ask if you have any questions.

sidkram Mar 11, 2024

@gs-mteki First of all, thanks for the detailed information on the setup. I still have a few questions about the architecture.

1. How do we set up the memberlist ring? The config has a memberlist block with memberlist.join_members, where you add the addresses of the other cluster members. How is this configured? Do you only add the NLB IP there?
2. I see ports 7946, 8080, and 9095 open, where 7946 is the memberlist bind port. What are the other two bound to? Mimir uses 9009, and 9010 is used for the ingester, right? Aren't we supposed to use those?

gs-mteki Mar 11, 2024
Author

Answer to the first question

I could not find a memberlist.join_members option in the documentation, but I use -memberlist.join to connect all the Mimir instances. You can create a CNAME for the NLB and pass that CNAME to -memberlist.join.
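
For reference, `-memberlist.join` appears to be the CLI counterpart of the `join_members` key in the `memberlist` block of the YAML config, so either form should work. A small sketch that writes the YAML variant to a scratch file. The hostname `mimir-nlb.example.internal` is a made-up placeholder for the NLB CNAME, and the output path is hypothetical.

```shell
# Placeholder CNAME for the NLB; replace it with the real record.
NLB_CNAME="mimir-nlb.example.internal"
# Hypothetical scratch location for the generated fragment.
OUT_FILE="${TMPDIR:-/tmp}/memberlist-snippet.yaml"

# YAML equivalent of "-memberlist.join <NLB_CNAME>:7946".
cat > "$OUT_FILE" <<EOF
memberlist:
  join_members:
    - ${NLB_CNAME}:7946
EOF

cat "$OUT_FILE"
```

The fragment would be merged into config.yaml, in which case the `-memberlist.join` flag in the service file is no longer needed.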

Answer to the second question

Port 8080: This is the Mimir HTTP port. We are using 8080 instead of 9009.
Port 9095: This is used for gRPC communication.

In this architecture, I run all the Mimir instances in monolithic mode, and the servers communicate with each other via the memberlist gossip protocol.

I hope that answers your questions.
