Increase checksum buffer to 128kb, improving download performance. #295

Open: 1 commit to merge into base branch master

stewartsmith
Contributor

Reading 2kb at a time to compute the checksum limits network throughput. Bumping the buffer up to 128kb seems to give a good balance of memory usage and performance.

Benchmarks on an m5n.16xlarge EC2 instance, doing a reposync of the Amazon Linux 2023 x86-64 repositories, showed that this change, combined with the (smaller) benefits of my patch that avoids libc buffered IO, reduces system CPU time by another half second and cuts a further 3 seconds off total time:

102s (original) -> 99s (no libc buffered IO) -> 95s (this patch)
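
For illustration only, here is a minimal Python sketch of the general idea: hashing a file by reading it in fixed-size chunks, with the chunk size as a parameter. It is not the code in this patch (the project itself is written in C). The names `file_checksum` and `chunk_size` are hypothetical, and SHA-256 is an assumed algorithm.

```python
import hashlib


def file_checksum(path, chunk_size=128 * 1024, algorithm="sha256"):
    """Return the hex digest of a file, read chunk_size bytes at a time.

    Hypothetical helper for illustration; chunk_size is the knob the
    description above talks about (2 KB before, 128 KB after).
    """
    digest = hashlib.new(algorithm)
    # Reuse one buffer for every read so memory use stays at chunk_size.
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    # buffering=0 returns a raw file object, so each readinto() maps to
    # roughly one read() system call with no extra userspace buffering.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(view)
            if not n:  # 0 bytes read means end of file
                break
            digest.update(view[:n])
    return digest.hexdigest()
```

With this shape, moving from a 2 KB to a 128 KB buffer means roughly 64 times fewer read calls for the same file, at the cost of one 128 KB buffer per checksum in progress. The digest is the same for any chunk size.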

Contributor Author

stewartsmith commented Feb 9, 2024

For reference, I ran these benchmarks on an m5n.16xlarge EC2 instance, against both the in-region S3 buckets and the CDN repositories. That instance type has 256GB of memory, a 75Gbit network connection, and is a 64-core Cascade Lake system. The root volume is a 256GB gp3 EBS volume with 500MB/sec of throughput and 3000 IOPS.

The background is that many EC2 instances are relatively short-lived and only install RPMs at launch, so any time spent installing RPMs is time spent scaling up a system rather than running the customer workload.

Goes well when paired with #294
