Skip to content

Conversation

mattsu2020
Copy link
Contributor

Add automatic buffer-size heuristics (ported from commit a0e77d9). We now size external-sort chunks based on input file sizes and available memory, clamping to 512 KiB–128 MiB so we avoid both tiny buffers and risky overcommit on constrained systems.

Respect user-provided --buffer-size. Only automatically computed sizes are raised to the safety minimum; explicit values are left untouched, which keeps external sorting and --compress-program working even when users choose small buffers.

Performance Comparison (baseline vs. current)
Measurements come from hyperfine --warmup 3 --runs 10; values are means in milliseconds (lower is better).

Scenario Baseline Current Delta Speedup
ASCII 500k 17.04 14.59 -2.45 1.17×
Numeric 500k 36.89 37.29 +0.40 0.99×
ASCII 4M 112.47 107.52 -4.95 1.05×
ASCII 4M (-S 32M) 207.40 100.48 -106.92 2.06×
ASCII 16M 854.18 414.45 -439.73 2.06×
ASCII 16M (-S 512M) 838.05 426.85 -411.20 1.96×

Copy link

github-actions bot commented Oct 6, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link

github-actions bot commented Oct 7, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@sylvestre
Copy link
Contributor

you have to refresh the fuzz/Cargo.lock file

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@sylvestre
Copy link
Contributor

please follow this documentation for the performance work:
https://github.com/uutils/coreutils/blob/main/docs/src/performance.md

We would like to see hyperfine results and codspeed benchmark :)
thanks

.filter(|s| matches!(s.settings.mode, SortMode::GeneralNumeric))
.count();

self.precomputed.fast_lexicographic = self.mode == SortMode::Default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it needs comments, and moved into a function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

let file_hint = file_size_hint(files);
let mem_hint = available_memory_hint();

match (file_hint, mem_hint) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cases need comments here
it is a bit cryptic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now pattern-matches file_size_hint and available_memory_hint, selecting the smaller of the two when both are present. If only one hint is available, it uses that; when neither is available, it falls back to FALLBACK_AUTOMATIC_BUF_SIZE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment in the code, not here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

quarter.max(max)
}

#[cfg(target_os = "linux")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was already not convinced in #8802
but
there is probably a better way than parsing /proc/meminfo

esp in the sort.rs code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented it using a different method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I said it in the other pr but it does not belong to sort but uucore
And maybe we already have such functions

Copy link
Contributor Author

@mattsu2020 mattsu2020 Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it total_physical_memory? Does it perform the same processing?
Is it acceptable to modify uucore to add available memory?

}
}

fn ascii_case_insensitive_cmp(a: &[u8], b: &[u8]) -> Ordering {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that necessary ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/uu/sort/src/sort.rs:1931 adds ascii_case_insensitive_cmp in commit da5bd03 (2025-10-06 “fix”) so the new fast_ascii_insensitive path in compare_by can bypass the heavier general comparator and deliver a cheap ASCII fold when sort -f-style settings allow it.

src/uu/sort/src/sort.rs:1934 is tweaked one commit later (5fe2b48, also “fix”) to use to_ascii_lowercase, relying on the standard helper for clarity and correct handling of non-uppercase bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what your Ai wrote here, sorry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strcasecmp is affected by locale settings, so we implemented it separately.
Should we use strcasecmp?

path: Option<PathBuf>,
}

fn handler_state() -> Arc<Mutex<HandlerRegistration>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit b3b721f makes the temporary-directory Ctrl+C handler reusable by storing the global handler state in OnceLock/AtomicBool and cleaning it up in Drop.

Commit 5fe2b48 removes an unnecessary .into() because USimpleError::new already returns the boxed error type.

.clone()
}

fn ensure_signal_handler_installed(state: Arc<Mutex<HandlerRegistration>>) -> UResult<()> {
Copy link
Contributor

@sylvestre sylvestre Oct 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, it needs some explanations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensures the SIGINT handler for sort is registered only once via an AtomicBool, shares the current temp-dir lock/path through a global HandlerRegistration so the handler can safely delete the directory and exit, and resets the flag plus surfaces a localized error if handler setup fails; commits b3b721f (introducing this stateful handler) and 5fe2b48 (cleanup of the returned error) capture the change.

Copy link

codspeed-hq bot commented Oct 11, 2025

CodSpeed Performance Report

Merging #8833 will degrade performances by 2.69%

Comparing mattsu2020:sort_performan-ce (92a3869) with main (8d59e08)1

Summary

⚡ 16 improvements
❌ 1 regression
✅ 89 untouched
⏩ 73 skipped2

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
sort_accented_data[500000] 819.3 ms 354.4 ms ×2.3
sort_ascii_only[500000] 798 ms 345.6 ms ×2.3
sort_case_insensitive[500000] 417.1 ms 271.9 ms +53.4%
sort_case_sensitive[500000] 327.1 ms 163 ms ×2
sort_key_field[500000] 688.8 ms 707.9 ms -2.69%
sort_mixed_data[500000] 729.6 ms 317.7 ms ×2.3
sort_reverse_locale[500000] 820.7 ms 351.7 ms ×2.3
sort_unique_locale[500000] 1,142.3 ms 473.1 ms ×2.4
sort_ascii_c_locale 27.9 ms 21.3 ms +30.76%
sort_ascii_utf8_locale 55.8 ms 42.7 ms +30.68%
sort_german_c_locale 94.5 ms 38 ms ×2.5
sort_german_locale 94.5 ms 38 ms ×2.5
sort_mixed_c_locale 94.1 ms 37.9 ms ×2.5
sort_mixed_utf8_locale 94.1 ms 37.9 ms ×2.5
sort_random_strings 55.3 ms 30.9 ms +79.07%
sort_reverse_mixed 93.2 ms 37.3 ms ×2.5
sort_unique_mixed 87.6 ms 38.2 ms ×2.3

Footnotes

  1. No successful run was found on main (795d587) during the generation of this report, so 8d59e08 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 73 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@sylvestre
Copy link
Contributor

impressive wins :)

Replaced fixed 1GB buffer size with dynamic calculation based on input file sizes and available memory. Added heuristics to clamp buffer size within 512 KiB to 128 MiB range, preventing overcommitment on constrained systems while optimizing for typical workloads. Updated dependencies and imports to use libc directly for better portability.
- Add sysconf to codespell ignore list to prevent false positives
- Remove "NOTE:" from buffer heuristics comment for cleaner style
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Modified conditional compilation in physical_memory_bytes() to skip Unix-specific sysconf calls on Redox OS, ensuring compatibility by returning None for unsupported platforms.
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@mattsu2020 mattsu2020 requested a review from sylvestre October 13, 2025 01:15
@sylvestre
Copy link
Contributor

please try to avoid commit messages like "fix"
it doesn't help reviewers

The getrlimit function call in the sort module was missing the libc prefix, which could lead to compilation errors. Added the libc:: qualifier to ensure proper function resolution and maintain code clarity.
use fnv::FnvHasher;
#[cfg(target_os = "linux")]
use nix::libc::{RLIMIT_NOFILE, getrlimit, rlimit};
use libc::{RLIMIT_NOFILE, rlimit};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was more here. why do you move to libc ?
what was wrong with nix ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I was using libc elsewhere, I thought it would be better to match that, but the original code is better.

Switch from direct libc import to nix::libc for the getrlimit function to leverage the nix crate's safer Unix API abstractions, improving portability and maintainability.
Removed the unused `getrlimit` function from the nix libc imports in sort.rs to clean up the code and eliminate dead imports.
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link
Contributor Author

@mattsu2020 mattsu2020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

・Add comment
・Change to Nix

path: Option<PathBuf>,
}

fn handler_state() -> Arc<Mutex<HandlerRegistration>> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit b3b721f makes the temporary-directory Ctrl+C handler reusable by storing the global handler state in OnceLock/AtomicBool and cleaning it up in Drop.

Commit 5fe2b48 removes an unnecessary .into() because USimpleError::new already returns the boxed error type.

}
}

fn ascii_case_insensitive_cmp(a: &[u8], b: &[u8]) -> Ordering {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/uu/sort/src/sort.rs:1931 adds ascii_case_insensitive_cmp in commit da5bd03 (2025-10-06 “fix”) so the new fast_ascii_insensitive path in compare_by can bypass the heavier general comparator and deliver a cheap ASCII fold when sort -f-style settings allow it.

src/uu/sort/src/sort.rs:1934 is tweaked one commit later (5fe2b48, also “fix”) to use to_ascii_lowercase, relying on the standard helper for clarity and correct handling of non-uppercase bytes.

.clone()
}

fn ensure_signal_handler_installed(state: Arc<Mutex<HandlerRegistration>>) -> UResult<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensures the SIGINT handler for sort is registered only once via an AtomicBool, shares the current temp-dir lock/path through a global HandlerRegistration so the handler can safely delete the directory and exit, and resets the flag plus surfaces a localized error if handler setup fails; commits b3b721f (introducing this stateful handler) and 5fe2b48 (cleanup of the returned error) capture the change.

let file_hint = file_size_hint(files);
let mem_hint = available_memory_hint();

match (file_hint, mem_hint) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now pattern-matches file_size_hint and available_memory_hint, selecting the smaller of the two when both are present. If only one hint is available, it uses that; when neither is available, it falls back to FALLBACK_AUTOMATIC_BUF_SIZE.

quarter.max(max)
}

#[cfg(target_os = "linux")]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented it using a different method

use fnv::FnvHasher;
#[cfg(target_os = "linux")]
use nix::libc::{RLIMIT_NOFILE, getrlimit, rlimit};
use libc::{RLIMIT_NOFILE, rlimit};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I was using libc elsewhere, I thought it would be better to match that, but the original code is better.

@sylvestre
Copy link
Contributor

I have doubt about the memory functions being in sort itself. Should be in uucore if we already don't have any.

Also please keep in mind that you are writing to a human. If I want an Ai explaining me things, I can do it myself :)

…cations

This change modifies the buffer size handling in the external sort functionality to halve requests exceeding 512MiB, reducing memory usage and avoiding potential allocation issues.
Removed the `nix::libc::` prefix from the `getrlimit` function call in `get_rlimit()`, likely to use a direct import from `libc` for cleaner code and reduced dependency specificity.
Added a comment and updated the match logic in `automatic_buffer_size` to prefer the minimum of file and memory hints when both are available, ensuring a more conservative buffer size estimate for better performance and resource management.
Copy link
Contributor Author

@mattsu2020 mattsu2020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments to the source code.

let file_hint = file_size_hint(files);
let mem_hint = available_memory_hint();

match (file_hint, mem_hint) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

quarter.max(max)
}

#[cfg(target_os = "linux")]
Copy link
Contributor Author

@mattsu2020 mattsu2020 Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it total_physical_memory? Does it perform the same processing?
Is it acceptable to modify uucore to add available memory?

}
}

fn ascii_case_insensitive_cmp(a: &[u8], b: &[u8]) -> Ordering {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strcasecmp is affected by locale settings, so we implemented it separately.
Should we use strcasecmp?

mattsu2020 and others added 3 commits October 13, 2025 20:17
Added `getrlimit` to the nix::libc imports in sort.rs to enable checking resource limits, likely for handling file descriptor limits on Linux systems.
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

mattsu2020 and others added 2 commits October 14, 2025 14:36
- Moved `available_memory_bytes` function from `src/uu/sort/src/sort.rs` to `src/uucore/src/lib/features/parser/parse_size.rs` to centralize memory parsing logic in the core library.
- Updated the function to read `/proc/meminfo` more robustly using a new helper `parse_meminfo_line`, improving accuracy and modularity for better code reuse across utilities.
@mattsu2020
Copy link
Contributor Author

move memory functions to uucore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants