Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust): Fix height validation in hstack_mut was bypassed when adding to empty frame #21335

Merged
merged 25 commits into from
Feb 20, 2025

Conversation

nameexhaustion
Copy link
Collaborator

@nameexhaustion nameexhaustion commented Feb 19, 2025

If self was empty, we failed to check that the columns within the slice being added all had equal heights.

E.g. this currently passes and creates a DataFrame with invalid state:

let mut df = DataFrame::empty();
let result = df.hstack_mut(&[
    Column::full_null("a".into(), 1, &DataType::Null),
    Column::full_null("b".into(), 3, &DataType::Null),
]);

This PR adds a DataFrame::validate_columns_slice function, and we now take an approach where we perform the append first, and then call the validation function on the resulting slice.

There is a chance this PR may start to uncover implementation bugs elsewhere.

@github-actions github-actions bot added fix Bug fix rust Related to Rust Polars labels Feb 19, 2025
is_empty: bool,
) -> PolarsResult<()> {
polars_ensure!(
col.len() == height || is_empty,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug is here - || (self.)is_empty

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems correct for this function, but not correct on the callsite.

Copy link

codecov bot commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 86.44068% with 16 lines in your changes missing coverage. Please review.

Project coverage is 79.94%. Comparing base (53f0bfd) to head (becfb02).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
crates/polars-core/src/frame/validation.rs 87.23% 6 Missing ⚠️
crates/polars-schema/src/schema.rs 50.00% 5 Missing ⚠️
crates/polars-core/src/frame/horizontal.rs 92.30% 4 Missing ⚠️
crates/polars-core/src/frame/mod.rs 85.71% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #21335   +/-   ##
=======================================
  Coverage   79.93%   79.94%           
=======================================
  Files        1596     1597    +1     
  Lines      228623   228649   +26     
  Branches     2618     2618           
=======================================
+ Hits       182752   182785   +33     
+ Misses      45272    45265    -7     
  Partials      599      599           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nameexhaustion nameexhaustion marked this pull request as ready for review February 19, 2025 11:56
@nameexhaustion nameexhaustion marked this pull request as draft February 19, 2025 11:56
/// It is the callers responsibility to uphold the contract of all `Series`
/// having an equal length, if not this may panic down the line.
pub unsafe fn new_no_length_checks(columns: Vec<Column>) -> PolarsResult<DataFrame> {
ensure_names_unique(&columns, |s| s.name().as_str())?;
Copy link
Collaborator Author

@nameexhaustion nameexhaustion Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed - new_no_length_checks doesn't really save any perf since it's already traversing the columns for duplicate name checking - doing a height check on top is practically free.

}

if columns.len() <= 4 {
// Too small to be worth spawning a hashmap for, this is at most 6 comparisons.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copied and adjusted from

if items.len() <= 4 {
// Too small to be worth spawning a hashmap for, this is at most 6 comparisons.
for i in 0..items.len() - 1 {
let name = get_name(&items[i]);
for other in items.iter().skip(i + 1) {
if name == get_name(other) {
polars_bail!(duplicate = name);
}
}
}
} else {

// first loop check validity. We don't do this in a single pass otherwise
// this DataFrame is already modified when an error occurs.
for col in columns {
check_hstack(col, &mut names, height, is_empty)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And by that, I mean here. After the first column is added is_empty should be false as it isn't empty anymore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see - we needed to add a column first.

@nameexhaustion nameexhaustion marked this pull request as draft February 20, 2025 08:27
/// Errors if the resulting DataFrame columns have duplicate names or unequal heights.
///
/// Note: If `self` is empty, `self.height` will always be overridden by the height of the first
/// column in `columns`.
Copy link
Collaborator Author

@nameexhaustion nameexhaustion Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we do currently, so if self is empty we don't check against self.height even if it is non-zero. I have left a comment for now.

@nameexhaustion nameexhaustion changed the title fix(rust): Height validation in hstack_mut is bypassed when adding to empty frame fix(rust): Fix height validation in hstack_mut was bypassed when adding to empty frame Feb 20, 2025
@nameexhaustion nameexhaustion marked this pull request as ready for review February 20, 2025 10:01
@ritchie46 ritchie46 merged commit 9164d2c into pola-rs:main Feb 20, 2025
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants