-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(rust): Fix height validation in hstack_mut
was bypassed when adding to empty frame
#21335
Conversation
is_empty: bool, | ||
) -> PolarsResult<()> { | ||
polars_ensure!( | ||
col.len() == height || is_empty, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug is here - || (self.)is_empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems correct for this function, but not correct on the callsite.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #21335 +/- ##
=======================================
Coverage 79.93% 79.94%
=======================================
Files 1596 1597 +1
Lines 228623 228649 +26
Branches 2618 2618
=======================================
+ Hits 182752 182785 +33
+ Misses 45272 45265 -7
Partials 599 599 ☔ View full report in Codecov by Sentry. |
/// It is the callers responsibility to uphold the contract of all `Series` | ||
/// having an equal length, if not this may panic down the line. | ||
pub unsafe fn new_no_length_checks(columns: Vec<Column>) -> PolarsResult<DataFrame> { | ||
ensure_names_unique(&columns, |s| s.name().as_str())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed - new_no_length_checks
doesn't really save any perf since it's already traversing the columns for duplicate name checking - doing a height check on top is practically free.
} | ||
|
||
if columns.len() <= 4 { | ||
// Too small to be worth spawning a hashmap for, this is at most 6 comparisons. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copied and adjusted from
polars/crates/polars-core/src/frame/mod.rs
Lines 72 to 82 in 4e286d8
if items.len() <= 4 { | |
// Too small to be worth spawning a hashmap for, this is at most 6 comparisons. | |
for i in 0..items.len() - 1 { | |
let name = get_name(&items[i]); | |
for other in items.iter().skip(i + 1) { | |
if name == get_name(other) { | |
polars_bail!(duplicate = name); | |
} | |
} | |
} | |
} else { |
// first loop check validity. We don't do this in a single pass otherwise | ||
// this DataFrame is already modified when an error occurs. | ||
for col in columns { | ||
check_hstack(col, &mut names, height, is_empty)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And by that, I mean here. After the first column is added is_empty
should be false
as it isn't empty anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - we needed to add a column first.
/// Errors if the resulting DataFrame columns have duplicate names or unequal heights. | ||
/// | ||
/// Note: If `self` is empty, `self.height` will always be overridden by the height of the first | ||
/// column in `columns`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what we do currently, so if self
is empty we don't check against self.height
even if it is non-zero. I have left a comment for now.
hstack_mut
is bypassed when adding to empty framehstack_mut
was bypassed when adding to empty frame
If
self
was empty, we failed to check that the columns within the slice being added all had equal heights.E.g. this currently passes and creates a DataFrame with invalid state:
This PR adds a
DataFrame::validate_columns_slice
function, and we now take an approach where we perform the append first, and then call the validation function on the resulting slice.There is a chance this PR may start to uncover implementation bugs elsewhere.