-
Notifications
You must be signed in to change notification settings - Fork 2.5k
fix(rust): Increase precision when constructing float Series
#25323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(rust): Increase precision when constructing float Series
#25323
Conversation
Constructing a new Series with an unspecified datatype and string supertype led to a naive translation of passed floats to strings that was constrained by `polars.Config(float_precision=…)` Float64 & Float32 are now handled in the same fashion as `PySeries.cast` when constructing new Series.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #25323 +/- ##
==========================================
- Coverage 82.05% 82.04% -0.01%
==========================================
Files 1730 1730
Lines 241110 241180 +70
Branches 3032 3032
==========================================
+ Hits 197841 197884 +43
- Misses 42487 42514 +27
Partials 782 782 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I assume that the drop in coverage is likely due to never internally casting to |
float Series
| AnyValue::Null => builder.append_null(), | ||
| AnyValue::Binary(_) | AnyValue::BinaryOwned(_) => builder.append_null(), | ||
| AnyValue::Float64(f) => { | ||
| let mut tmp = vec![]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you re-use the buffer like let mut owned is at the start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be resolved now, let me know if the implementation looks correct- I'm still pretty new to Rust.
No don't worry about it. |
orlp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation as-is is correct for pure floats, but I think it still goes wrong for nested types which recursively contain floats (like structs and lists).
Does this cover the cases you expressed concern about? I just parameterized with a few more cases to test against which all passed locally with out any further source changes. |
|
@camriddell No that doesn't satisfy my concerns because the test case you added doesn't test the mixed string/struct case. This still loses precision: >>> pl.Series("name", ["a string", {"a": 0.123456789012345}], strict=False, dtype=pl.String)
shape: (2,)
Series: 'name' [str]
[
"a string"
"{0.123457}"
]There needs to be a dedicated path for |
Thank you for the clarification and MRE, I'll look into this to recurse through nested data structures. |
|
Hi @orlp I think I have a working solution here. I did a bit of refactoring to keep recursive allocations minimal and added some more tests to cover the cases you highlighted. The nested data structure formatting code is based on the code in |
…alue-float-upcast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're almost there. Can you add a comment explaining why we do this and not just go through write?
And we very recently added the Float16 type, so I think that also needs a branch here. Sorry to move the goalpost :)
Constructing a new Series with an unspecified datatype/string supertype led to a naive translation of passed floats to strings that was constrained by
polars.Config(float_precision=…)Float64 & Float32 are now handled in the same fashion as
PySeries.castwhen constructing new Series to preserve as much precision as possible for this style of representation.Resolves #25257