-
Notifications
You must be signed in to change notification settings - Fork 1k
[Variant] Implement primitive type access for null/time/decimal* #8638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1d7e5e8 to
b5bfaed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let value = boolean_array.value(index); | ||
| Variant::from(value) | ||
| } | ||
| DataType::Date32 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been adjusted to be grouped together with the Timestamp/Time.
| fn append_value(&mut self, v: ()); | ||
| } | ||
|
|
||
| impl AppendValueTrait for NullBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is needed by the define_variant_to_primitive_builder macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NullArray is kind of strange -- maybe rather than creating a full on builder for it, we could just keep count and then create the output array with https://docs.rs/arrow/latest/arrow/array/fn.new_null_array.html ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NullArray (with DataType::Null) is not the same as an array (of some arbitrary other type) full of nulls tho? But if we know the intended type, then I agree that new_null_array is better (since NullArray::is_null always returns false = weird)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now that I'm actually looking at the code (and not just reacting to the comments) --
It seems like we don't actually need a NullBuilder at all? We know the capacity, so just use NullArray::new:
struct FakeNullBuilder(NullArray);
impl FakeNullBuilder {
fn new(capacity: usize) -> Self {
Self(NullArray::new(capacity)
}
fn append_value(&mut self, _: ()) {}
fn append_null(&mut self) {}
fn finish(self) -> NullArray {
self.0
}
}... and then the macro invocation us just:
|capacity| -> FakeNullBuilder { FakeNullBuilder::new(capacity) },There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NullArray (with DataType::Null) is not the same as an array (of some arbitrary other type) full of nulls tho?
new_null_arraywith different types are different, the following unit test will fail
let first = new_null_array(&DataType::Int32, 3);
let second = new_null_array(&DataType::Timestamp(TimeUnit::Nanosecond, None), 3);
assert_eq!(*first, second.clone());
Use NullArray instead of new_null_array in the changed code because the compiler will panic with the trait bound Result<Arc<dyn Array>, ArrowError>: Array is not satisfied if I use new_null_array in FakeNullBuilder::finish
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| index | ||
| ) | ||
| } | ||
| DataType::Time64(TimeUnit::Microsecond) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possible type is DataType::Time64(TimeUnit::Nanosecond)
There is also a Time32 type in arrow: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Time32
I am not sure if we want to handle that too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only support the Time64(TimeUnit::Microsecond) here because that variant spec said that Time is TimeUnit::Microsecond.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supernit: perhaps we can move this branch to just below Date32 so that the order aligns the spec.
| fn append_value(&mut self, v: ()); | ||
| } | ||
|
|
||
| impl AppendValueTrait for NullBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NullArray is kind of strange -- maybe rather than creating a full on builder for it, we could just keep count and then create the output array with https://docs.rs/arrow/latest/arrow/array/fn.new_null_array.html ?
| datatypes::Date32Type::from_naive_date | ||
| ); | ||
| impl_primitive_from_variant!(datatypes::Time64MicrosecondType, as_time_utc, |v| { | ||
| (v.num_seconds_from_midnight() * 1_000_000 + v.nanosecond() / 1_000) as i64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it normal to take the floor instead of rounding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok because nanosecond() will return u32, and the result will be truncated to zero.
| generic_conversion_single_value!( | ||
| Decimal128Type, | ||
| as_primitive, | ||
| |v| VariantDecimal16::try_new(v, *s as u8).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure these try_new calls will never fail?
I mean, we are unshredding shredded data, but it's also... data... which is not trustworthy.
For safety, suggest to do something like:
| |v| VariantDecimal16::try_new(v, *s as u8).unwrap(), | |
| |v| VariantDecimal16::try_new(v, *s as u8).map_or(Variant::Null, Variant::from), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this suggestion, fixed.
Ah, found that we can pass in different types here(VariantDecimal16 vs Variant::Decimal16), it is cool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it accepts any V: IntoVariant
| (v / 1_000_000) as u32, | ||
| (v % 1_000_000) as u32 * 1000 | ||
| ) | ||
| .unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above -- are we absolutely sure the conversion will never fail? Or do we need a Variant::Null fallback?
| fn append_value(&mut self, v: ()); | ||
| } | ||
|
|
||
| impl AppendValueTrait for NullBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now that I'm actually looking at the code (and not just reacting to the comments) --
It seems like we don't actually need a NullBuilder at all? We know the capacity, so just use NullArray::new:
struct FakeNullBuilder(NullArray);
impl FakeNullBuilder {
fn new(capacity: usize) -> Self {
Self(NullArray::new(capacity)
}
fn append_value(&mut self, _: ()) {}
fn append_null(&mut self) {}
fn finish(self) -> NullArray {
self.0
}
}... and then the macro invocation us just:
|capacity| -> FakeNullBuilder { FakeNullBuilder::new(capacity) },| define_variant_to_primitive_builder!( | ||
| struct VariantToNullArrowRowBuilder<'a> | ||
| |_capacity| -> NullBuilder { NullBuilder::new() }, | ||
| | v | v.as_null(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: odd spacing there?
| | v | v.as_null(), | |
| |v| v.as_null(), |
7658b17 to
78ebf7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb @scovich Thanks for the review! I've updated the code. Please take another look.
CI failed because
Using default tag: latest
Error response from daemon: Head "https://registry-1.docker.io/v2/amd64/rust/manifests/latest": received unexpected HTTP status: 503 Service Unavailable
Warning: Docker pull failed with exit code 1, back off 1.328 seconds before retry.
| datatypes::Date32Type::from_naive_date | ||
| ); | ||
| impl_primitive_from_variant!(datatypes::Time64MicrosecondType, as_time_utc, |v| { | ||
| (v.num_seconds_from_midnight() * 1_000_000 + v.nanosecond() / 1_000) as i64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok because nanosecond() will return u32, and the result will be truncated to zero.
| generic_conversion_single_value!( | ||
| Decimal128Type, | ||
| as_primitive, | ||
| |v| VariantDecimal16::try_new(v, *s as u8).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this suggestion, fixed.
Ah, found that we can pass in different types here(VariantDecimal16 vs Variant::Decimal16), it is cool.
| index | ||
| ) | ||
| } | ||
| DataType::Time64(TimeUnit::Microsecond) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only support the Time64(TimeUnit::Microsecond) here because that variant spec said that Time is TimeUnit::Microsecond.
| fn append_value(&mut self, v: ()); | ||
| } | ||
|
|
||
| impl AppendValueTrait for NullBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NullArray (with DataType::Null) is not the same as an array (of some arbitrary other type) full of nulls tho?
new_null_arraywith different types are different, the following unit test will fail
let first = new_null_array(&DataType::Int32, 3);
let second = new_null_array(&DataType::Timestamp(TimeUnit::Nanosecond, None), 3);
assert_eq!(*first, second.clone());
Use NullArray instead of new_null_array in the changed code because the compiler will panic with the trait bound Result<Arc<dyn Array>, ArrowError>: Array is not satisfied if I use new_null_array in FakeNullBuilder::finish
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One nit and one follow-up item to potentially track.
| generic_conversion_single_value!( | ||
| Decimal128Type, | ||
| as_primitive, | ||
| |v| VariantDecimal16::try_new(v, *s as u8).map_or(Variant::Null, Variant::from), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need a follow-up item to track turning this into a proper Result instead of silently converting to NULL on failure? That would require VariantArray::value to return a Result<Variant>, which is a biggish change but honestly seems appropriate given that we can't guarantee the input shredding is even physically valid, let alone logically valid? It seems too sharp an edge to just panic.
@alamb -- thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed an issue #8672 to track this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need a follow-up item to track turning this into a proper Result instead of silently converting to NULL on failure?
I think this makes a lot of sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need a follow-up item to track turning this into a proper Result instead of silently converting to NULL on failure? That would require VariantArray::value to return a Result, which is a biggish change but honestly seems appropriate given that we can't guarantee the input shredding is even physically valid, let alone logically valid? It seems too sharp an edge to just panic.
@alamb -- thoughts?
How about we add a try_value() method that returns a Result and just have value() unwrap the result?
| define_variant_to_primitive_builder!( | ||
| struct VariantToNullArrowRowBuilder<'a> | ||
| |capacity| -> FakeNullBuilder { FakeNullBuilder::new(capacity) }, | ||
| |value| value.as_null(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just hard-wire |_| Some(Variant::Null) here?
All input variant subtypes ultimately produce the same outcome because both append_null and append_value are no-ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more obvious: the result is always Variant::Null; updated the code.
As macro can't receive _ for ident (need change to pat and more), changed to _value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @klion26 @scovich and @friendlymatthew
| generic_conversion_single_value!( | ||
| Decimal128Type, | ||
| as_primitive, | ||
| |v| VariantDecimal16::try_new(v, *s as u8).map_or(Variant::Null, Variant::from), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need a follow-up item to track turning this into a proper Result instead of silently converting to NULL on failure? That would require VariantArray::value to return a Result, which is a biggish change but honestly seems appropriate given that we can't guarantee the input shredding is even physically valid, let alone logically valid? It seems too sharp an edge to just panic.
@alamb -- thoughts?
How about we add a try_value() method that returns a Result and just have value() unwrap the result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
| index | ||
| ) | ||
| } | ||
| DataType::Time64(TimeUnit::Microsecond) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supernit: perhaps we can move this branch to just below Date32 so that the order aligns the spec.
|
🚀 thanks @liamzwbao @friendlymatthew @klion26 and @scovich |
Which issue does this PR close?
What changes are included in this PR?
typed_value_to_variant/PrimitiveVariantToArrowRowBuilderforNull/Time/Decimal{4,8,16}PrimitiveFromVariantforTime64MicrosecondTypeAre these changes tested?
Added some tests
Are there any user-facing changes?
No