-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Support simplify not for physical expr #18970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| panic!("Expected binary OR expression result"); | ||
| } | ||
|
|
||
| Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test for NOT(NOT(a) AND NOT(b)).
It should be simplified to a OR b but let's confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done a0116ce
|
|
||
| fn f_up(&mut self, node: Self::Node) -> Result<Transformed<Self::Node>> { | ||
| // Apply NOT expression simplification first | ||
| let not_simplified = simplify_not_expr_recursive(&node, self.schema)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the variable is a bit confusing - not_simplified. It sounds like it is not simplified.
Maybe rename it to not_expr_simplified ?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, agree
| return Ok(Transformed::yes(lit(ScalarValue::Boolean(Some(!val))))); | ||
| } | ||
| if let ScalarValue::Boolean(None) = literal.value() { | ||
| return Ok(Transformed::yes(lit(ScalarValue::Boolean(None)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this return Transformed::yes ?
It returns the same value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is NOT(NULL) -> NULL. All is good!
| if not_simplified.transformed { | ||
| // Recursively simplify the result | ||
| let further_simplified = | ||
| simplify_not_expr_recursive(¬_simplified.data, schema)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this could be a vector of an attack.
Someone may craft a query with many not(not(not(not(...)))) to cause a stack overflow due to the usage of recursion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the recursion to iteration, and added a test for deep not, I verified the test under the old implemtaion, it will hang, but with the new way, it's okay. ceb85ff
|
Thanks for the review @martin-g |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @xudong963 and @martin-g
I think prior to releasing this code we should make the following changes:
- Avoid explicit recursion in not simplification (and instead use the tree node API)
- Avoid the potential infinite loop
It would be nice to make them as part of this PR, but I also think it would be ok to do it as a follow on
Other things I think would be good to do but not strictly necessary:
- Move the tests to the public interface (PhysicalExprSimplifier)
- Avoid duplications of
Operator::negate
| ); | ||
| Ok(unwrapped) | ||
| // Combine transformation results | ||
| let final_transformed = transformed || unwrapped.transformed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use transform_data here instead: https://docs.rs/datafusion/latest/datafusion/common/tree_node/struct.Transformed.html#method.transform_data
So something like
// Apply NOT expression simplification first
let rewritten =
simplify_not_expr(&node, self.schema)?.transform_data(|node| {
unwrap_cast::unwrap_cast_in_comparison(node, self.schema)
})?;That handles combining the transformed flag for you
|
|
||
| // Apply unwrap cast optimization | ||
| #[cfg(test)] | ||
| let original_type = node.data_type(self.schema).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to move the original type check before applying the not and the verification after the not
Somethg like
impl<'a> TreeNodeRewriter for PhysicalExprSimplifier<'a> {
type Node = Arc<dyn PhysicalExpr>;
fn f_up(&mut self, node: Self::Node) -> Result<Transformed<Self::Node>> {
#[cfg(test)]
let original_type = node.data_type(self.schema).unwrap();
// Apply NOT expression simplification first
let rewritten =
simplify_not_expr(&node, self.schema)?.transform_data(|node| {
unwrap_cast::unwrap_cast_in_comparison(node, self.schema)
})?;
#[cfg(test)]
assert_eq!(
rewritten.data.data_type(self.schema).unwrap(),
original_type,
"Simplified expression should have the same data type as the original"
);
Ok(rewritten)
}
}| ]) | ||
| } | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this function is not really run in isolation (it is always run as part of PhysicalExprSimplifier) I think it would be better if these tests were in datafusion/physical-expr/src/simplifier/mod.rs rather than here.
I don't think this change is required
| } | ||
|
|
||
| /// Returns the negated version of a comparison operator, if possible | ||
| fn negate_operator(op: &Operator) -> Option<Operator> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks the same as Operator::negate: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Operator.html#method.negate
I recommend using that code, or if there is a reason not to use Operator::negate add a comment explaining why it is not used
| Arc::new(NotExpr::new(Arc::clone(binary_expr.right()))); | ||
|
|
||
| // Recursively simplify the NOT expressions | ||
| let simplified_left = simplify_not_expr(¬_left, schema)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think since simplify_not_expr is used in PhysicalExprSimplifier which is doing a walk up the plan (using f_up) there is no reason to also recurse explicitly in this rule.
You should be able to apply the rewrite rules only NOT(A OR B) --> NOT A AND NOT B rather than also changing the exprs
| let mut current_expr = Arc::clone(expr); | ||
| let mut overall_transformed = false; | ||
|
|
||
| loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also somewhat concerned about this loop -- it seems like it may be trying to handle a non complete recursion and it could potentially lead to infinite recursion if a rewrite flip/flopped
If the loop is needed I suggest:
- Put it in the higher level PhysicalExprSimplifier
- add a counter that breaks after some number of iterations (e.g. 5 or something)
|
|
||
| assert!(result.transformed); | ||
| // Should be simplified back to the original b > 5 | ||
| assert_eq!(result.data.to_string(), inner_expr.to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for using to_string()?
I tried comparing the two directly and it seems to work
assert_eq!(&result.data, &inner_expr);| assert!(result.transformed); | ||
| // Should be simplified back to the original b > 5 | ||
| assert_eq!(result.data.to_string(), inner_expr.to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also make these queries easier to read if you made a helper like
fn assert_transformed(expr, expected_expr);and
fn assert_not_transformed(expr, expected_expr);| if let Some(literal) = result.data.as_any().downcast_ref::<Literal>() { | ||
| assert_eq!(literal.value(), &ScalarValue::Boolean(Some(false))); | ||
| } else { | ||
| panic!("Expected literal result"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could potentially make this a function to make the tests easier to read - something like
let literal = as_literal(&result);And handle the panic internally in as_literal
That way the mechics of the test wouldn't obscure the test logic so much
I think you could do something similar with BinaryExpr
| let schema = test_schema(); | ||
|
|
||
| // Create a deeply nested NOT expression: NOT(NOT(NOT(...NOT(b > 5)...))) | ||
| // This tests that we don't get stack overflow with many nested NOTs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Part of the #18868