@xudong963
Member

Part of #18868

@github-actions bot added the physical-expr (Changes to the physical-expr crates) and execution (Related to the execution crate) labels on Nov 27, 2025
panic!("Expected binary OR expression result");
}

Ok(())
Member

Please add a test for NOT(NOT(a) AND NOT(b)).
It should be simplified to a OR b but let's confirm.
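For reference, the expected result follows from De Morgan's law plus double-negation elimination. The sketch below checks that on a toy expression tree; `Expr`, `not`, `and`, `or`, and `simplify` are hypothetical stand-ins and not DataFusion's `PhysicalExpr` API, and the function recurses only for brevity.

```rust
/// Toy boolean expression tree (hypothetical stand-in for `Arc<dyn PhysicalExpr>`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(&'static str),
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

fn not(e: Expr) -> Expr {
    Expr::Not(Box::new(e))
}

fn and(l: Expr, r: Expr) -> Expr {
    Expr::And(Box::new(l), Box::new(r))
}

fn or(l: Expr, r: Expr) -> Expr {
    Expr::Or(Box::new(l), Box::new(r))
}

/// Push NOT inward and cancel double negations.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Not(inner) => match *inner {
            // NOT(NOT(x)) -> x
            Expr::Not(x) => simplify(*x),
            // NOT(x AND y) -> NOT(x) OR NOT(y)
            Expr::And(l, r) => or(simplify(not(*l)), simplify(not(*r))),
            // NOT(x OR y) -> NOT(x) AND NOT(y)
            Expr::Or(l, r) => and(simplify(not(*l)), simplify(not(*r))),
            // NOT(column) cannot be simplified further
            other => not(other),
        },
        Expr::And(l, r) => and(simplify(*l), simplify(*r)),
        Expr::Or(l, r) => or(simplify(*l), simplify(*r)),
        col => col,
    }
}
```

Calling `simplify(not(and(not(a), not(b))))` on two columns `a` and `b` yields `or(a, b)` in this toy model.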

Member Author

Done in a0116ce


fn f_up(&mut self, node: Self::Node) -> Result<Transformed<Self::Node>> {
// Apply NOT expression simplification first
let not_simplified = simplify_not_expr_recursive(&node, self.schema)?;
Member

The variable name not_simplified is a bit confusing: it reads as "not simplified".
Maybe rename it to not_expr_simplified?

Member Author

Good point, agreed.

return Ok(Transformed::yes(lit(ScalarValue::Boolean(Some(!val)))));
}
if let ScalarValue::Boolean(None) = literal.value() {
return Ok(Transformed::yes(lit(ScalarValue::Boolean(None))));
Member

Should this return Transformed::yes?
It returns the same value.

Member

Actually this is NOT(NULL) -> NULL. All is good!
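To spell out the reasoning: the value stays NULL, but the expression changes from a NOT node wrapping a NULL literal to the bare NULL literal, so reporting a transformation is consistent. A minimal sketch of SQL-style three-valued NOT, using `Option<bool>` with `None` standing in for NULL (`sql_not` is a hypothetical helper, not a DataFusion function):

```rust
/// SQL-style NOT over a nullable boolean; `None` models NULL.
fn sql_not(value: Option<bool>) -> Option<bool> {
    // NOT(TRUE) = FALSE, NOT(FALSE) = TRUE, NOT(NULL) = NULL
    value.map(|v| !v)
}
```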

if not_simplified.transformed {
// Recursively simplify the result
let further_simplified =
simplify_not_expr_recursive(&not_simplified.data, schema)?;
Member

I wonder whether this could be an attack vector.
Someone could craft a query with many nested not(not(not(not(...)))) calls to cause a stack overflow, since this code uses recursion.

Member Author

I changed the recursion to iteration and added a test for deeply nested NOT. I verified that the test hangs under the old implementation but passes with the new one. ceb85ff
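As an illustration of the idea only (not the code in ceb85ff), a chain of nested NOTs can be collapsed with a loop that tracks parity instead of recursing. `Expr` and `strip_double_negation` are hypothetical names on a toy tree; note that the toy `Box` tree still drops recursively, so very deep trees would need extra care.

```rust
/// Toy expression tree with only what the example needs (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(&'static str),
    Not(Box<Expr>),
}

/// Collapse a chain of nested NOTs without recursion:
/// an even number of NOTs cancels out, an odd number leaves one NOT.
fn strip_double_negation(mut expr: &Expr) -> Expr {
    let mut negated = false;
    // Walk down the NOT chain iteratively, flipping the parity each step.
    while let Expr::Not(inner) = expr {
        negated = !negated;
        expr = &**inner;
    }
    if negated {
        Expr::Not(Box::new(expr.clone()))
    } else {
        expr.clone()
    }
}
```

The stack depth used by the loop is constant regardless of how many NOTs wrap the inner expression.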

@github-actions bot removed the execution (Related to the execution crate) label on Dec 1, 2025
@xudong963 requested a review from zhuqi-lucas on December 1, 2025 07:31
@xudong963
Member Author

Thanks for the review @martin-g

Contributor

@alamb left a comment

Thank you @xudong963 and @martin-g

I think prior to releasing this code we should make the following changes:

  1. Avoid explicit recursion in not simplification (and instead use the tree node API)
  2. Avoid the potential infinite loop

It would be nice to make them as part of this PR, but I also think it would be OK to do it as a follow-up.

Other things I think would be good to do but not strictly necessary:

  1. Move the tests to the public interface (PhysicalExprSimplifier)
  2. Avoid duplications of Operator::negate

);
Ok(unwrapped)
// Combine transformation results
let final_transformed = transformed || unwrapped.transformed;
Contributor

I think you can use transform_data here instead: https://docs.rs/datafusion/latest/datafusion/common/tree_node/struct.Transformed.html#method.transform_data

So something like

        // Apply NOT expression simplification first
        let rewritten =
            simplify_not_expr(&node, self.schema)?.transform_data(|node| {
                unwrap_cast::unwrap_cast_in_comparison(node, self.schema)
            })?;

That handles combining the transformed flag for you
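To make the flag handling concrete, here is a simplified stand-in for `Transformed` with a `transform_data` that ORs the flags. It is a sketch of the behavior described above under the assumption that the real method works this way; the struct and its error type are hypothetical and much smaller than DataFusion's.

```rust
/// Simplified stand-in for DataFusion's `Transformed<T>` (hypothetical).
#[derive(Debug, PartialEq)]
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    fn yes(data: T) -> Self {
        Self { data, transformed: true }
    }

    fn no(data: T) -> Self {
        Self { data, transformed: false }
    }

    /// Run `f` on the data, then OR the previous `transformed` flag into
    /// the result so that neither step's change is lost.
    fn transform_data<U, E>(
        self,
        f: impl FnOnce(T) -> Result<Transformed<U>, E>,
    ) -> Result<Transformed<U>, E> {
        let mut next = f(self.data)?;
        next.transformed |= self.transformed;
        Ok(next)
    }
}
```

With this shape, chaining two rewrites reports `transformed == true` if either rewrite changed the node, which is what the manual `transformed || unwrapped.transformed` was computing.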


// Apply unwrap cast optimization
#[cfg(test)]
let original_type = node.data_type(self.schema).unwrap();
Contributor

It would be nice to move the original type check before applying the NOT simplification and the verification after it.

Something like

impl<'a> TreeNodeRewriter for PhysicalExprSimplifier<'a> {
    type Node = Arc<dyn PhysicalExpr>;

    fn f_up(&mut self, node: Self::Node) -> Result<Transformed<Self::Node>> {
        #[cfg(test)]
        let original_type = node.data_type(self.schema).unwrap();

        // Apply NOT expression simplification first
        let rewritten =
            simplify_not_expr(&node, self.schema)?.transform_data(|node| {
                unwrap_cast::unwrap_cast_in_comparison(node, self.schema)
            })?;

        #[cfg(test)]
        assert_eq!(
            rewritten.data.data_type(self.schema).unwrap(),
            original_type,
            "Simplified expression should have the same data type as the original"
        );

        Ok(rewritten)
    }
}

])
}

#[test]
Contributor

Since this function is not really run in isolation (it is always run as part of PhysicalExprSimplifier) I think it would be better if these tests were in datafusion/physical-expr/src/simplifier/mod.rs rather than here.

I don't think this change is required

}

/// Returns the negated version of a comparison operator, if possible
fn negate_operator(op: &Operator) -> Option<Operator> {
Contributor

This looks the same as Operator::negate: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Operator.html#method.negate

I recommend using that code or, if there is a reason not to use Operator::negate, adding a comment explaining why it is not used.
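For readers unfamiliar with the idea, operator negation maps each comparison to its logical complement and returns nothing for operators that have none. The toy enum below is a hypothetical illustration with a reduced operator set, not DataFusion's `Operator`:

```rust
/// Toy operator enum (hypothetical; the real `Operator` has many more variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    Eq,
    NotEq,
    Lt,
    LtEq,
    Gt,
    GtEq,
    Plus,
}

impl Operator {
    /// Return the operator that yields the logically opposite result,
    /// or `None` when negation does not make sense (e.g. arithmetic).
    fn negate(self) -> Option<Operator> {
        match self {
            Operator::Eq => Some(Operator::NotEq),
            Operator::NotEq => Some(Operator::Eq),
            Operator::Lt => Some(Operator::GtEq),
            Operator::LtEq => Some(Operator::Gt),
            Operator::Gt => Some(Operator::LtEq),
            Operator::GtEq => Some(Operator::Lt),
            Operator::Plus => None,
        }
    }
}
```

A rule such as NOT(a < b) -> a >= b can then call `negate` and skip the rewrite when it returns `None`.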

Arc::new(NotExpr::new(Arc::clone(binary_expr.right())));

// Recursively simplify the NOT expressions
let simplified_left = simplify_not_expr(&not_left, schema)?;
Contributor

I think that since simplify_not_expr is used in PhysicalExprSimplifier, which already walks the expression tree bottom-up (using f_up), there is no reason to also recurse explicitly in this rule.

You should be able to apply only the local rewrite rule, NOT(A OR B) --> NOT A AND NOT B, rather than also simplifying the child expressions.

let mut current_expr = Arc::clone(expr);
let mut overall_transformed = false;

loop {
Contributor

I am also somewhat concerned about this loop -- it seems to be compensating for an incomplete recursion, and it could loop forever if a rewrite flip-flopped between two forms.

If the loop is needed I suggest:

  1. Put it in the higher-level PhysicalExprSimplifier
  2. Add a counter that breaks after some number of iterations (e.g. 5 or something)
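A bounded fixed-point loop along those lines might look like the sketch below. `rewrite_to_fixed_point` and its `(value, changed)` closure contract are hypothetical names chosen for illustration; the cap value is left to the caller.

```rust
/// Apply `rewrite` repeatedly until it reports no change or the
/// iteration cap is reached, whichever comes first.
/// Returns the final value and whether any iteration changed it.
fn rewrite_to_fixed_point<T>(
    mut value: T,
    max_iterations: usize,
    mut rewrite: impl FnMut(T) -> (T, bool),
) -> (T, bool) {
    let mut changed_any = false;
    for _ in 0..max_iterations {
        let (next, changed) = rewrite(value);
        value = next;
        if !changed {
            // Fixed point reached: nothing left to simplify.
            break;
        }
        changed_any = true;
    }
    (value, changed_any)
}
```

Even a rewrite that flip-flops forever terminates here after `max_iterations` passes, at the cost of possibly returning a not fully simplified value.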


assert!(result.transformed);
// Should be simplified back to the original b > 5
assert_eq!(result.data.to_string(), inner_expr.to_string());
Contributor

What is the reason for using to_string()?

I tried comparing the two directly and it seems to work

        assert_eq!(&result.data, &inner_expr);

Comment on lines +225 to +227
assert!(result.transformed);
// Should be simplified back to the original b > 5
assert_eq!(result.data.to_string(), inner_expr.to_string());
Contributor

It might also make these tests easier to read if you made a helper like

fn assert_transformed(expr, expected_expr);

and

fn assert_not_transformed(expr, expected_expr);

Comment on lines +240 to +244
if let Some(literal) = result.data.as_any().downcast_ref::<Literal>() {
assert_eq!(literal.value(), &ScalarValue::Boolean(Some(false)));
} else {
panic!("Expected literal result");
}
Contributor

You could potentially make this a function to make the tests easier to read - something like

let literal = as_literal(&result);

And handle the panic internally in as_literal

That way the mechanics of the test wouldn't obscure the test logic so much.

I think you could do something similar with BinaryExpr.
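One possible shape for such helpers, sketched on a toy model: `Expr`, `Transformed`, and `simplify` here are hypothetical stand-ins, and the real helpers would take `Arc<dyn PhysicalExpr>` and a schema instead.

```rust
/// Toy expression and result types (hypothetical stand-ins).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(Option<bool>),
    Col(&'static str),
    Not(Box<Expr>),
}

struct Transformed {
    data: Expr,
    transformed: bool,
}

/// Toy simplifier: only folds NOT over a boolean literal.
fn simplify(expr: Expr) -> Transformed {
    match expr {
        Expr::Not(inner) => match *inner {
            Expr::Literal(v) => Transformed {
                data: Expr::Literal(v.map(|b| !b)),
                transformed: true,
            },
            other => Transformed {
                data: Expr::Not(Box::new(other)),
                transformed: false,
            },
        },
        other => Transformed { data: other, transformed: false },
    }
}

/// Extract the literal value, panicking with a clear message otherwise.
fn as_literal(result: &Transformed) -> Option<bool> {
    match &result.data {
        Expr::Literal(value) => *value,
        other => panic!("Expected literal result, got {other:?}"),
    }
}

/// Assert that simplification rewrote `expr` into `expected`.
fn assert_transformed(expr: Expr, expected: Expr) {
    let result = simplify(expr);
    assert!(result.transformed, "expected the expression to be rewritten");
    assert_eq!(result.data, expected);
}

/// Assert that simplification left `expr` unchanged.
fn assert_not_transformed(expr: Expr, expected: Expr) {
    let result = simplify(expr);
    assert!(!result.transformed, "expected the expression to be left as-is");
    assert_eq!(result.data, expected);
}
```

A test body then shrinks to a single line such as `assert_transformed(not_true, literal_false)`, with the downcast and panic handling hidden in the helpers.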

let schema = test_schema();

// Create a deeply nested NOT expression: NOT(NOT(NOT(...NOT(b > 5)...)))
// This tests that we don't get stack overflow with many nested NOTs
Contributor

👍

Labels

physical-expr Changes to the physical-expr crates

3 participants