This code implements a method to represent ensemble trees (XGBoost and RanomForest) with propsitional logic. We then use this representation to test the robustness of the models using SAT (with Z3 solver).
We aim to formulate a weighted Max-SAT problem which, given a point
Let
To craft our statement we need two types of weighted clauses:
-
$\phi_{ij}(b_i,b_j)$ , Clauses with weight 1 to check whether leaf nodes from two different trees$i$ and$j$ overlap:$\text{Box}(b_i) \cap \text{Box}(b_j)$ -
$\phi_i(b_i)$ , Clauses with weight$\text{Val}(b_i)$ to check whether a leaf node from a tree$i$ over laps with the ball:$\text{Box}(b_i) \cap \text{Ball}(x,\epsilon)$
We can then combine these two clauses to write out
and with the associated weights and run a weight MaxSAT over the
expression. The solver should return a configuration values for the
variables
The
To code the equivalent CNF model of the ensemble tree, we extract all
thresholds from the nodes of the tree. For each threshold we create one column in data which is True if the value of the
corresponding feature
Using the binary data, let
To check the adversarial robustness of the tree based model with SAT, the CNF of the ensemble contains the following components:
-
Declaring Boolean variables for each binary feature
$x_{f, th_{f,i}}$ defined for feature$f$ and threshold$i$ associated with that feature$th_{f,i}$ ; and each decision box$b_{p,i}$ . -
Define the boundaries of the decision boxes, e.g.,
$b_{1,1} \leftrightarrow (\neg x_{2, th_{2,1}} \wedge x_{5, th_{5,3}})$ means that the boundaries of$b_{1,1}$ are defined by$x_{2, th_{2, 1}}$ and$x_{5, th_{5,3}}$ . Since$x_{2, th_{2,1}}$ is negated, the values of this box should on the left branch (smaller than or equal to$th_{2,1}$ . And the other boundary contains the feature$x_5$ and should be on the right branch (larger than$th_{5,3}$ ). -
Each tree must have only one decision. Therefore, we force each tree to have exactly one True decision, e.g., for a tree with three decision boxes
$b_i, i \in {1,2,3}$ we have$(b_1 \vee b_2 \vee b_3) \wedge (b_1 \rightarrow \neg(b2 \vee b_3)) \wedge (b_2 \rightarrow \neg(b_1 \vee b_3)) \wedge (b_3 \rightarrow \neg(b_1 \vee b2))$ . -
The search should be limited to the boxes that overlap with
$\text{Ball}(x,\epsilon)$ . Therefore, all the decision boxes that do not have an overlap with the ball are negated. -
To choose the best combination of the leaves, we use the value of the leaves as the weights of the soft assertions; and only choose the leaves that have an overlap with
$\text{Ball}(x,\epsilon)$ . MaxSAT therefore finds the satisfying clauses with the highest score. -
Since we want the decision result to be changed, the current leaves should not be chosen. To enforce this constraint the decision boxes associated with the input sample are negated.
Assuming that the trees are of equal size, the number of path
combinations in the tree ensemble with
Check examples
folder for examples of using the code.