Fix conversion from DecisionTree to TableFactor
#1929
Merged
I recently discovered a bug in the way we convert a DecisionTree (and its derivatives, such as DecisionTreeFactor) into a TableFactor, where the leaf value ordering is incorrect. I have added a test that showcases this error, called Conversion.
Issue Description
The issue stems from the fact that the DecisionTree is structured with the highest label first, so a DecisionTreeFactor with keys (x0, x1, x2) will have a leaf visiting scheme where x0 is the fastest-moving index and x2 is the slowest-moving one.
This results in a different ordering of the resulting vector of leaf values than what the TableFactor expects (where x0 is the slowest-moving index and x2 is the fastest), and thus in incorrect sparse table construction.
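To make the mismatch concrete, here is a minimal standalone sketch of the two flat-index conventions described above. The function names are hypothetical and nothing here is GTSAM API; it only assumes that both sides enumerate the same key list and differ in which key moves fastest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not GTSAM API): flat index where the FIRST key is the
// slowest-moving index, the convention the description attributes to TableFactor.
std::size_t indexFirstKeySlowest(const std::vector<std::size_t>& cardinalities,
                                 const std::vector<std::size_t>& assignment) {
  assert(cardinalities.size() == assignment.size());
  std::size_t index = 0;
  for (std::size_t i = 0; i < cardinalities.size(); ++i) {
    assert(assignment[i] < cardinalities[i]);
    index = index * cardinalities[i] + assignment[i];
  }
  return index;
}

// Hypothetical helper (not GTSAM API): flat index where the FIRST key is the
// fastest-moving index, the order in which the leaves are said to be visited.
std::size_t indexFirstKeyFastest(const std::vector<std::size_t>& cardinalities,
                                 const std::vector<std::size_t>& assignment) {
  assert(cardinalities.size() == assignment.size());
  std::size_t index = 0;
  // Walk the keys from last to first so the first key ends up as the
  // least significant digit.
  for (std::size_t i = cardinalities.size(); i-- > 0;) {
    assert(assignment[i] < cardinalities[i]);
    index = index * cardinalities[i] + assignment[i];
  }
  return index;
}
```

For three binary keys and the assignment (x0, x1, x2) = (1, 0, 0), the first function returns 4 and the second returns 1, so the same leaf value lands in different slots depending on the convention.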
Currently I can't seem to figure out what the correct approach is to "invert" this sequencing. For example, for a tree with leaves 0.5 0.4 0.2 0.5 0.6 0.8 (from the TableFactor::constructors test), the output of DecisionTreeFactor::probabilities is 0.5, 0.5, 0.4, 0.6, 0.2, 0.8, which gives the ordering as [0, 4, 2, 6, 1, 5, 3, 7].
I spent all day yesterday trying to figure out the mapping between the orderings, but there is always some test case that fails. If there is a formula/algorithm for this conversion of orderings, please let me know so I can use that instead of the current approach.
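One candidate mapping, offered as a sketch and not as a verified answer to the question above, is a mixed-radix "decode and re-encode": decode each flat index with the first key as the fastest digit, then re-encode the same assignment with the first key as the slowest digit. The function name is hypothetical, and the sketch assumes the dense vector is strictly first-key-fastest over the same key list, which may not hold in every failing test case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not GTSAM API). Returns perm such that perm[i] is the
// first-key-slowest flat index of the assignment whose first-key-fastest flat
// index is i.
std::vector<std::size_t> fastestToSlowestPermutation(
    const std::vector<std::size_t>& cardinalities) {
  std::size_t total = 1;
  for (std::size_t c : cardinalities) {
    assert(c > 0);
    total *= c;
  }

  std::vector<std::size_t> perm(total);
  std::vector<std::size_t> assignment(cardinalities.size());
  for (std::size_t i = 0; i < total; ++i) {
    // Decode i with the first key as the fastest-moving digit.
    std::size_t rest = i;
    for (std::size_t k = 0; k < cardinalities.size(); ++k) {
      assignment[k] = rest % cardinalities[k];
      rest /= cardinalities[k];
    }
    // Re-encode the same assignment with the first key as the slowest digit.
    std::size_t j = 0;
    for (std::size_t k = 0; k < cardinalities.size(); ++k) {
      j = j * cardinalities[k] + assignment[k];
    }
    perm[i] = j;
  }
  return perm;
}
```

For three binary keys this yields 0, 4, 2, 6, 1, 5, 3, 7. With assumed cardinalities {2, 3} (an assumption, not taken from the test), it yields 0, 3, 1, 4, 2, 5, which would reorder 0.5 0.4 0.2 0.5 0.6 0.8 into 0.5 0.5 0.4 0.6 0.2 0.8. Note that the permutation has one entry per dense table cell, so it inherits the exponential size of the dense vector.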
Proposed Fix
To address the above issue in a general fashion, I updated ComputeLeafOrdering to visit the leaves of the tree and generate the SparseVector index from the corresponding Assignment of discrete keys. This allows us to directly construct the sparse table with the correct indexing and structure, circumventing 2 issues:
DecisionTree::probabilities constructs a dense vector, which can cause exponential blowup in memory. The approach proposed in this fix avoids ever constructing this vector. Of course, this means that performance depends on the DecisionTree being "sparse" to begin with.
The major drawback, though, is that based on a benchmarking test I wrote, this scheme is slower than the current approach. Premature optimization is the root of all evil, however, so we will cross that bridge when we come to it.
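The leaf-visiting idea under Proposed Fix can be sketched roughly as follows. This is not the actual ComputeLeafOrdering code: all type aliases and function names are stand-ins for illustration, a std::map replaces the sparse vector, and each visited leaf is assumed to carry a full assignment (a pruned tree, where one leaf covers several assignments, would need extra handling).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Stand-ins for illustration only; these are NOT the GTSAM types.
using Key = std::uint64_t;
using DiscreteKey = std::pair<Key, std::size_t>;    // (key, cardinality)
using Assignment = std::map<Key, std::size_t>;      // key -> chosen value
using Leaf = std::pair<Assignment, double>;         // full assignment + leaf value
using SparseTable = std::map<std::size_t, double>;  // flat index -> nonzero value

// Flat index of an assignment, with the first key as the slowest-moving index.
std::size_t flatIndex(const std::vector<DiscreteKey>& keys,
                      const Assignment& assignment) {
  std::size_t index = 0;
  for (const DiscreteKey& dk : keys) {
    const std::size_t value = assignment.at(dk.first);
    assert(value < dk.second);
    index = index * dk.second + value;
  }
  return index;
}

// Build the sparse table directly from visited leaves: each nonzero leaf is
// written at the index computed from its own assignment, so no dense
// intermediate vector is ever allocated.
SparseTable sparseTableFromLeaves(const std::vector<DiscreteKey>& keys,
                                  const std::vector<Leaf>& leaves) {
  SparseTable table;
  for (const Leaf& leaf : leaves) {
    if (leaf.second != 0.0) {
      table[flatIndex(keys, leaf.first)] = leaf.second;
    }
  }
  return table;
}
```

Because every index comes from the leaf's own assignment, the result does not depend on the order in which the leaves are visited. The cost is a per-leaf index computation, which is one plausible reason for a slowdown relative to copying a dense vector.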