Fix conversion from `DecisionTree` to `TableFactor` #1929

varunagrawal · 2024-12-13T14:46:13Z

I recently discovered a bug in the way we are converting a DecisionTree (and its derivatives such as DecisionTreeFactor) into a TableFactor, where the leaf value ordering is incorrect.
I have added a test that showcases this error, called Conversion.

Issue Description

The issue stems from the fact that the DecisionTree is structured with the highest label first, so a DecisionTreeFactor with keys (x0, x1, x2) will have a leaf visiting scheme where x0 is the fast moving index and x2 is the slowest moving one.
This results in a different ordering of the resulting vector of leaf values than what the TableFactor expects (where x0 is the slowest moving index and x2 is the fastest), and thus incorrect sparse table construction.

Currently I can't seem to figure out what the correct approach is to "invert" this sequencing. For example, for a tree with leaves 0.5 0.4 0.2 0.5 0.6 0.8 (from the TableFactor::constructors test), the output of DecisionTreeFactor::probabilities is 0.5, 0.5, 0.4, 0.6, 0.2, 0.8 which gives the ordering as [0, 4, 2, 6, 1, 5, 3, 7].

I spent all day yesterday trying to figure out the mapping between the ordering but there is always some test case that fails. If there is a formula/algorithm for this conversion of orderings, please let me know so I can use that instead of the current approach.

Proposed Fix

To address the above issue in a general fashion, I updated ComputeLeafOrdering to visit the leaves of the tree and generate the SparseVector index from the corresponding Assignment of discrete keys. This allows us to directly construct the sparse table with the correct indexing and structure, circumventing 2 issues:

The need to perform the mapping from highest label first to lowest label first (or any other scheme) ordering.
DecisionTree::probabilities constructs a dense vector which can cause exponential blowup in memory. The proposed approach in this fix prevents the need to ever construct this vector. Of course this means that performance is predicated on the fact that the DecisionTree is "sparse" to begin with.

The major drawback though is that based on a benchmarking test I wrote, this scheme is slower than the current approach, but premature optimization is the root of all evil so we will cross that bridge when we come to it.

dellaert

Cool. I’ll approve but would like to see the comments addressed. I know the style comment might rankle. Fee free to ignore IF this file is consistent in using underscores. But typically with new code I’d adhere to GTSAM.

gtsam/discrete/TableFactor.cpp

varunagrawal · 2024-12-13T17:17:33Z

I'll update the style. My main goal was to get this out and get comments on whether there was a better way to handle the ordering.

Another approach I had was to reverse the order of keys to the TableFactor constructor but that causes all sorts of equality checks to fail, both for keys (since they are stored as a KeyVector, I am assuming the ordering matters mathematically though operationally I don't see why) and the SparseVector.

dellaert · 2024-12-13T19:14:36Z

I'll update the style. My main goal was to get this out and get comments on whether there was a better way to handle the ordering.

Another approach I had was to reverse the order of keys to the TableFactor constructor but that causes all sorts of equality checks to fail, both for keys (since they are stored as a KeyVector, I am assuming the ordering matters mathematically though operationally I don't see why) and the SparseVector.

I think that’s probably safe. And your fix also prevents exponential blow-up :-)

varunagrawal added 2 commits December 13, 2024 08:07

test exposing incorrect conversion

b91c470

update ComputeLeafOrdering to give a correct vector of values

a8e24ef

varunagrawal requested a review from dellaert December 13, 2024 14:46

varunagrawal self-assigned this Dec 13, 2024

clean up test

7d389a5

dellaert approved these changes Dec 13, 2024

View reviewed changes

gtsam/discrete/TableFactor.cpp Show resolved Hide resolved

gtsam/discrete/TableFactor.cpp Outdated Show resolved Hide resolved

gtsam/discrete/TableFactor.cpp Show resolved Hide resolved

gtsam/discrete/TableFactor.cpp Show resolved Hide resolved

address review comments

9830981

varunagrawal merged commit 3af5360 into develop Dec 13, 2024
33 checks passed

varunagrawal deleted the table-factor-fix branch December 13, 2024 20:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix conversion from `DecisionTree` to `TableFactor` #1929

Fix conversion from `DecisionTree` to `TableFactor` #1929

varunagrawal commented Dec 13, 2024 •

edited

Loading

dellaert left a comment

varunagrawal commented Dec 13, 2024

dellaert commented Dec 13, 2024

Fix conversion from DecisionTree to TableFactor #1929

Fix conversion from DecisionTree to TableFactor #1929

Conversation

varunagrawal commented Dec 13, 2024 • edited Loading

Issue Description

Proposed Fix

dellaert left a comment

Choose a reason for hiding this comment

varunagrawal commented Dec 13, 2024

dellaert commented Dec 13, 2024

Fix conversion from `DecisionTree` to `TableFactor` #1929

Fix conversion from `DecisionTree` to `TableFactor` #1929

varunagrawal commented Dec 13, 2024 •

edited

Loading