Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix conversion from DecisionTree to TableFactor #1929

Merged
merged 4 commits into from
Dec 13, 2024
Merged

Conversation

varunagrawal
Copy link
Collaborator

@varunagrawal varunagrawal commented Dec 13, 2024

I recently discovered a bug in the way we are converting a DecisionTree (and its derivatives such as DecisionTreeFactor) into a TableFactor, where the leaf value ordering is incorrect.
I have added a test that showcases this error, called Conversion.

Issue Description

The issue stems from the fact that the DecisionTree is structured with the highest label first, so a DecisionTreeFactor with keys (x0, x1, x2) will have a leaf visiting scheme where x0 is the fast moving index and x2 is the slowest moving one.
This results in a different ordering of the resulting vector of leaf values than what the TableFactor expects (where x0 is the slowest moving index and x2 is the fastest), and thus incorrect sparse table construction.

Currently I can't seem to figure out what the correct approach is to "invert" this sequencing. For example, for a tree with leaves 0.5 0.4 0.2 0.5 0.6 0.8 (from the TableFactor::constructors test), the output of DecisionTreeFactor::probabilities is 0.5, 0.5, 0.4, 0.6, 0.2, 0.8 which gives the ordering as [0, 4, 2, 6, 1, 5, 3, 7].

I spent all day yesterday trying to figure out the mapping between the ordering but there is always some test case that fails. If there is a formula/algorithm for this conversion of orderings, please let me know so I can use that instead of the current approach.

Proposed Fix

To address the above issue in a general fashion, I updated ComputeLeafOrdering to visit the leaves of the tree and generate the SparseVector index from the corresponding Assignment of discrete keys. This allows us to directly construct the sparse table with the correct indexing and structure, circumventing 2 issues:

  1. The need to perform the mapping from highest label first to lowest label first (or any other scheme) ordering.
  2. DecisionTree::probabilities constructs a dense vector which can cause exponential blowup in memory. The proposed approach in this fix prevents the need to ever construct this vector. Of course this means that performance is predicated on the fact that the DecisionTree is "sparse" to begin with.

The major drawback though is that based on a benchmarking test I wrote, this scheme is slower than the current approach, but premature optimization is the root of all evil so we will cross that bridge when we come to it.

@varunagrawal varunagrawal self-assigned this Dec 13, 2024
Copy link
Member

@dellaert dellaert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I’ll approve but would like to see the comments addressed. I know the style comment might rankle. Fee free to ignore IF this file is consistent in using underscores. But typically with new code I’d adhere to GTSAM.

gtsam/discrete/TableFactor.cpp Show resolved Hide resolved
gtsam/discrete/TableFactor.cpp Outdated Show resolved Hide resolved
gtsam/discrete/TableFactor.cpp Show resolved Hide resolved
gtsam/discrete/TableFactor.cpp Show resolved Hide resolved
@varunagrawal
Copy link
Collaborator Author

I'll update the style. My main goal was to get this out and get comments on whether there was a better way to handle the ordering.

Another approach I had was to reverse the order of keys to the TableFactor constructor but that causes all sorts of equality checks to fail, both for keys (since they are stored as a KeyVector, I am assuming the ordering matters mathematically though operationally I don't see why) and the SparseVector.

@dellaert
Copy link
Member

I'll update the style. My main goal was to get this out and get comments on whether there was a better way to handle the ordering.

Another approach I had was to reverse the order of keys to the TableFactor constructor but that causes all sorts of equality checks to fail, both for keys (since they are stored as a KeyVector, I am assuming the ordering matters mathematically though operationally I don't see why) and the SparseVector.

I think that’s probably safe. And your fix also prevents exponential blow-up :-)

@varunagrawal varunagrawal merged commit 3af5360 into develop Dec 13, 2024
33 checks passed
@varunagrawal varunagrawal deleted the table-factor-fix branch December 13, 2024 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants