FASK Ignores Background Knowledge Constraints in Sachs Data #38

Open
chrisquatjr opened this issue Dec 10, 2024 · 1 comment

@chrisquatjr

First of all, thank you to the py-tetrad team for such active maintenance of a great set of resources here!

Problem Description

I have been trying to reproduce the FASK paper results (found here: arxiv.org/pdf/1805.03108) using the Sachs dataset (https://github.com/cmu-phil/example-causal-datasets/blob/main/real/sachs/data/sachs.2005.logxplus10.jittered.eperimental.continuous.txt). The algorithm (or perhaps the TetradSearch object itself) appears to ignore background knowledge constraints. Specifically, edges appear between intervention variables even when explicitly forbidden.

Example

import numpy as np
import pandas as pd
import pytetrad.tools.TetradSearch as ts

# Load and process Sachs dataset
df = pd.read_csv('sachs.2005.with.jittered.experimental.continuous.txt', sep='\t')
log_df = df.apply(lambda x: np.log2(x + 10))

# Setup FASK with background knowledge
fask_search = ts.TetradSearch(log_df)

# Add variables to tiers and forbid intervention-intervention edges
# (int_cols and measured_cols are lists of column names defined elsewhere)
for var in int_cols:
    fask_search.add_to_tier(0, var)
for var in measured_cols:
    fask_search.add_to_tier(1, var)
for int1 in int_cols:
    for int2 in int_cols:
        if int1 != int2:
            fask_search.set_forbidden(int1, int2)

# Run FASK
fask_search.use_sem_bic()
fask_search.run_fask(alpha=0.00001, depth=-1, fask_delta=-0.2,
                     left_right_rule=1, skew_edge_threshold=0.3)

Despite fask_search.print_knowledge() showing forbidden edges (e.g., "b2camp cd3_cd28"), these edges still appear in the output (e.g., "b2camp --> cd3_cd28").
This significantly impacts performance:

Published results: AP=0.84, AR=0.80, AHP=1.00, AHR=0.79
My results: AP=0.127, AR=0.438, AHP=0.109, AHR=0.412
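
For reference, the four numbers above can be computed from edge lists roughly as follows. This is a minimal sketch that assumes AP/AR mean adjacency precision/recall and AHP/AHR mean arrowhead precision/recall, and that both graphs are given as sets of directed `(cause, effect)` pairs. The function name and the toy graphs are hypothetical, and the exact conventions used in the FASK paper or in Tetrad's own statistics may differ.

```python
def adjacency_arrowhead_stats(estimated, truth):
    """Compare two sets of directed (cause, effect) pairs.

    Hypothetical helper, not part of py-tetrad.
    """
    # Adjacencies ignore direction, so store each edge as an unordered pair.
    est_adj = {frozenset(edge) for edge in estimated}
    true_adj = {frozenset(edge) for edge in truth}
    common_adj = est_adj & true_adj

    # Arrowheads count as correct only when the direction matches as well.
    est_dir, true_dir = set(estimated), set(truth)
    common_dir = est_dir & true_dir

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "AP": ratio(len(common_adj), len(est_adj)),
        "AR": ratio(len(common_adj), len(true_adj)),
        "AHP": ratio(len(common_dir), len(est_dir)),
        "AHR": ratio(len(common_dir), len(true_dir)),
    }


# Toy graphs for illustration only (not the Sachs ground truth).
truth = {("a", "b"), ("b", "c")}
estimated = {("a", "b"), ("c", "b"), ("a", "c")}
stats = adjacency_arrowhead_stats(estimated, truth)
print(stats)
```

Spurious edges among the intervention variables lower AP and AHP in this calculation, because each one adds to the estimated edge count without matching anything in the reference graph.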

Environment: Python 3.11.8, py-tetrad 0.1.2, Ubuntu 22.04.5 LTS

Perhaps I have missed something essential here. In my testing so far, I have been unsuccessful in encoding exogenous background data into my analysis. Any assistance on this would be greatly appreciated! Thank you for your time and consideration.

@jdramsey
Collaborator

Nope, you did not miss anything; the error is mine! In the TetradSearch.py class, knowledge was not being passed to FASK. I just added this line:

        alg.setKnowledge(self.knowledge)

to this method:

    def run_fask(self, alpha=0.05, depth=-1, fask_delta=-0.3, left_right_rule=1, skew_edge_threshold=0.3):
        self.params.set(Params.ALPHA, alpha)
        self.params.set(Params.DEPTH, depth)
        self.params.set(Params.FASK_DELTA, fask_delta)
        self.params.set(Params.FASK_LEFT_RIGHT_RULE, left_right_rule)
        self.params.set(Params.SKEW_EDGE_THRESHOLD, skew_edge_threshold)

        alg = dag.Fask(self.SCORE)
        alg.setKnowledge(self.knowledge)
        self.java = alg.search(self.data, self.params)
        self.bootstrap_graphs = alg.getBootstrapGraphs()

If you do a git pull for the py-tetrad repository (or check it out again) or re-apply the pip install, you should get the change. (Or you could just make the change yourself in the file.)
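
One way to confirm the change took effect is to scan the printed graph for edges that the knowledge forbids. The following is a minimal sketch, assuming the edges are available as text lines such as "b2camp --> cd3_cd28" (optionally prefixed with "1. " as in a printed edge list). `parse_edges` and `forbidden_violations` are hypothetical helpers, not py-tetrad functions, and the `measured_a`/`measured_b` names are placeholders.

```python
import re

# Matches lines such as "1. b2camp --> cd3_cd28" or "b2camp --> cd3_cd28".
EDGE_RE = re.compile(r"^\s*(?:\d+\.\s*)?(\S+)\s+(-->|<--|---|<->)\s+(\S+)\s*$")


def parse_edges(lines):
    """Return (node1, edge_type, node2) triples for lines that look like edges."""
    edges = []
    for line in lines:
        match = EDGE_RE.match(line)
        if match:
            edges.append(match.groups())
    return edges


def forbidden_violations(edges, forbidden):
    """Return directed edges (x, y) whose pair (x, y) is in the forbidden set."""
    violations = []
    for a, kind, b in edges:
        if kind == "-->" and (a, b) in forbidden:
            violations.append((a, b))
        elif kind == "<--" and (b, a) in forbidden:
            violations.append((b, a))
    return violations


# Illustrative input: two intervention variables from the report, plus placeholders.
int_cols = ["b2camp", "cd3_cd28"]
forbidden = {(x, y) for x in int_cols for y in int_cols if x != y}
graph_lines = [
    "Graph Edges:",
    "1. b2camp --> cd3_cd28",
    "2. measured_a --> measured_b",
]
violations = forbidden_violations(parse_edges(graph_lines), forbidden)
print(violations)
```

An empty list after re-running the search would indicate that no forbidden directed edge remains in the output.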

Best,

Joe
