Creating Bayesian power analysis method #292
Conversation
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #292      +/-   ##
==========================================
- Coverage   77.10%   73.34%   -3.76%
==========================================
  Files          21       21
  Lines        1380     1493     +113
==========================================
+ Hits         1064     1095      +31
- Misses        316      398      +82
Hey @drbenvincent According to what we discussed in the previous PR, I have made the following changes:
The power analysis method should be universal across the different methods we use. The idea is to validate that our regression (regardless of type) does not capture any effect during a period where no effect is present, and then use the posterior to estimate how large an effect would have to be in order to be detected. We can also use this to check the natural bias of the model's estimates, by checking how far reality diverges from the region of highest density.
Cool. Sorry for the slow reply, only just getting a chance to look at this now. We lost a few changes I made with this new PR (so, for example, ...). Will try to do a more thorough review today.
Thanks for the call today. Here are some thoughts:
Because this could be applied to multiple situations, not just synthetic control, then I think it could be a good idea to have a new section of the docs. It would look something like this
- Bayesian Power Analysis
- Explanation: This would be a new notebook which is mostly just text and explanation of the core logic of what happens in Bayesian Power Analysis. If there is a schematic image, then that would be great, otherwise I could come up with one in a later PR after this is merged. This could also include a couple of references for people who want to find out more. It could also be good to have a brief explanation of how this is similar or different to frequentist power analysis.
- Synthetic control example: This would be the current notebook you've put together. Could add a bit more explanation of what the mu and cumulative mu values correspond to
- Other examples: At a later date, you, me, or some other contributor could put together examples using power analysis with other quasi experiment types.
Could potentially add a warning that the results may be biased when using the default flag of `correction=False`. I'm happy for that to be the default still.
If we think this is actually very general, and could be applied to any of the quasi-experimental methods, then we'd want to move the core methods out of the `PrePostFit` class. It might take a bit of thinking about how best to implement it. For example, if we are lucky then we could just create a single `BayesianPowerAnalysis` class which gets fed the data and the experiment object. But it could be that there are experiment-specific nuances, in which case we'd want to just have methods available under the specific experiment classes. Happy to chat about this on a call - also happy to just get it merged and add a warning that this is experimental and that the API may change in the future.
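Just to sketch what I mean by the lucky case - the class name comes from the paragraph above, but the constructor arguments and delegation are purely illustrative, not an agreed API:

```python
# Hypothetical sketch only: names and signatures are illustrative, not an agreed API.
from typing import Any

import pandas as pd


class BayesianPowerAnalysis:
    """Runs the power analysis on an already-fitted CausalPy experiment object."""

    def __init__(self, experiment: Any, alpha: float = 0.05, correction: bool = False):
        self.experiment = experiment  # e.g. a fitted PrePostFit / SyntheticControl result
        self.alpha = alpha
        self.correction = correction

    def summary(self) -> pd.DataFrame:
        # In the lucky case this just delegates to shared logic (currently the
        # _power_estimation method on PrePostFit); experiment-specific nuances would
        # instead live on the experiment classes themselves.
        raise NotImplementedError("experiment-specific power estimation goes here")
```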
Let me know if you think something else is missing. Once it's merged, I'll dedicate some hours to creating/adding other notebooks.
Can you run `make uml` to rebuild the UML diagrams? See instructions here: https://github.com/pymc-labs/CausalPy/blob/main/CONTRIBUTING.md#overview-of-code-structure
I think the following line might be too strong - at the moment I think it's clear that we can apply it to quasi-experimental designs with a pre and a post intervention period (i.e. synthetic control and interrupted time series). It's not immediately obvious if it can be applied to other methods like ANCOVA or diff-in-diff or regression discontinuity. So we could maybe just say that this notebook will illustrate its use in the context of synthetic control.
Our proposed power analysis method is designed to be universally applicable across different regression models in causalpy
Can you change the notebook title to "Bayesian Power Analysis for Synthetic Control"?
In the notebook can you be more explicit what you mean by "this" where you write "we should see this difference approach zero"
Where you say: "It’s evident that the model fails to accurately represent reality. Even during periods without direct actions affecting the target, the actual values significantly deviate from the model’s predicted distribution."... can you just expand on this a bit. What is it about the image that shows this? Are we always in a position where we know the real mean, and is this only useful when we do?
Where you say: "In this scenario, the likelihood of our observed value falling within the model’s posterior distribution is extremely low"... can you say what about the summary table tells you this?
The Bayesian Tail Probability seems like a really important thing when interpreting the summary table. But at the moment there's no explanation about what this means. Could you add an explanation either in the introduction or when Bayesian Tail Probability first comes up?
Requested by @drbenvincent Co-Authored-By: Carlos Trujillo <59846724+cetagostini@users.noreply.github.com>
Hey @drbenvincent I'm going over your comments:
Done!
Done!
Done
Sure, now it looks like: we should see the difference between the real data and the estimated inference approach zero, indicating that we have accurately captured the pre-intervention scenario with our method.
Done, now it looks like:

:::{note}
It is important to remember that in this methodology, the **power analysis must be executed over a period of the same duration before the intervention**, and in which we know that no other action is affecting our target variable (`mu`).
If we do not have pre-intervention data, we cannot make this comparison of the model's performance prior to the intervention, losing insight into its power.
Additionally, if our assumption that no other activities affect the target variable does not hold during the period prior to the intervention, we will not be able to trust the results.
:::
Done, now I added the following: If a value lies precisely at the mean, it has a probability of 1 of falling within our posterior. As the value moves away from the average towards either extreme of the distribution, the probability decreases and approaches zero. This process allows us to determine how 'typical' or 'atypical' a new observation is, based on our model's estimated posterior. It is an effective tool for interpreting and comprehending the model's inference.
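To make that behaviour concrete, here is a simplified empirical sketch of the idea (the actual `compute_bayesian_tail_probability` in this PR appears to work with CDF values rather than raw samples, so this is an illustration, not the implementation):

```python
# Simplified empirical illustration of the behaviour described above.
import numpy as np


def empirical_tail_probability(posterior_samples: np.ndarray, x: float) -> float:
    """Fraction of posterior mass at least as far from the centre as x is."""
    centre = np.median(posterior_samples)
    return float(np.mean(np.abs(posterior_samples - centre) >= abs(x - centre)))


rng = np.random.default_rng(1)
samples = rng.normal(loc=10.0, scale=2.0, size=5_000)
print(empirical_tail_probability(samples, 10.0))  # ~1: the value is "typical"
print(empirical_tail_probability(samples, 18.0))  # ~0: the value is "atypical"
```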
Thanks for the changes so far. I think it would be good to get some others' thoughts on this, so I've requested that @NathanielF take a look too. But here are some thoughts, mostly about clarity...
- change `treatment_time` to `actual_treatment_time` throughout
- change `test_time` to `fake_treatment_time` throughout
Under "Evaluating the model" section, I find the wording confusing.
- Can you clarify what model intervals we are talking about? The HDI intervals of the causal impact?
- When you talk about the errors, clarify again that these are between the model retrodiction and the hold-out data. And these residuals are shown in the causal impact plot, to the right of the red line? Or also the cumulative being credibly non-zero?
- "we can see that the model based on these regressors is systematically failing" Explain why. What is it about the plot that shows us this?
Under the first "Power Analysis" section
- Would be a good point to introduce what the `alpha` kwarg is here. E.g. are we talking about the same alpha as here: https://en.wikipedia.org/wiki/Power_of_a_test? Could be good to link to that if so, but again to clarify whether the interpretation is the same as or different from the frequentist setting.
Under "Evaluating the model"
- I'm still a bit confused about the logic of what's happening in `_power_estimation`. The mean operates on the outcome variable itself, so what seems to be going on is that you're calculating the mean outcome in the hold-out period, like what's shown in the image below (see the magenta bar I added). You could easily have a situation where the model predictions are completely off from the data, but they have the same mean values - for example if the predictions were like a sine wave but the data were like a cosine wave. I don't know if this is a major problem or not, but I think this is an area of the approach where there is some uncertainty about why it's done this way. Would it make more sense to instead calculate a different statistic that is more sensitive to this, like the sum of squared residuals (i.e. operate on the residuals / causal impact) rather than on the raw data space? Maybe this is a good idea, maybe this is a silly idea. Either way, I think there needs to be some more explanation and justification of why this works the way it does. Has this kind of approach been done before?
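For illustration, a per-draw sum-of-squared-residuals statistic along these lines might look like the following; the array names and shapes are assumptions, not the current code:

```python
# Illustrative only: a shape-sensitive statistic operating on residuals rather than
# on raw outcome means. `mu_samples` is assumed to have shape (n_draws, n_holdout_obs).
import numpy as np


def ssr_per_draw(mu_samples: np.ndarray, y_holdout: np.ndarray) -> np.ndarray:
    residuals = mu_samples - y_holdout  # broadcasts y_holdout over draws
    return (residuals**2).sum(axis=1)  # one sum of squared residuals per posterior draw


# Two sets of predictions with identical means but different shapes (the sine vs
# cosine example above) give very different SSR distributions, whereas a mean-based
# statistic cannot tell them apart.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
y_holdout = np.sin(t)
mu_samples = np.cos(t) + rng.normal(0, 0.05, size=(500, 50))  # wrong shape, same mean
print(ssr_per_draw(mu_samples, y_holdout).mean())  # large, despite matching means
```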
Unfortunately, time-wise I have to leave the review at this point. Looking forward to coming back to it, but feel free to respond to the current points in the meantime if you have time.
If you also update from `main`, it should help remove some of the warnings that you're getting in the notebook, as we've merged some bug fixes recently.
I'm conscious that I'm late to the party in reviewing this PR, but my broad feeling is that we need a good bit more clarity about what we are doing in this setting, for two reasons: (i) the docs should ideally be a sufficient resource for people who want to dig into the methods, and (ii) we need to work very hard to avoid conflation with "classical" power analysis and the terminological jargon there.
For instance:
We have introduced a power analysis method in a Bayesian framework that provides a systematic approach to determine the sensitivity of a model in detecting expected effects.
This sounds good, but is too vague. Sensitivity to what? For what?
The method involves creating a null model that does not capture any effect during a period where no effect is present.
Reference to the null model here has to be more explicit in the docstrings and notebook as applied to particular experiment designs i.e. the Pre-Post design.
This validation process not only aids in determining the required effect size for significance
Is this how we're defining power? It mixes effect-size language and significance terminology. Confusing. If we have a null model, are we assessing an alternative model? Is the goal to reject the null or to determine superiority after treatment? Bayesian power analysis is well-defined in the literature - see e.g. Spiegelhalter's *Bayesian Approaches to Clinical Trials and Health-Care Evaluation*. How does this methodology relate to that one, which aims to gauge power?
In contrast, Bayesian approaches use the posterior distribution to estimate the probability of various outcomes
This is the heart of it, I think. We're doing too much. Better to focus on one simple method of evaluating power using the posterior. Dig in deeply on this one example and then gesture towards other metrics we can calculate using Bayesian posterior-based power analysis. Ideally state the goal here: in this notebook we intend to evaluate the sensitivity of the Bayesian pre-post model to interventions... we will show that XYZ for a particular metric M. This will inform us about the power of our experiment design to precisely estimate conclusions regarding the treatment effect.
Maybe even contrast the richness of this perspective with a more narrow frequentist phrasing of the power analysis.
Lost the whole getting started section from the Notebook.
Essentially, if the model accurately represents our reality, and we'll base our decision on the model intervals. By grasping these intervals, we can ascertain how significant an effect must be
...must be for what?
Furthermore, we can view the quantiles and determine the minimum detectable effect (MDE)
I think you need to be clearer on what MDE means in this context. MDE relative to what hypothesis?
Something I don't understand: about the first plot, you're using filtered data, i.e. before the real intervention, so you're just assessing out-of-sample model fit for the 10 observations preceding the treatment? I think the fit looks pretty good on the outcome scale? Not sure why we should think it's systematically failing.
The table output is unclear to me. Not sure how it relates to power analysis exactly. I think you're doing something more like a coverage test for a prediction interval?
Kruschke has a good paper on Bayesian power analysis here: https://link.springer.com/article/10.3758/s13423-016-1221-4
I think we need to be clearer about what kind of analysis we're doing here and be more explicit in the doc-strings and the notebook.
def _power_estimation(self, alpha: float = 0.05, correction: bool = False) -> Dict:
    """
    Estimate the statistical power of an intervention based on cumulative and mean results.
Ok, this is defined as a sub-function under the Pre-Post design class, which (I understand) is intended to apply to multiple experiment designs that avail of this structure. As such the doc-string should be very clear on the nature of the power calculation being done.
As it stands the doc-string tries to cover too much and does so too quickly, mixing technical concepts like confidence intervals and MDE which are more naturally stated in the frequentist paradigm. Additionally, the function itself is overloaded, which makes it hard to parse what is going on.
Not saying you should introduce the material here for giving background on Bayesian power analysis, but the doc-string should say something about how the pre-post design structure informs the kind of power analysis we are conducting here, i.e. we're aiming to gauge mean changes in the posterior over time.
It can apply corrections to account for systematic differences in the data.
This is too vague. What corrections?
It's not even really clear what we mean by "statistical power" in this context.
"cumulative": (cumulative_upper_mde + cumulative_lower_mde) / 2, | ||
"mean": (mean_upper_mde + mean_lower_mde) / 2, | ||
} | ||
return results |
Consider splitting these steps into smaller functions with individual docstrings describing the metric you're calculating in each case.
correction (bool, optional): If True, applies corrections to account for systematic differences in
cumulative and mean calculations. Default is False.
This really needs clarifying with respect to the nature of the particular type of power analysis you're running on the pre-post design structure.
Agree, but I think it's better to describe this in the document than in the docstring, otherwise it can be too much.
""" | ||
Calculate and summarize the intervention analysis results in a DataFrame format. | ||
|
||
This function performs cumulative and mean calculations on the posterior predictive distributions, |
Again the doc-string is overloaded and unclear. Not clear why the Pre-Post design is relevant. Is this a summary of the intervention or just the intervention with respect to the power analysis we're running?
But now we also have reference to Bayesian tail probabilities, posterior estimations and causal effects. Too much going on.
If we're using particular Bayesian tail probabilities because we have assumed a particular likelihood distribution, we should clarify this here.
Returns
-------
- pd.DataFrame: A DataFrame representing the power estimation results, including posterior estimations,
Again, too vague, since we're not clarifying beforehand what the goal of Bayesian power analysis is in this Pre-Post design context.
Here, I believe the description of the power analysis can reference the document rather than being added directly. What do you think?
Hi @cetagostini. Sorry, I wrote a load of comments on the classes and methods before getting through the notebook fully. I think I realised at the end of the notebook that what you're doing is neither "classical" power analysis nor Bayesian power analysis, but a kind of model adequacy check for out-of-sample prediction coverage? I think this is useful as a means of evaluating models, but the reference to power analysis confused me throughout. I've added some references to Bayesian power analysis - but I think the heart of it comes down to a statement about the sensitivity of a Bayesian model to either priors or sample data in the precision of estimating a quantity using the posterior. https://solomonkurz.netlify.app/blog/bayesian-power-analysis-part-i/
Hey team, thanks a lot for all the feedback 👍
@drbenvincent I'll go over your feedback, probably next week 🤞🏻 Looks pretty straightforward.
@NathanielF I just did a quick check and read the article, and I feel we are pretty aligned. I agree we have a mix of concepts and ramifications about how to use it, which can create confusion; we can jump on a call next week, let me know if you have time in the agenda! If I understand correctly the proposal from John K. Kruschke & Torrin M. Liddell, they establish a model in which, through the result of its posterior, a region is defined which they consider significant or insignificant.
Our objective is to establish a model using a methodology such as synthetic controls to ascertain the feasible values of our target variable. Then we define a ROPE (in this case we use the `alpha` parameter for that) and, based on the real value of our target variable, we evaluate whether to consider it significant or not.
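A minimal sketch of that check, with placeholder variable names rather than the PR's API: `mu_samples` would hold posterior draws for the no-intervention window and `observed_value` the real mean of the target over the same window.

```python
# Sketch of the alpha-based "ROPE" check described above; variable names are placeholders.
import numpy as np


def compatible_with_no_effect(
    mu_samples: np.ndarray, observed_value: float, alpha: float = 0.05
) -> bool:
    lower = np.percentile(mu_samples, 100 * alpha / 2)
    upper = np.percentile(mu_samples, 100 * (1 - alpha / 2))
    # inside the (1 - alpha) credible region -> treat as "no significant effect";
    # outside -> flag the observed value as significant
    return bool(lower <= observed_value <= upper)
```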
Following the statement from the paper, we could recreate that scenario. From my perspective, the approach appears to be performing identical actions. However, instead of executing it with a particular parameter, we use the entire target variable (`mu`). Now, I do this process for the average and cumulative values, because it is my way of working inherited from Causal Impact (we can probably do it in a better way), but I think it is the way most marketers who are used to those tools see it. Therefore, the table of results that I generate is basically a mirror of the one generated by Google Causal Impact:

## Posterior inference {CausalImpact}
##
## Average Cumulative
## Actual 117 3511
## Prediction (s.d.) 107 (0.37) 3196 (11.03)
## 95% CI [106, 107] [3174, 3217]
##
## Absolute effect (s.d.) 11 (0.37) 316 (11.03)
## 95% CI [9.8, 11] [294.9, 337]
##
## Relative effect (s.d.) 9.9% (0.35%) 9.9% (0.35%)
## 95% CI [9.2%, 11%] [9.2%, 11%]
##
## Posterior tail-area probability p: 0.001
## Posterior prob. of a causal effect: 99.9%

Now, this is the main and only objective. The primary aim of your modeling exercise is to gauge the power of your models based on a minimum detectable effect (MDE). Once you have evaluated the MDE of your models, you can focus on selecting the models and their regressors to increase their power and identify any systematic biases. However, these activities are just secondary goals and not the main focus of your analysis. Ultimately, the key goal is to determine the MDE of your regression model.
Let me know what you think. I'd be happy to make time during the week to talk; I think it's easier to deal with doubts that way 💪🏻
Final comment @NathanielF: as I see it, we are perhaps aligned on the method (I will double-read your resources to be extremely sure), but perhaps we should align better on the definitions and words used in the notebook!
@drbenvincent Any extra comments would be great. Happy to discuss the idea and see if there is value; I particularly think we could use this addition to empower users to create synergy between pymc-marketing and causalpy. Not to mention that it is a feature that until now I have not found in the most-used models by Google, and the only one that has it now is Meta.
Possibly within a week or two, I might be ready again 😅
Thanks @cetagostini, yeah I think we need to be quite clear about the language. Happy to chat soon. Had another quick read. If our focus is on `mu`, this is totally a fine procedure to implement, but it is just a hypothesis test, not a power analysis by my understanding. A power test about movements in `mu` would be a different thing. But yeah, happy to jump on a call when you have time. I just think we need to be really careful with language, because hypothesis-testing language is already such a mess with frequentist concepts in the mix.
@drbenvincent This could happen, but for the model to make a prediction of that type and end up in this case, the user would have to choose regressors that follow that pattern. That would be very strange, because they should not correlate with the target variable, so why would I use them? I think it is a fairly isolated case that, in reality, has never presented itself to me.
@drbenvincent I'm not sure if I've seen this approach anywhere else. The reason for working in raw space rather than causal space is that it is easier for me to explain 😅 but we can add an option for the user to decide how they want to do it. What do you think? Plus, Causal Impact from Google usually talks in raw space in its output, and I got used to it (you can see the previous output).
@NathanielF Exactly, this is what we can determine with the current process. After running the first model and seeing how it responds during the period prior to the intervention, where no effect should be present, we can find:
Iterating will allow us to answer those questions, finding the best answers to decrease the MDE of our model.
As we discussed, I added descriptions in the example document where I try to reframe the idea as a sensitivity analysis that can become a power analysis if it is performed before the intervention and is used to determine what actions should be taken in order to increase the power of the method. Even so, I am still trying to organize this new idea, because I always imagined it as a step prior to the intervention, and as a consequence it was always a power analysis!
I made some adjustments and responded to some of your comments. Let me know if we have anything extra to add!
Feedback as I read through the notebook
- Typo: method
- In the section "Similarities and Differences to Frequentist Methods", would it be worth highlighting the relation of this idea to the one you showed me in the other causal impact package?
- In the plot, can we change "real mean" to "actual post-intervention mean", or something similar to denote that it reflects the realisations of the observed data after the pseudo intervention? Then also label the predicted distribution appropriately to convey that it is the predicted distribution for the "future" period. Similarly in the table and in the naming of the functions.
- In the "How it works" section I think you need to be clearer that you are using a time-series null model and introducing a pseudo treatment date. The idea being that the model should be able to fit a pre-period and predict well in the post-period. You're conducting this exercise with pseudo treatment dates as a preparatory robustness check on the model specification that you want to use in the actual experiment involving a genuine treatment or intervention.
- Maybe also make a note about why the choice of pseudo treatment date is not arbitrary, or how you can bootstrap this style of analysis with random pseudo treatment dates to provide the sensitivity check (see the sketch below). Might have to look into block-bootstrap methods for this.
- Then outline the intended structure of the notebook, i.e. we will first fit a basic model with a fixed pseudo treatment date, showing the inadequacy of this model spec on pre-period data. Then we will improve the model and again evaluate using a pseudo treatment date in the pre-period data. Once we're happy with this model we will apply the modelling exercise to the real treatment date.
- I would consider cutting the bias estimation part altogether. There is quite a bit going on that is hard to follow.
- Additionally, I'd consider whether you can use simulated data to help better signpost what is going on.

Generally I think I understood the process better this round, but the naming conventions (e.g. Bayesian tail probability) are just too opaque and the notebook needs more signposting: we're doing this step now because... etc. Much improved though. Will try and look at the code soon too.
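As a rough illustration of the random pseudo-treatment-date idea mentioned above, something like the loop below could work. It assumes the standard CausalPy synthetic-control example dataset and the `power_summary()` method added in this PR; the exact names and signatures may differ from the final code.

```python
# Rough sketch of a pseudo-treatment-date robustness check; assumes the standard
# CausalPy "sc" example data and the power_summary() method added in this PR.
import causalpy as cp
import numpy as np

df = cp.load_data("sc")
actual_treatment_time = 70
pre_df = df[df.index < actual_treatment_time]  # pre-intervention data only

rng = np.random.default_rng(seed=42)
fake_treatment_times = rng.choice(np.arange(40, 65), size=5, replace=False)

summaries = []
for fake_treatment_time in fake_treatment_times:
    result = cp.pymc_experiments.SyntheticControl(
        pre_df,
        fake_treatment_time,
        formula="actual ~ 0 + a + b + c + d + e + f + g",
        model=cp.pymc_models.WeightedSumFitter(),
    )
    # each summary characterises the "no effect" distribution for one pseudo date
    summaries.append(result.power_summary())
```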
General or code comments
- Running `make check_lint` shows a number of style violations. Running `make lint` should auto-fix all of this with ruff.
- We need the new code in `pymc_experiments.py` to be covered by tests. Perhaps a good way of doing that is with an integration test (see `test_integration_pymc_examples.py` for examples). You could maybe base it around the same example you use for the illustrative example notebook.
- As far as I can tell, we've got the two public methods `plot_power` and `power_summary` which call `_power_estimation`. So I'm just double checking that the outputs of `_power_estimation` are deterministic (given the existing stored MCMC samples), right? Can you just confirm that this is the case? Otherwise we might get slightly different results coming out of `plot_power` and `power_summary`.
Comments about the example notebook
- Introduction paragraph is nice. When you say "We have introduced a..." do you want any attribution there to yourself, or to a team who developed these ideas? Otherwise the wording could change to something like "We outline an approach..." and again, I think it could benefit from adding a reference to similar kinds of ideas or approaches that are out there.
- Clarity change: "The method involves creating a null model that does not capture any effect during a period where no effect is present" -> "The method involves creating a null model of the data before any intervention takes place."
- "By analyzing the posterior distribution derived from this null model" Can you be more specific... it's not the whole posterior right, so you could clarify if it's a summary statistic (if so, what).
- The "How it works" section is a great addition. But in the list, the samples and chains seems kind of boring and technical rather than consequential to doing an experiment. Maybe this section can be slightly tweaked to focus on the big important issues. As in, what can you as an experimenter gain from running a Bayesian power analysis before you run your experiment? Does it allow you to have more confidence in your conclusion? Do we need to talk about different kinds of errors (missing an actual effect, false alarm)?
- The "Similarities and Differences to Frequentist Methods" section is nice. But I think it would be really strong if a number of links to wikipedia (or references) are added.
- Not quite sure why you're still getting the `Model.model property is deprecated` warnings. Having updated from `main`, this shouldn't happen any more. Maybe the notebook wasn't re-run after doing that?
- Where you say "the power analysis must be executed in a period of the same duration before the intervention", what do you mean by "same duration"? Are you just trying to say that we need to run the power analysis before any intervention?
- I still think there's some more clarity to be had about the mean vs the cumulative. In both cases we're referring to the causal impact (difference between model prediction and reality), right? Spelling that out could make it clearer for readers (see the sketch after this list).
- At the end of the notebook, can you do the regular `results.summary` before you do the modified intervention summary (see note on the code about having this as a separate method)?
- Toward the end of the notebook I think the numbers you have in the text get a bit out of sync with the actual result values. The outcomes should be reproducible given the `seed` is provided, so can we update the values in the text?
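On the mean vs cumulative point above, a small sketch of what I mean; the array names, shapes, and numbers are made up purely for illustration:

```python
# Illustration of mean vs cumulative causal impact; names/shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_post_obs = 1_000, 20
mu_samples = rng.normal(100.0, 2.0, size=(n_draws, n_post_obs))  # counterfactual draws
y_post = rng.normal(105.0, 2.0, size=n_post_obs)                 # observed outcome

impact_samples = y_post - mu_samples            # causal impact, per draw and time point
mean_impact = impact_samples.mean(axis=1)       # average impact per posterior draw
cumulative_impact = impact_samples.sum(axis=1)  # total impact per posterior draw
```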
Summary
So can we just summarise the core logic here and go through a few questions. What we did was basically:
- Fit a model on pre-intervention data. This model had poor predictive ability on hold-out data. So we...
- Fitted a better model (more predictors) on the same pre-intervention data. The power analysis shows a distribution of possible causal impact values that we'd expect to see in the absence of an intervention effect. Correct?
- We then ran the intervention, fitting a model on this full pre and post-intervention data.
I was expecting more of a comparison of the posterior observed causal effect with the predicted distribution of no effect from step (2). The idea being that we can maybe say that the observed effect falls way outside of what we'd expect from the control / no effect model from step (2). Is that right, or is this not how this works? Adding some clarification around this would help me and likely a lot of other readers :)
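To make that concrete, the comparison I had in mind would look roughly like this; the arrays are stand-ins for the posterior impact draws from steps (2) and (3), not the PR's API:

```python
# Sketch of the comparison described above; the arrays are stand-ins, not the PR's API.
import numpy as np

rng = np.random.default_rng(0)
null_impact_samples = rng.normal(0.0, 1.0, size=2_000)      # "no effect" model, step (2)
observed_impact_samples = rng.normal(3.0, 1.0, size=2_000)  # real intervention, step (3)

# probability that a draw of the observed effect exceeds a draw from the null distribution
prob = (observed_impact_samples[:, None] > null_impact_samples[None, :]).mean()
print(f"P(observed effect > no-effect distribution) = {prob:.3f}")
```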
@@ -17,6 +17,7 @@ Synthetic Control
   notebooks/sc_skl.ipynb
   notebooks/sc_pymc_brexit.ipynb
   notebooks/geolift1.ipynb
   notebooks/power_analysis.ipynb
This doesn't match up with the actual notebook name, so the docs won't render this page currently.
@@ -44,3 +50,45 @@ def test_round_num():
    assert round_num(123.456, 5) == "123.46"
    assert round_num(123.456, 6) == "123.456"
    assert round_num(123.456, 7) == "123.456"


def test_compute_bayesian_tail_probability():
We've got multiple different scenarios here, each of which should be a different test. But for each test, we can have multiple asserts, like the output is a scalar float, between 0 and 1, etc
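For example, a sketch of that layout (assuming `compute_bayesian_tail_probability` ends up importable from `causalpy.utils`, where this PR appears to add it):

```python
# Sketch of the suggested test layout: one test per scenario, several asserts each.
import numpy as np

from causalpy.utils import compute_bayesian_tail_probability  # assumed import path


def test_tail_probability_returns_scalar_in_unit_interval():
    posterior = np.random.normal(loc=0.0, scale=1.0, size=1_000)
    prob = compute_bayesian_tail_probability(posterior, 0.0)
    assert isinstance(prob, float)
    assert 0.0 <= prob <= 1.0


def test_tail_probability_decreases_away_from_the_mean():
    posterior = np.random.normal(loc=0.0, scale=1.0, size=1_000)
    prob_at_mean = compute_bayesian_tail_probability(posterior, 0.0)
    prob_in_tail = compute_bayesian_tail_probability(posterior, 5.0)
    assert prob_in_tail < prob_at_mean
```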
else:
    probability = 2 * (1 - cdf_x + cdf_lower) / (1 - cdf_lower - cdf_upper)

return abs(probability)
I'm getting some type hint issues with `abs(probability)`.
@@ -48,3 +49,29 @@ def _format_sig_figs(value, default=None):
    if value == 0:
        return 1
    return max(int(np.log10(np.abs(value))) + 1, default)


def compute_bayesian_tail_probability(posterior, x) -> float:
Can you add type hints for the function inputs.
results = {}
ci = (alpha * 100) / 2
# Cumulative calculations
Can we change this comment to be more descriptive in terms of what it is doing?
    .mu.stack(sample=("chain", "draw"))
    .sum("obs_ind")
)
# Mean calculations
Can we change this comment to be more descriptive in terms of what it is doing?
Returns
-------
- Dict: A dictionary containing key statistical measures such as posterior estimation,
Not sure how pedantic I should be, but are we actually getting a nested dictionary out here?
        np.percentile(_mu_samples_mean, 100 - ci),
    ],
}
cumulative_upper_mde = (
Could add a comment here just saying that we are calculating the Minimum Detectable Effect (bounds/intervals?).
"cumulative": cumulative_results, | ||
"mean": mean_results, | ||
} | ||
results["_systematic_differences"] = { |
Add a quick comment to explain what this is doing
print(f"Formula: {self.formula}") | ||
# TODO: extra experiment specific outputs here | ||
self.print_coefficients() | ||
elif version == "intervention": |
So I'm not 100% sure about this. At the moment we have a `summary` method for all classes which provides an estimate of the model coefficients. The approach used here is to also call the `summary` method but change what it does completely by changing a kwarg.
I'm thinking that it might be cleaner to completely separate this and just have a new method called `power_analysis_summary`, or `intervention_summary`, whatever.
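A rough sketch of that split, with placeholder names and return type rather than a final API:

```python
# Rough sketch of the suggested split; method name and return type are placeholders.
import pandas as pd


class PrePostFit:
    def summary(self) -> None:
        """Existing behaviour: print the formula and model coefficients."""
        ...

    def intervention_summary(self, alpha: float = 0.05, correction: bool = False) -> pd.DataFrame:
        """Separate entry point for the power-analysis table, instead of switching
        summary()'s behaviour with a version kwarg."""
        results = self._power_estimation(alpha=alpha, correction=correction)
        return pd.DataFrame(results)
```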
I'm organizing the changes into a cleaner and more updated PR since it's been a while since I moved the last one.
Previous Context #277
Hey everyone!
As we discussed in the issue here, I've added a new function to CausalPy that performs power analysis. This is based on my internal work on estimating effects and conducting sensitivity analysis within Bayesian frameworks.
What will you find?
I hope this explanation makes sense. Ideally, I'd like to have a few rounds of feedback and answer any questions you may have before moving on to unit testing.
Looking forward to hearing your thoughts!
cc: @drbenvincent