Consider the following scenario: You are an empirical social scientist who is going to run a field experiment, in order to test some hypothesis. You want to do the right thing. This means that you will not p-hack your results, and you will pre-register your statistical analysis. This way, you will control size of your test under the null. At the same time you would like to maximize the probability of rejecting your null hypothesis. You believe there is an effect there, and you want to prove it to the world. What should you do? This tutorial will show how to use the PAP App to answer this question. For further details, please refer to our working paper Optimal Pre-Analysis Plans: Statistical Decisions Subject to Implementability.
The easiest way to use the app is to go to the interactive app at The PAP App.
In this notebook, we will however instead use the Python script PAP_Optimal.py
to obtain the same results.
This script can be downloaded from https://github.com/maxkasy/The_PAP_App.
The following loads the functions from the Python script, as well as the required packages (numpy
, pandas
, scipy
, itertools
, and cvxpy
):
run PAP_Optimal
Next we need to specify the parameters of our problem, which will then determine the optimal PAP. These parameters pin down (1) the null hypothesis to be tested, (2) prior beliefs over the true parameter, and (3) prior beliefs over the availability of specific tests or estimates for the data that you will obtain.
In the app, we have implemented two leading scenarios.
The first scenario involves "data" in the form of binary decisions
Suppose you plan to run an experiment at three different sites, for instance in three different villages.
In the experiment you have a binary treatment, and a main outcome of interest.
You will calculate a test for each village separately, reporting whether the treatment had a significant positive effect in this village. This gives three testing decisions
However, there is some chance that the experiment will not be implementable in a given village. That is, there is a chance that you will not be able to calculate etaJ
in the code below).
Under the null of no effect, the probability of rejecting the null in each of the villages is given by minp
below), where
Lastly, we need to make assumptions about your prior distribution for the rejection probability
input_binary = {
'etaJ': np.array([.9, .7, .5]),
'minp': .05,
'alpha': 1,
'beta': 1
}
It turns out these parameters are all that we need to calculate the optimal PAP. The function pap_binary_data
takes the dictonary input_binary
, and uses it to calculate the relevant distributions of data availability and test realizations under the null, as well as averaged over the prior distributions:
test_args_binary = pap_binary_data(input_binary)
We can now calculate the optimal test, based on these distributions. This optimal test solves the linear programming problem described in our paper, maximizing expected power subject to size control and incentive compatibility. The Argument size
refers to the desired size of the joint hypothesis test.
solution_binary = optimal_test(test_args_binary, size = .05)
Let us look at the optimal pre-analysis plan that we have calculated:
t_bin = solution_binary["t"]
t_bin[t_bin["t"]>0].sort_values(by = ["t", "X1", "X2"])
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
X1 | X2 | X3 | t | |
---|---|---|---|---|
4 | 1.0 | 0.0 | 0.0 | 0.947 |
3 | 0.0 | 1.0 | 1.0 | 1.000 |
5 | 1.0 | 0.0 | 1.0 | 1.000 |
6 | 1.0 | 1.0 | 0.0 | 1.000 |
7 | 1.0 | 1.0 | 1.0 | 1.000 |
To save this pre-analysis plan in a spreadsheet, run the following:
t_bin.to_csv("optimal_pap_binary.csv", index=False)
We now have a spreadsheet, as displayed in the table above, of rejection probabilities for different combinations of the $X$s. How do we use this spreadsheet?
The key point to remember is that we need to make worst-case assumptions about any components which are not available in the data that we actually obtain, after pre-registering our PAP.
For example, suppose you registered the above testing rule in a PAP, before seeing the data.
Suppose further that, based on your data, you can calculate the test
Suppose additionally that
The discussion above was based on binary data, motivated by the case where we calculate a joint testing decision based on multiple individual hypothesis tests. An alternative, which uses the data more efficiently, is to calculate a joint testing decision based on multiple individual point estimates. Motivated by standard asymptotic arguments, we assume that these point estimates are normally distributed.
As before, etaJ
refers to the probabilities that each of the components will be available.
The parameters mu0
and Sigma0
) specify the mean vector and covariance matrix of mu
and Sigma
) specify the prior marginal distribution of
input_normal = {
'etaJ': np.array([.9, .5]),
'mu0': np.array([0, 0]),
'Sigma0': np.array([[1, 0], [0, 1]]),
'mu': np.array([1, 1]),
'Sigma': np.array([[2, 1], [1, 2]])
}
The remainder of the calculation proceeds as before.
To apply the proposed linear programming approach, we need to discretize the support of the vector pap_normal_data
.
test_args_normal = pap_normal_data(input_normal)
solution_normal = optimal_test(test_args_normal, size = .05)
t_norm = solution_normal["t"]
# t_norm[t_norm["t"]>0].sort_values(by = ["t", "X1", "X2"])
t_norm.to_csv("optimal_pap_normal.csv", index=False)