An automated and integrated tool for proving and disproving Differential Privacy using Programming Language techniques.
Due to the multiple dependencies of this project, it is much easier to install inside a docker container.
In the project root directory, run:
docker build . -t checkdp
docker run --rm -it checkdp
You will be inside a docker container with everything installed.
Specifically, KLEE
is installed at /checkdp/klee
, with its own dependencies (miniSAT
, stp
, llvm-9.0
) in /checkdp/klee/deps
. Counterexample validator PSI
is located at /checkdp/psi
.
Run python3 -m checkdp -h
you will see the following help message:
usage: __main__.py [-h] [-k KLEE] [-i INCLUDE] [-c CLANG] [-o OUT]
[-l LOGLEVEL] [-p PSI] [-s PSI_SOURCE] [-d]
[--transform-only] [--enable-shadow]
FILE
positional arguments:
FILE
optional arguments:
-h, --help show this help message and exit
-k KLEE, --klee KLEE The klee binary path
-i INCLUDE, --include INCLUDE
The include path for klee
-c CLANG, --clang CLANG
The clang binary path
-o OUT, --out OUT The output path for CheckDP
-l LOGLEVEL, --loglevel LOGLEVEL
The log level for the logger, could be one of {debug,
info, warning, error}
-p PSI, --psi PSI The path for psi binary
-s PSI_SOURCE, --psisource PSI_SOURCE
The source file for psi
-d, --deterministic Specifies whether to search deterministically (for
reproducibility) or not.
--transform-only Only generate the transformed template
--enable-shadow Controls whether shadow execution is used or not.
You don't need to specify -k / -i / -c / -p
parameters if you're using docker since the default values would be set.
Run python3 -m checkdp [FILE]
to start analyzing [FILE]
, the tool will generate a checkdp-out
folder containing
all intermediate files:
-
generate-[alignment|input]-[iteration].c
- The files sent to KLEE to find concrete values for symbolic [alignment | input], where [iteration] marks the current number of iterations. For better clarity, iteration number will be added 1000 for each restart. -
generate-[alignment|input]-[iteration].bc
- The bytecode file generated by compiling the source file usingCLANG
(specified by-c
, default to bundled clang in container) to be later sent to KLEE. -
run.log
- the log file containing all DEBUG output.
We have provided all examples studied in our paper in examples/
folder and a benchmark script. Simply run
python3 scripts/benchmark.py
to run checkdp on all of them at once, the corresponding results files will be stored in results/<algorithm_name>
.
Additionally, you can specify -s PSI_SOURCE
for counterexample validation if a counterexample is found by CheckDP.
Refer to psi repository for more details on how to write PSI programs. You may also take a look at PSI programs in examples/
folder. Notably, in order for CheckDP to properly insert corresponding concrete values for inputs and outputs, you should add placeholders such as $q$
and $T$
for the inputs (parameters of the algorithms), as well as $ARRAY(out)$
to mark the output array. Refer to examples/badgapsparsevector.psi
for example.
As discussed in the paper, shadow execution can be incorporated as an add-on and therefore controlled by the --enable-shadow
flag. By default it is turned off for better efficiency since enabling it introduces more overhead. For algorithms NoisyMax / BadNoisyMax / BadNoisyMaxVal
it must be turned on.
Most of the syntax in CheckDP is the same as C (though only a subset of C is supported). Also, we have some requirements of the implemented algorithms:
-
The first three parameters of the function must be
query, size, epsilon
with strict order. -
The first three lines at the beginning of the function must be C-strings marking the function specifications.
-
No return commands. Instead, CHECKDP_OUTPUT intrinsic should be used to mimic adding an element to the output array.
-
No function calls except noise generation, e.g.,
float eta = Lap(SCALE);
. Note that currently only laplace noiseLap(SCALE)
is supported.
Let's look at an example sparsevector
algorithm in examples/sparsevector.c
.
int sparsevector(float q[], int size, float epsilon, float T, int NN)
{
"TYPES: epsilon: <0, 0>; size: <0, 0>; q: <*, *>; T: <0, 0>; NN: <0, 0>";
"PRECONDITION: ALL_DIFFER; ASSUME(NN > 0); ASSUME(NN <= size); ASSUME(T >= -10); ASSUME(T <= 10);";
"CHECK: epsilon";
int out = 0;
float eta_1 = Lap(2 / epsilon);
int T_bar = T + eta_1;
int count = 0;
int i = 0;
while (count < NN && i < size)
{
float eta_2 = Lap(4 * NN / epsilon);
if (q[i] + eta_2 >= T_bar)
{
CHECKDP_OUTPUT(1);
count = count + 1;
}
else
{
CHECKDP_OUTPUT(0);
}
i = i + 1;
}
}
More details about the function specification annotations:
-
"TYPES: (VARIABLE_NAME: <ALIGNED_DISTANCE, SHADOW_DISTANCE>;)+"
This specifies the input distance types for the parameters, note that the types for all parameters must be specified otherwise an exception will be thrown. Shadow distances are ignored if
--enable-shadow
is not specified, however they must be given. If you do not use shadow execution, feel free to mark the shadow distances as*
. -
"PRECONDITION: (ALL_DIFFER|ONE_DIFFER); (ASSUME(EXPRESSION);)*"
This specifies the preconditions on the parameters, where
ALL_DIFFER
/ONE_DIFFER
is a macro for the query variable, indicating that any or only one of the query answers can differ by at most 1. Additionally, the assumptions for other parameters can be marked manually byASSUME(expression)
, whereexpression
can be arbitrary C-style expressions containing the parameters. Adding manual assumptions would reduce the search time and rule out invalid counterexamples. -
"CHECK: EXPRESSION"
This specifies the final goal to check. Most of the algorithms satisfies
epsilon
-differential privacy, however, some algorithms such asSmartSum
satisfies2*epsilon
-differential privacy. In that case you should write"CHECK: 2 * epsilon"
.
One thing to notice is that due to the limitations of KLEE
(only supporting integers), all float
declarations would be converted to int
, this may bring imprecisions. CheckDP will issue a warning whenever a conversion happens.
However, you can safely write expressions with divisions in Lap
, such as Lap(4 * NN / epsilon)
in SparseVector
algorithm. CheckDP will properly scale the cost variable calculations to eliminate the floating-point calculations.
MIT.