This repository contains the Python codes for our experiments on the privacy level (for the entire dataset), an example of data analysis (using genome statistics), and the run time.
In the evaluation of the privacy level, we compared our two methods (the optimal solution obtained by solving a linear programming problem and the heuristic method) with the existing Kronecker product-based method [Wang et al., 2016] when the privacy level for each attribute is given.
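As a quick reference for this setting, the following is a minimal sketch (not part of our experimental codes) of how the privacy level achieved by a distortion matrix can be computed and how the Kronecker product-based mechanism is assembled, assuming the standard k-ary randomized response for each attribute; the names `krr_matrix` and `privacy_level` are illustrative only.

```python
import numpy as np

def krr_matrix(k, eps):
    """Distortion matrix of the standard k-ary randomized response:
    report the true value w.p. e^eps / (e^eps + k - 1), otherwise a uniform other value."""
    Q = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))
    np.fill_diagonal(Q, np.exp(eps) / (np.exp(eps) + k - 1))
    return Q  # Q[y, x] = Pr[output = y | input = x]

def privacy_level(Q):
    """Privacy level achieved by a distortion matrix Q, i.e., the maximum over
    outputs y and input pairs x, x' of ln(Q[y, x] / Q[y, x'])."""
    return float(np.log(Q.max(axis=1) / Q.min(axis=1)).max())

# Existing Kronecker product-based construction: apply a randomized response to
# each attribute independently and take the Kronecker product of the matrices.
eps_attr = [0.5, 0.5, 1.0]   # per-attribute privacy levels
ks = [2, 3, 4]               # numbers of categories per attribute
Q_kron = np.array([[1.0]])
for k, e in zip(ks, eps_attr):
    Q_kron = np.kron(Q_kron, krr_matrix(k, e))

print(privacy_level(Q_kron))  # equals sum(eps_attr) = 2.0 for the Kronecker product
```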
As an analysis example using our method, we considered the case where the privacy level for the entire dataset is fixed and evaluated the accuracy of the output (differentially private) genome statistics.
The run time results show that our heuristic method can be performed in about
In Supplements.pdf, we provide a review of related studies, concrete descriptions of our methods, and the experimental results and discussion for the analysis example.
(For the relations between the per-attribute privacy levels and the privacy level for the entire dataset, please refer to our paper.)
We may also construct a linear programming problem (with variable reduction) for obtaining the optimal solution.
However, the LP problem with variable reduction is often infeasible (cf. LP_variableReduction.ipynb). Moreover, our heuristic method is (currently) more efficient than solving the linear programming problem (cf. Cohen, Lee, and Song, 2021; Brand, 2020); therefore, to efficiently construct a randomized response that achieves stronger privacy guarantees than the Kronecker product-based method, our heuristic method might be more useful (and is more interesting).
Of course, there may be cases where solving the linear programming problem would be superior, depending on the required optimality, the achieved privacy levels, and the run time.
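As a side note on the infeasibility mentioned above, the toy snippet below only illustrates how an infeasible linear program is detected with SciPy's `linprog`; the constraints are made up for illustration and are not those of the LP in LP_variableReduction.ipynb.

```python
import numpy as np
from scipy.optimize import linprog

# Toy LP: minimize c^T x subject to A_eq x = b_eq, x >= 0.
# The two equality constraints contradict each other, so the LP is infeasible.
c = np.array([1.0, 1.0])
A_eq = np.array([[1.0, 1.0],
                 [1.0, 1.0]])
b_eq = np.array([1.0, 2.0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None), (0, None)])
print(res.status, res.message)  # status 2 means the problem is infeasible
```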
Set the minimum privacy level to be guaranteed for each of the attributes, and then proceed as follows:
- If the given privacy budgets are all distributable, distribute them as is.
- If they are not distributable, prioritize reducing the budgets for information that does not require a high degree of accuracy or that can be expected to be accurate even with a small budget. (Distribute as large a budget as possible to information that requires a high degree of accuracy.)
- Check whether the updated budgets are distributable, using our methods each time.
- Ultimately, create a situation where privacy guarantees at the specified levels or better are achieved for all information.
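The following is a minimal sketch of this adjustment loop; `adjust_budgets`, the `is_distributable` argument, and the toy check in the usage line are hypothetical stand-ins, not functions from our codes.

```python
def adjust_budgets(is_distributable, epsilons, priorities, step=0.25, min_eps=0.1):
    """Reduce the budgets of low-priority information until the per-attribute
    privacy levels can be distributed (as judged by `is_distributable`)."""
    eps = list(epsilons)
    while not is_distributable(eps):
        # Attributes whose budget can still be lowered without going below min_eps.
        candidates = [i for i in range(len(eps)) if eps[i] - step >= min_eps]
        if not candidates:
            raise ValueError("no distributable assignment found under these settings")
        # priorities[i]: larger value = higher accuracy requirement (keep its budget).
        i = min(candidates, key=lambda j: priorities[j])
        eps[i] -= step
    return eps

# Usage with a toy stand-in for the distributability check (illustration only).
toy_check = lambda eps: sum(eps) <= 2.5
print(adjust_budgets(toy_check, epsilons=[1.0, 1.0, 1.0], priorities=[0, 2, 1]))
# -> [0.5, 1.0, 1.0]
```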
Note that our methods are expected to distribute larger privacy budgets than the existing Kronecker product-based method (as shown in the results of our analysis example).
Although it is also possible to regard
・The required privacy level for each piece of information should vary depending on the kind of data and the purpose of the analysis. In the future, it would be desirable to evaluate not only the privacy level, as in our experiments, but also the accuracy in more detail for each analysis method. Ultimately, it would be fascinating to develop an optimal randomized response in terms of the trade-off between privacy and utility. (Are there any cases where the distortion matrix
・Improving and theoretically analyzing the optimality of our heuristic method (especially when
・Developing better heuristic methods that can achieve privacy levels as close to the input ones as possible. (We feel that this task is not as easy as it may seem. Our proposed algorithm can achieve privacy levels closer to the input ones by changing the order of attributes, for example; however, a method that does not require such a parameter-tuning-like procedure would be more desirable.) (Note that for the experiments in the Privacy Level folder, we considered situations in which the achieved
・Developing optimal methods for multi-dimensional "numeric" data. (This study focused on categorical data.) ← One possible direction is to extend the methods of Duchi et al., 2018 and Wang et al., 2019 by combining them with our methods. As for Algorithm 4 in the paper of Wang et al., given that it can also handle categorical data, our proposed methods already have the potential to increase the privacy budget.
・Enhancing our methods under other concepts of differential privacy like One-Sided Differential Privacy [Kotsogiannis et al., 2020].
・Considering the Randomized Response techniques under
For details of our methods and discussion, please see our paper entitled "Privacy-Optimized Randomized Response for Sharing Multi-Attribute Data" (https://doi.org/10.1109/ISCC61673.2024.10733730, arXiv: http://arxiv.org/abs/2402.07584) presented at IEEE ISCC 2024.
Akito Yamamoto
Division of Medical Data Informatics, Human Genome Center,
The Institute of Medical Science, The University of Tokyo