455 changes: 455 additions & 0 deletions Models/PateGAN/pateGAN.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions Models/PateGAN/readme.txt
@@ -0,0 +1 @@

58 changes: 58 additions & 0 deletions pate-gan/.gitignore
@@ -0,0 +1,58 @@
# Python
__pycache__/
*.py[cod]
*.so
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# Virtual Environment
venv/
ENV/
.env

# IDE specific files
.idea/
.vscode/
*.swp
*.swo
.DS_Store

# Jupyter Notebook
.ipynb_checkpoints

# Logs and databases
*.log
*.sqlite
*.db

# Distribution/packaging
node_modules/
*.tar.gz
*.zip

# Local development settings
local_settings.py
*.local

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
coverage.xml
*.cover
63 changes: 63 additions & 0 deletions pate-gan/README.md
@@ -0,0 +1,63 @@
# Codebase for "PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees"

Authors: James Jordon, Jinsung Yoon, Mihaela van der Schaar

Reference: James Jordon, Jinsung Yoon, Mihaela van der Schaar,
"PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees,"
International Conference on Learning Representations (ICLR), 2019.

Paper Link: https://openreview.net/forum?id=S1zk9iRqF7

Contact: jsyoon0823@gmail.com

This directory contains an implementation of the PATE-GAN framework for generating synthetic data.

To run the training and evaluation pipeline for the PATE-GAN framework, run
`python3 main_pategan_experiment.py`.

Note that hyper-parameter tuning is necessary for different datasets.

### Code explanation

(1) data_generator.py
- Generate train and test data to evaluate PATEGAN framework

(2) utils.py
- Define various supervised models such as logistic regression
- Return AUC and APR as the metrics

(3) pate_gan.py
- Main PATEGAN framework
- Return the synthetically generated data

(4) main_pategan_experiment.py
- Report the prediction performances of original data and synthetic data generated by PATEGAN.
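
As a rough illustration of the two metrics mentioned for utils.py, the following minimal sketch computes AUC and APR in plain Python. It assumes APR means average precision and that labels are binary (0/1). The function names `auc` and `average_precision` are hypothetical and are not taken from the repository, which may rely on a library implementation instead.

```python
def auc(labels, scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive example receives the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values measured at the rank
    of each positive example, with examples sorted by descending score.
    Tied scores are not treated specially in this sketch."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data for illustration only (not from the repository).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

Comparing these two numbers for a model trained on the original data against one trained on the synthetic data is the kind of comparison main_pategan_experiment.py is described as reporting.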

### Command inputs:

- data_no: number of generated samples
- data_dim: number of data dimensions
- noise_rate: noise ratio applied to the data
- iterations: number of iterations for handling initialization randomness
- n_s: number of student training iterations
- batch_size: batch size for training the student and generator
- k: number of teachers
- epsilon: differential privacy parameter (epsilon)
- delta: differential privacy parameter (delta)
- lamda: PATE noise size

Note that hyper-parameters should be optimized for different datasets.
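
To make the roles of `k` and `lamda` more concrete, here is a minimal sketch of a PATE-style noisy vote among teachers. It assumes binary teacher votes and Laplace noise with scale `1 / lamda` added to each vote count, following the usual PATE description. The repository's code may parameterize the noise differently, and the function names here are hypothetical.

```python
import random


def laplace(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two
    independent exponential samples with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_majority_vote(teacher_votes, lamda, rng):
    """Return 0 or 1: the label whose noise-perturbed vote count is larger.

    teacher_votes: one 0/1 vote per teacher (k votes in total).
    lamda: assumed here to be the inverse of the Laplace noise scale.
    """
    n1 = sum(teacher_votes)
    n0 = len(teacher_votes) - n1
    noisy_n0 = n0 + laplace(1.0 / lamda, rng)
    noisy_n1 = n1 + laplace(1.0 / lamda, rng)
    return int(noisy_n1 > noisy_n0)


rng = random.Random(0)
votes = [1] * 70 + [0] * 30  # k = 100 hypothetical teacher votes
print(noisy_majority_vote(votes, lamda=1.0, rng=rng))
```

Under this assumption, a smaller `lamda` means larger noise, so close votes are more likely to flip. That is one reason the privacy parameters and the number of teachers usually need to be tuned together.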

### Example command

```shell
$ python3 main_pategan_experiment.py --data_no 10000 --data_dim 10 --noise_rate 1.0 \
    --iterations 50 --n_s 1 --batch_size 64 --k 100 --epsilon 100 --delta 0.0001 \
    --lamda 1.0
```
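
The flags in the example command could be parsed with a sketch like the one below. The defaults are simply copied from the example command and the argument types are assumptions. The actual parser in main_pategan_experiment.py may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the command inputs listed in this README."""
    parser = argparse.ArgumentParser(description="PATE-GAN experiment (sketch)")
    parser.add_argument("--data_no", type=int, default=10000, help="number of generated samples")
    parser.add_argument("--data_dim", type=int, default=10, help="number of data dimensions")
    parser.add_argument("--noise_rate", type=float, default=1.0, help="noise ratio on data")
    parser.add_argument("--iterations", type=int, default=50, help="number of iterations")
    parser.add_argument("--n_s", type=int, default=1, help="student training iterations")
    parser.add_argument("--batch_size", type=int, default=64, help="batch size")
    parser.add_argument("--k", type=int, default=100, help="number of teachers")
    parser.add_argument("--epsilon", type=float, default=100.0, help="DP parameter epsilon")
    parser.add_argument("--delta", type=float, default=0.0001, help="DP parameter delta")
    parser.add_argument("--lamda", type=float, default=1.0, help="PATE noise size")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
example_args = build_parser().parse_args(["--k", "10", "--lamda", "0.5"])
print(vars(example_args))
```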

### Outputs

- results: prediction performance on the original and synthetic data
- train_data: original data
- synth_train_data: synthetically generated data
376,641 changes: 376,641 additions & 0 deletions pate-gan/c4_game_database.csv

Large diffs are not rendered by default.
