Tabular Data Synthesis for Mixed Data - Experiment Framework

A framework for evaluating complete tabular data synthesis pipelines. Each pipeline consists of the following steps:

Preprocessing: Encoding categorical attributes and scaling numerical attributes.
Data Synthesis: Generating synthetic data with a generator trained on the preprocessed training data.
(Optional) Relabelling by a Black-Box Model: Relabelling the synthetic data with a black-box model trained on the preprocessed training data. Necessary if the generator does not label the data by itself.

The synthetic data is then added to the original training data, and the augmented training data is used to train a white-box model, for example a shallow decision tree. The resulting performance gain serves as a proxy for augmentation quality.
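As a rough illustration of the evaluation idea described above, the following standard-library Python sketch runs the steps on toy data. It is not the framework's code: the jitter-based generator, the nearest-neighbour black-box model, the depth-1 decision tree, and all function names are hypothetical stand-ins. The toy data is purely numerical, so no categorical encoding is needed, and the gain is assumed to be measured against a model trained on the original data alone.

```python
import random
import statistics


def standardize(rows, means, stds):
    """Preprocessing: scale each numerical column with training statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def jitter_generator(rows, n_samples, rng, noise=0.3):
    """Hypothetical generator: resample training rows and add Gaussian noise."""
    return [[v + rng.gauss(0.0, noise) for v in rng.choice(rows)]
            for _ in range(n_samples)]


def black_box_label(train_X, train_y, row):
    """Hypothetical black-box model: label of the nearest training row."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for x in train_X]
    return train_y[dists.index(min(dists))]


def stump_predict(stump, row):
    """Predict with a (feature, threshold, left_label) decision stump."""
    feature, threshold, left_label = stump
    return left_label if row[feature] <= threshold else 1 - left_label


def accuracy(stump, X, y):
    """Fraction of rows in X whose predicted label matches y."""
    return sum(stump_predict(stump, r) == t for r, t in zip(X, y)) / len(y)


def fit_stump(X, y):
    """White-box stand-in: depth-1 decision tree chosen by training accuracy."""
    candidates = [(f, t, lab)
                  for f in range(len(X[0]))
                  for t in sorted({row[f] for row in X})
                  for lab in (0, 1)]
    return max(candidates, key=lambda s: accuracy(s, X, y))


def make_data(n, rng):
    """Toy data: two numerical attributes on different scales, binary label."""
    X = [[rng.gauss(50.0, 10.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    y = [1 if (a - 50.0) / 10.0 + b > 0 else 0 for a, b in X]
    return X, y


rng = random.Random(0)
train_X, train_y = make_data(40, rng)
test_X, test_y = make_data(200, rng)

# 1. Preprocessing (statistics come from the training data only).
means = [statistics.mean(col) for col in zip(*train_X)]
stds = [statistics.pstdev(col) for col in zip(*train_X)]
train_X = standardize(train_X, means, stds)
test_X = standardize(test_X, means, stds)

# 2. Data synthesis, then 3. relabelling by the black-box model.
synth_X = jitter_generator(train_X, 200, rng)
synth_y = [black_box_label(train_X, train_y, row) for row in synth_X]

# Train the white-box model without and with the synthetic data.
baseline = accuracy(fit_stump(train_X, train_y), test_X, test_y)
augmented = accuracy(fit_stump(train_X + synth_X, train_y + synth_y),
                     test_X, test_y)
gain = augmented - baseline  # proxy for augmentation quality
print(f"baseline={baseline:.3f} augmented={augmented:.3f} gain={gain:+.3f}")
```

The gain can be negative: a poor generator or an inaccurate black-box model may make the augmented white-box model worse than the baseline.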

This work was done as part of my bachelor's thesis "Benchmarking Tabular Data Synthesis Pipelines for Mixed Data".

Installation

The required packages are listed in the requirements.txt file. If you have pip installed, you can install them with:

pip install -r requirements.txt

PrivBayes

The PrivBayes synthesizer depends on C++ code that must be compiled before it can be used. Make sure the tools needed to compile C++ are installed. On Ubuntu-based Linux distributions, this can be done with the following command:

sudo apt-get install build-essential

Navigate to the location containing the makefile for the PrivBayes compilation ({repository_path}/framework/generators/privbayes). Trigger the compilation from this directory with the following command:

make compile

The compilation produces a privBayes.bin binary. To use it in the framework, set the PRIVBAYES_BIN environment variable to its path:

export PRIVBAYES_BIN={repository_path}/framework/generators/privbayes/privBayes.bin

Before running a long experiment, always check that the PRIVBAYES_BIN environment variable is set:

echo $PRIVBAYES_BIN
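
The same check can be scripted so that a long experiment fails fast instead of partway through. The following is a minimal Python sketch and not part of the framework: the function name privbayes_bin_problem is hypothetical, and it assumes the variable should point to an existing, executable file.

```python
import os
from pathlib import Path


def privbayes_bin_problem(environ=os.environ):
    """Return a description of what is wrong with PRIVBAYES_BIN, or None."""
    value = environ.get("PRIVBAYES_BIN")
    if not value:
        return "PRIVBAYES_BIN is not set"
    path = Path(value)
    if not path.is_file():
        return f"PRIVBAYES_BIN points to a missing file: {path}"
    if not os.access(path, os.X_OK):
        return f"PRIVBAYES_BIN is not executable: {path}"
    return None


if __name__ == "__main__":
    problem = privbayes_bin_problem()
    print(problem or "PRIVBAYES_BIN looks usable")
```

Calling such a function at the start of an experiment script and aborting when it returns a message avoids discovering the missing binary only once the PrivBayes generator is reached.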

thomas475/tabular-data-synthesis
