Releases: NGO-Algorithm-Audit/python-synthpop

Release v0.1.2

25 May 10:38

  • Fix a bug in the function data_processor.py/_decode_categorical: the Gaussian Copula (GC) synthesizer can simulate out-of-bounds values, which must be capped to the maximum number of classes of the categorical variable.
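
The kind of capping described in this fix can be illustrated with a minimal sketch. This is not the library's actual `_decode_categorical` code: the function name `decode_categorical`, the rounding step, and the lower-bound clamp at 0 are assumptions made for illustration only.

```python
def decode_categorical(codes, categories):
    """Map sampled numeric codes to category labels (hypothetical helper).

    Codes outside the valid index range are clamped to the nearest valid
    index, so an out-of-bounds sample never raises an IndexError.
    """
    max_index = len(categories) - 1
    decoded = []
    for code in codes:
        index = int(round(code))                # nearest integer code
        index = min(max(index, 0), max_index)   # clamp to [0, max_index]
        decoded.append(categories[index])
    return decoded


# Made-up sampled codes; 3.4 and -0.6 fall outside the valid range 0..2.
sampled_codes = [0.2, 1.7, 3.4, -0.6]
labels = decode_categorical(sampled_codes, ["low", "medium", "high"])
print(labels)  # ['low', 'high', 'high', 'low']
```

Clamping keeps every synthetic value inside the set of observed classes, at the cost of slightly inflating the frequency of the first and last class.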

Release v0.1.1

14 Mar 15:17
d54cab6

python-synthpop v0.1.1 release summary

  • fix bug in efficacy_metrics.py

v0.1 python-synthpop

10 Mar 18:28
b09d3fe

python-synthpop v0.1 release summary 🚀

We are excited to announce the release of python-synthpop v0.1, an open-source Python library for synthetic data generation (SDG). This release introduces implementations of Classification and Regression Trees (CART) and Gaussian Copula (GC) synthesizers for generating high-quality, privacy-preserving synthetic datasets.

Key Features in This Release:

  1. Missing data handling:

    • Users can decide whether missing data should be removed or imputed;
    • Users are guided on identifying the type of missing data in their dataset (e.g., missing at random or not at random) and advised on whether to handle it through removal or imputation based on best practices;
    • This ensures smooth handling of datasets with missing values.
  2. Preprocessing utilities:

    • Robust preprocessing functions for data normalization, transformation, and feature engineering to streamline the preparation of data before synthesis.
  3. Synthetic data generation methods:

    • CART-based synthesis: Create synthetic datasets that retain complex relationships in your data using decision trees;
    • Gaussian Copula synthesis: Leverage the power of copulas to capture and reproduce intricate dependencies between variables.
  4. Postprocessing functions:

    • Seamlessly map synthetic data back to its original structure and domain.
  5. Evaluation metrics:

    • Built-in tools to evaluate the quality of synthetic datasets:
      • Distributional similarity metrics.
      • Utility measures for downstream tasks (e.g., classification, regression).
      • Privacy-preserving metrics to assess disclosure risks.
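
To make the Gaussian Copula method listed above more concrete, here is a rough, standard-library-only sketch for two numeric columns. It does not use or reproduce the python-synthpop API: all function names are hypothetical, the data are made-up toy values, and the rank-based transform and empirical quantile mapping are simplifying assumptions.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def to_normal_scores(values):
    """Rank-transform a column to standard normal scores (ties broken arbitrarily)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))  # argument stays in (0, 1)
    return scores


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def empirical_quantile(u, sorted_values):
    """Map u in [0, 1] to an observed value via the empirical distribution."""
    index = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def synthesize_pairs(x, y, n_samples, seed=0):
    """Sample synthetic (x, y) pairs that follow the rank dependence of the input."""
    rng = random.Random(seed)
    rho = pearson(to_normal_scores(x), to_normal_scores(y))
    rho = max(-1.0, min(1.0, rho))          # guard against floating-point overshoot
    residual_sd = (1.0 - rho * rho) ** 0.5
    xs, ys = sorted(x), sorted(y)
    pairs = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + residual_sd * rng.gauss(0.0, 1.0)  # correlated with z1
        pairs.append((
            empirical_quantile(STD_NORMAL.cdf(z1), xs),
            empirical_quantile(STD_NORMAL.cdf(z2), ys),
        ))
    return pairs


# Made-up toy data: ages and incomes (in thousands).
ages = [23, 31, 38, 45, 52, 60, 67, 74]
incomes = [21.0, 29.5, 36.0, 44.0, 50.5, 58.0, 61.5, 70.0]
synthetic = synthesize_pairs(ages, incomes, n_samples=5)
print(synthetic)
```

Because this sketch maps samples back through the empirical quantiles, every synthetic value is one that was observed in the input. A production synthesizer would typically handle more than two columns, mixed data types, and smoother marginal distributions.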

Live demo in web app

A live demo of python-synthpop can be found in this local-first web app. In this architectural setup, data is processed entirely on your device and is not uploaded to any third party, such as a cloud provider. This computing approach is called local-first and allows organisations to use tools securely on their own devices. Instructions on how to host the tool locally, including the source code, can be found here.

Documentation and Support

v0.0.9

04 Mar 08:32
3c17446

Update pyproject.toml

v0.0.8

04 Mar 08:22
137f156

Update pyproject.toml

v0.0.7

04 Mar 08:19
1752e6f

Update pyproject.toml

v0.0.6

04 Mar 08:00
1698203

Update setup.py

v0.0.5

04 Mar 07:55
171656c

Update publish.yml

v0.0.4

04 Mar 07:49
ba75d25

Update pyproject.toml

v0.0.3

03 Mar 20:56
fc937ca

Update publish.yml