v0.1 python-synthpop

@jfparie jfparie released this 10 Mar 18:28
· 15 commits to main since this release
b09d3fe

python-synthpop v0.1 release summary 🚀

We are excited to announce the release of python-synthpop v0.1, an open-source Python library for synthetic data generation (SDG). This release introduces implementations of Classification and Regression Trees (CART) and Gaussian Copula (GC) synthesizers for generating high-quality, privacy-preserving synthetic datasets.

Key Features in This Release:

  1. Missing data handling:

    • Users can decide whether missing data should be removed or imputed;
    • Users are guided on identifying the type of missing data in their dataset (e.g., missing at random or not at random) and advised on whether to handle it through removal or imputation based on best practices;
    • This ensures smooth handling of datasets with missing values.
  2. Preprocessing utilities:

    • Robust preprocessing functions for data normalization, transformation, and feature engineering to streamline the preparation of data before synthesis.
  3. Synthetic data generation methods:

    • CART-based synthesis: Create synthetic datasets that retain complex relationships in your data using decision trees;
    • Gaussian Copula synthesis: Leverage the power of copulas to capture and reproduce intricate dependencies between variables.
  4. Postprocessing functions:

    • Seamlessly map synthetic data back to its original structure and domain.
  5. Evaluation metrics:

    • Built-in tools to evaluate the quality of synthetic datasets:
      • Distributional similarity metrics.
      • Utility measures for downstream tasks (e.g., classification, regression).
      • Privacy metrics to assess disclosure risks.
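
To make the Gaussian Copula idea from the list above more concrete, here is a minimal sketch using only the Python standard library. It is not python-synthpop's API: the function names (`to_normal_scores`, `correlation_matrix`, `cholesky`, `sample_gaussian_copula`) and the toy `age`/`income` data are hypothetical and chosen for illustration. The sketch assumes numeric columns that are not perfectly collinear. It maps each column to normal scores through its ranks, estimates the correlation between those scores, draws correlated normal samples, and maps them back through the empirical quantiles of the original columns.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def to_normal_scores(column):
    """Map values to standard-normal scores via their ranks (empirical CDF)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1)
        scores[idx] = _STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation_matrix(columns):
    """Pearson correlation between every pair of columns."""
    def corr(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        var_x = sum((a - mx) ** 2 for a in x)
        var_y = sum((b - my) ** 2 for b in y)
        return cov / math.sqrt(var_x * var_y)
    return [[corr(x, y) for y in columns] for x in columns]


def cholesky(matrix):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_gaussian_copula(columns, n_samples, seed=0):
    """Draw synthetic rows that keep the marginals and rank dependence."""
    rng = random.Random(seed)
    scores = [to_normal_scores(col) for col in columns]
    lower = cholesky(correlation_matrix(scores))
    sorted_cols = [sorted(col) for col in columns]
    n = len(columns[0])
    synthetic = [[] for _ in columns]
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in columns]
        for j in range(len(columns)):
            # Correlate the independent draws with the Cholesky factor
            correlated = sum(lower[j][k] * z[k] for k in range(j + 1))
            u = _STD_NORMAL.cdf(correlated)
            # Empirical quantile: pick the observed value at that rank
            idx = min(int(u * n), n - 1)
            synthetic[j].append(sorted_cols[j][idx])
    return synthetic


# Toy data (illustrative only): income depends roughly linearly on age
data_rng = random.Random(42)
age = [data_rng.uniform(18, 80) for _ in range(500)]
income = [1000 + 40 * a + data_rng.gauss(0, 300) for a in age]

syn_age, syn_income = sample_gaussian_copula([age, income], n_samples=500)
```

Because this sketch resamples observed values for each marginal, every synthetic value already occurs in the original column. That is a simplification made for brevity and offers no privacy protection by itself; a real synthesizer would typically fit or smooth the marginal distributions instead.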

Live demo in web app

A live demo of python-synthpop can be found in this local-first web app. In this architectural setup, data is processed entirely on your device and is not uploaded to any third party, such as a cloud provider. This computing approach is called local-first and allows organisations to use tools securely on their own devices. Instructions on how to host the tool locally, including the source code, can be found here.

Documentation and Support