Commit

Release v0.9.8
gherka committed Nov 20, 2023
1 parent cc6b673 commit f31bbd4
Showing 6 changed files with 34 additions and 8 deletions.
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,23 @@
 ## Release notes
 ---
+### 0.9.8 (November 20, 2023)
+
+##### Enhancements
+- Experimental support for using SQL to generate anonymising sets of values. This feature is available for all column types except numerical.
+- `make_distinct` custom action now works on date columns.
+- You can now easily add a column with current date and time by using `'@sysdate'` as a derived column.
+- Pseudo-CHI numbers can be now generated by passing `pseudo_chi` as anonymising set to UUID columns.
+- Numerical column weights for categorical values are now optional. This should speed up the process of manually composing a specification.
+
+##### Bug fixes
+- Minor bugs fixed in `shift_distribution`, `make_outlier` and `make_distinct`.
+- Fixed a bug in regex distribution where the target number of uniques wasn't respected.
+- Fixed date column not being recognized if source data had missing values.
+
+##### Package version upgrades
+- Python version changes to 3.10
+- Pandas updated to 2.x version
+
 ### 0.9.7 (November 15, 2022)

 ##### Enhancements
1 change: 1 addition & 0 deletions README.md
@@ -18,6 +18,7 @@ The goal of Exhibit is to make it easier to generate anonymised data at scale in
 - Generate and manipulate missing data and timeseries
 - Generate geo-spatial data using H3 hexes
 - Augment your synthetic data with compiled machine learning models and custom functions
+- Use SQL to generate conditional values based on external tables

 ---
 ### Installation:
4 changes: 2 additions & 2 deletions environment.yml
@@ -6,12 +6,12 @@ dependencies:
   - numpy=1.25.2
   - pandas=2.0.3
   - pip=23.2.1
-  - pyarrow=11.0.0
   - python=3.10.13
   - pyyaml=6.0.1
   - scipy=1.11.1
   - shapely=2.0.1
-  - sql-metadata=2.9.0
   - sqlalchemy=1.4.39
   - pip:
     - h3==3.7.6
+    - pyarrow==14.0.1
+    - sql-metadata==2.9.0
9 changes: 8 additions & 1 deletion exhibit/core/generate/categorical.py
@@ -568,7 +568,14 @@ def _generate_using_external_table(self, col_name, anon_set):
         temp_result = []

         for group_key, group_index in groups.items():
-            new_data = self.rng.choice(a=probas[group_key][0], p=probas[group_key][1], size=len(group_index))
+            # if the key is missing, then the SQL filtered out the data for that key
+            # having a COALESCE in SQL would fix it, but in case it's also missing,
+            # we try to catch this edge case in code as well.
+            try:
+                new_data = self.rng.choice(a=probas[group_key][0], p=probas[group_key][1], size=len(group_index))
+            except KeyError: #pragma: no cover
+                new_data = [np.nan] * len(group_index)
+
             temp_result.append(pd.Series(data=new_data, index=group_index, name=col_name))

         final_result = pd.concat(temp_result)
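
The fallback in this hunk can be tried in isolation. The sketch below is not the Exhibit code itself: the `probas` and `groups` dictionaries, the column name `col` and the seeded generator are hypothetical stand-ins, assumed to have the same shape as the variables in the hunk (group key mapped to a pair of values and probabilities, and group key mapped to row index labels).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical inputs: group "B" has rows to fill but no entry in probas,
# mimicking a key that the SQL query filtered out.
probas = {"A": (["x", "y"], [0.5, 0.5])}
groups = {"A": [0, 1, 2], "B": [3, 4]}

temp_result = []
for group_key, group_index in groups.items():
    try:
        values, weights = probas[group_key]
        new_data = rng.choice(a=values, p=weights, size=len(group_index))
    except KeyError:
        # No probabilities for this key: fill its rows with missing values
        new_data = [np.nan] * len(group_index)
    temp_result.append(pd.Series(data=new_data, index=group_index, name="col"))

final_result = pd.concat(temp_result)
```

Rows belonging to group `B` come out as missing values instead of raising, while rows in group `A` are drawn from `x` and `y`.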
2 changes: 1 addition & 1 deletion requirements.txt
@@ -3,7 +3,7 @@ dill == 0.3.7
 h3 == 3.7.6
 numpy == 1.25.2
 pandas == 2.0.3
-pyarrow == 11.0.0
+pyarrow == 14.0.1
 pyyaml == 6.0.1
 scipy == 1.11.1
 shapely == 2.0.1
8 changes: 4 additions & 4 deletions setup.cfg
@@ -1,6 +1,6 @@
 [metadata]
 name = exhibit
-version = 0.9.7
+version = 0.9.8
 author = German Priks
 author_email = german.priks@pm.me
 description = Command line tool to generate anonymised demonstrator data
@@ -9,7 +9,7 @@ long_description_content_type = text/markdown
 url = https://github.com/gherka/exhibit
 classifiers =
     Programming Language :: Python :: 3
-    Programming Language :: Python :: 3.8
+    Programming Language :: Python :: 3.10
     License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
     Operating System :: OS Independent
     Development Status :: 4 - Beta
@@ -20,13 +20,13 @@ classifiers =
 [options]
 packages = find:
 include_package_data = true
-python_requires = ">=3.10"
+python_requires = >=3.10
 install_requires =
     dill == 0.3.7
     h3 == 3.7.6
     numpy == 1.25.2
     pandas == 2.0.3
-    pyarrow == 11.0.0
+    pyarrow == 14.0.1
     pyyaml == 6.0.1
     scipy == 1.11.1
     shapely == 2.0.1
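
A likely reason for dropping the quotes around `python_requires` (an assumption, as the commit does not state its motive) is that `setup.cfg` is an INI-style file, and INI values are read literally. The sketch below uses only the standard-library `configparser` to show that quote characters stay part of the value; it does not reproduce how setuptools goes on to validate the specifier.

```python
import configparser

# Parse the old (quoted) form of the option
quoted = configparser.ConfigParser()
quoted.read_string('[options]\npython_requires = ">=3.10"\n')

# Parse the new (unquoted) form of the option
unquoted = configparser.ConfigParser()
unquoted.read_string('[options]\npython_requires = >=3.10\n')

# The quotes are kept verbatim in the first value and absent in the second
quoted_value = quoted["options"]["python_requires"]
unquoted_value = unquoted["options"]["python_requires"]
```

Here `quoted_value` still contains the surrounding double quotes, whereas `unquoted_value` is the bare specifier `>=3.10`.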