
v0.1.0-alpha1

Pre-release
ilias-ant released this 16 Jul 08:51
1a30431

This inaugural alpha pre-release introduces the core functionality of adversarial validation, exposed to the end user through the following function:

```python
import pandas as pd
from advertion import validate

train = pd.read_csv("...")  # let's say the target variable is "label"
test = pd.read_csv("...")

are_similar = validate(
    train=train,
    test=test,
    target="label",
)
# are_similar = True: train and test follow the same underlying distribution.
# are_similar = False: the test dataset follows a different underlying distribution from the train dataset.
```

The following keyword arguments are also supported:

  • passing `smart=True` enables a feature-pruning strategy based on feature importance; this helps remove features with strongly identifying properties, such as IDs and timestamps.
  • passing an `n_splits` value sets the number of cross-validation folds used internally.
  • passing `verbose=True` prints informative messages about the adversarial validation process to standard output.
  • passing a `random_state` value ensures reproducible output across multiple function calls.