v0.1.0-alpha1
Pre-release
This inaugural pre-alpha release introduces the core functionality of adversarial validation, exposed to the end user through the following function:
import pandas as pd
from advertion import validate
train = pd.read_csv("...") # let's say target variable is "label"
test = pd.read_csv("...")
are_similar = validate(
train=train,
test=test,
target="label",
)
# are_similar = True: train and test follow the same underlying distribution.
# are_similar = False: the test dataset follows a different underlying distribution than the train dataset.
In addition:
- passing `smart=True` prunes design matrix features based on feature importance, which helps remove features with strongly identifiable properties such as IDs and timestamps.
- passing an `n_splits` value controls the number of cross-validation folds used internally.
- passing `verbose=True` prints informative messages about the adversarial validation strategy to standard output.
- passing a `random_state` value ensures reproducible output across multiple function calls.
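To give a feel for why features such as IDs are worth pruning, here is a minimal, standard-library-only sketch. It is not advertion's implementation: these notes do not describe the classifier or the pruning rule the library uses. The helpers `auc`, `separability` and `prune`, the `0.8` threshold, and the toy data are all hypothetical. The sketch scores each feature by how well its raw values alone separate train rows from test rows (a rank-based AUC), then drops the features that separate them almost perfectly.

```python
import random


def auc(train_values, test_values):
    """Probability that a random test value exceeds a random train value.

    Ties count as half a win. 0.5 means the feature cannot tell the two
    datasets apart; 0.0 or 1.0 means it separates them perfectly.
    """
    wins = 0.0
    for t in test_values:
        for s in train_values:
            if t > s:
                wins += 1.0
            elif t == s:
                wins += 0.5
    return wins / (len(train_values) * len(test_values))


def separability(train_values, test_values):
    """Distance of the AUC from 0.5, rescaled to the range [0, 1]."""
    return abs(auc(train_values, test_values) - 0.5) * 2


def prune(train, test, threshold=0.8):
    """Keep only features whose separability is below the threshold.

    The threshold is an arbitrary value chosen for this illustration.
    """
    return [
        name for name in train
        if separability(train[name], test[name]) < threshold
    ]


# Toy data: a seeded generator keeps the example reproducible.
rng = random.Random(42)
n = 200
train = {
    "row_id": list(range(n)),                          # identifies the dataset
    "height": [rng.gauss(170, 10) for _ in range(n)],  # same distribution
}
test = {
    "row_id": list(range(n, 2 * n)),
    "height": [rng.gauss(170, 10) for _ in range(n)],
}

kept = prune(train, test)
print(kept)
```

In this toy setup `row_id` is dropped, because every test ID is larger than every train ID, while `height` is kept. A real adversarial validation run would typically train a classifier on all features at once and evaluate it across cross-validation folds, rather than scoring features one at a time as done here.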