elastic-surv is a library for training risk estimation models on ElasticSearch backends. Potential use cases include user churn prediction and survival probability estimation.
- 🔑 Survival models include CoxPH, DeepHit, and LogisticHazard (pycox).
- 🔥 ElasticSearch support using eland.
- 🌀 Automatic model selection using HyperBand.
Risk estimation tasks require:
- A set of covariates/features (X).
- An outcome/event column (Y): 0 means right censoring, 1 means that the event occurred.
- A time-to-event column (T): the duration until the event or the censoring occurred.
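As a minimal sketch of the data layout described above, the snippet below models one row as a small record and checks the two constraints on Y and T. The names `SurvivalRecord` and `validate` are hypothetical and exist only for illustration; they are not part of elastic-surv, and the toy values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SurvivalRecord:
    """One row of a risk estimation dataset (illustrative only)."""
    x: List[float]  # covariates/features (X)
    y: int          # outcome (Y): 0 = right-censored, 1 = event occurred
    t: float        # time to event or to censoring (T)


def validate(records):
    """Check the constraints on Y and T; raise ValueError on a bad row."""
    for i, r in enumerate(records):
        if r.y not in (0, 1):
            raise ValueError(f"row {i}: outcome must be 0 or 1, got {r.y}")
        if r.t < 0:
            raise ValueError(f"row {i}: duration must be non-negative, got {r.t}")
    return True


# Toy churn-style rows: two observed events and one censored user.
records = [
    SurvivalRecord(x=[34.0, 1.0], y=1, t=5.0),   # event observed at t = 5
    SurvivalRecord(x=[51.0, 0.0], y=0, t=12.0),  # censored at t = 12
    SurvivalRecord(x=[27.0, 1.0], y=1, t=2.0),   # event observed at t = 2
]
print(validate(records))
```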
The output of a risk estimation task is a survival function: for N time horizons, it gives the probability of "survival" (the event not occurring) at each horizon.
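To make the shape of that output concrete, here is a small sketch that evaluates a survival function at a list of time horizons. It uses the Kaplan-Meier estimator, a classical non-parametric method chosen here only because it fits in a few lines; it is not one of the models this library trains, and the function name `kaplan_meier` and the toy data are assumptions made for illustration.

```python
def kaplan_meier(durations, events, horizons):
    """Return the Kaplan-Meier survival probability at each horizon."""
    # Distinct times at which an event (not a censoring) was observed.
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = []
    for h in horizons:
        s = 1.0
        for t in event_times:
            if t > h:
                break
            # Subjects still under observation just before time t.
            at_risk = sum(1 for d in durations if d >= t)
            # Events observed exactly at time t.
            observed = sum(
                1 for d, e in zip(durations, events) if d == t and e == 1
            )
            s *= 1.0 - observed / at_risk
        survival.append(s)
    return survival


durations = [2.0, 5.0, 12.0, 7.0]  # T: time to event or censoring
events = [1, 1, 0, 0]              # Y: 1 = event occurred, 0 = right-censored
horizons = [1.0, 3.0, 6.0, 12.0]   # N = 4 time horizons

# One probability per horizon, non-increasing over time.
print(kaplan_meier(durations, events, horizons))
```

Censored rows still count in the at-risk set until their censoring time, which is why they cannot simply be dropped from the data.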
For configuring the ELK stack, please follow the instructions here.
The library can be installed using
$ pip install .
For each ElasticSearch data backend, we need to specify:
- the es_index_pattern and the es_client for the ES connection.
- which keys in the ES index stand for the time-to-event and outcome data.
- optional: which features to include from the index.
from elastic_surv.dataset import ESDataset
from elastic_surv.models import CoxPHModel
dataset = ESDataset(
    es_index_pattern = 'churn-prediction',
    time_column = 'months_active',
    event_column = 'churned',
    es_client = "localhost",
)
model = CoxPHModel(in_features = dataset.features())
model.train(dataset)
model.score(dataset)
For this example, we use a local ES index, churn-prediction. This can be generated using the following snippet:
from pysurvival.datasets import Dataset
import eland as ed
raw_dataset = Dataset('churn').load()
ed.pandas_to_eland(
    raw_dataset,
    es_client='localhost',
    es_dest_index='churn-prediction',
    es_if_exists='replace',
    es_dropna=True,
    es_refresh=True,
)
- Tutorial 1: Data backends
- Tutorial 2: Training a survival model over ElasticSearch
- Tutorial 3: AutoML for survival analysis over ElasticSearch
Install the testing dependencies using
pip install .[testing]
The tests can be executed using
pytest -vsx