bcebere/elastic-surv

elastic-surv

Survival analysis on Big Data

elastic-surv is a library for training risk estimation models on ElasticSearch backends. Potential use cases include user churn prediction and survival probability estimation.

  • 🔑 Survival models include CoxPH, DeepHit, and LogisticHazard (from pycox).
  • 🔥 ElasticSearch support using eland.
  • 🌀 Automatic model selection using HyperBand.

Problem formulation

Risk estimation tasks require:

  • A set of covariates/features (X).
  • An outcome/event column (Y): 0 means right censoring, 1 means that the event occurred.
  • A time-to-event column (T): the duration until the event or the censoring occurred.

The output of a risk estimation task is a survival function: for N time horizons, it gives the probability of "survival" (the event not occurring) at each horizon.
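
To make the (X, Y, T) layout and the shape of the output concrete, here is a minimal sketch that estimates a survival function on toy data with the nonparametric Kaplan-Meier estimator. It is not one of the models shipped with elastic-surv and it ignores the covariates X; the function name kaplan_meier and the toy values are assumptions made for illustration only.

```python
def kaplan_meier(durations, events, horizons):
    """Kaplan-Meier estimate of the survival probability at each horizon.

    durations: time-to-event column (T)
    events:    outcome column (Y), 1 = event occurred, 0 = right-censored
    horizons:  time points at which to evaluate the survival function
    """
    # Distinct times at which at least one event was observed.
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = []
    for h in horizons:
        prob = 1.0
        for t in event_times:
            if t > h:
                break
            # Subjects still under observation just before time t.
            at_risk = sum(1 for d in durations if d >= t)
            # Subjects whose event happened exactly at time t.
            observed = sum(
                1 for d, e in zip(durations, events) if d == t and e == 1
            )
            prob *= 1.0 - observed / at_risk
        survival.append(prob)
    return survival


# Hypothetical toy data: months active (T) and churn flag (Y) for 7 users.
T = [2, 3, 3, 5, 8, 8, 12]
Y = [1, 0, 1, 1, 0, 1, 0]
horizons = [3, 6, 12]

surv = kaplan_meier(T, Y, horizons)
for h, p in zip(horizons, surv):
    print(f"P(survival past t={h}) = {p:.3f}")
```

The result has one probability per horizon and never increases as the horizon grows, which is the same shape of output described above for the risk estimation task.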

Installation

For configuring the ELK stack, please follow the instructions here.

The library can be installed from the repository root using:

$ pip install .

Sample Usage

For each ElasticSearch data backend, we need to specify:

  • the es_index_pattern and the es_client for the ES connection.
  • which keys in the ES index stand for the time-to-event and outcome data.
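  • optional: which features to include from the index.

from elastic_surv.dataset import ESDataset
from elastic_surv.models import CoxPHModel

# Point the dataset at the ES index and name the time-to-event and outcome keys.
dataset = ESDataset(
    es_index_pattern='churn-prediction',
    time_column='months_active',
    event_column='churned',
    es_client='localhost',
)

# Build the model from the features exposed by the dataset.
model = CoxPHModel(in_features=dataset.features())

# Train on the ES-backed dataset, then score the model on the same data.
model.train(dataset)
model.score(dataset)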

For this example, we use a local ES index, churn-prediction, which can be generated with the following snippet:

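from pysurvival.datasets import Dataset
import eland as ed

# Load the churn example dataset bundled with pysurvival.
raw_dataset = Dataset('churn').load()

# Upload it to the local ES instance, replacing any existing index.
ed.pandas_to_eland(
    raw_dataset,
    es_client='localhost',
    es_dest_index='churn-prediction',
    es_if_exists='replace',
    es_dropna=True,
    es_refresh=True,
)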

Tutorials

Tests

Install the testing dependencies using

pip install ".[testing]"

The tests can be executed using

pytest -vsx