
Unsupervised Anomaly Detection System for Univariate Time Series


HPI-Information-Systems/AutoTSAD



AutoTSAD

Unsupervised Anomaly Detection System for Univariate Time Series.

License: MIT | Python version: 3.8, 3.9


Detecting anomalous subsequences in time series data is one of the most important tasks in time series analytics, having applications in environmental monitoring, preventive healthcare, predictive maintenance, and many further areas. Data scientists have developed various anomaly detection algorithms with individual strengths, such as the ability to detect repeating anomalies, anomalies in non-periodic time series, or anomalies with varying lengths. For a given dataset and task, the best algorithm with a suitable parameterization and, in some cases, sufficient training data, usually solves the anomaly detection problem well. However, given the high number of existing algorithms, their numerous parameters, and a pervasive lack of training data and domain knowledge, effective anomaly detection is still a complex task that heavily relies on manual experimentation and often, as experiments show, luck.

We propose the unsupervised AutoTSAD system, which parameterizes, executes, and ensembles various highly effective anomaly detection algorithms. The ensembling system automatically presents an aggregated anomaly scoring for an arbitrary time series without a need for training data or parameter expertise. Our experiments show that AutoTSAD offers an anomaly detection accuracy comparable to the best manually optimized anomaly detection algorithms, and can significantly outperform existing method selection and ensembling approaches for time series anomaly detection.

Architecture

AutoTSAD consists of three modules: Data Generation, Algorithm Optimization, and Scoring Ensembling. It takes a single univariate time series as input and produces a score ranking and an aggregated anomaly scoring. The score ranking can be explored and altered interactively.

Figure: AutoTSAD architecture
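The last module, Scoring Ensembling, turns the scorings of the selected algorithm instances into one aggregated anomaly scoring. The following Python sketch illustrates only that final combination step under simplifying assumptions: each scoring is min-max normalized (the example configuration in the Usage section names minmax as the score normalization method), and the normalized scorings are then averaged point by point. The plain mean and the function names are hypothetical stand-ins; AutoTSAD's actual aggregation method is configurable and is not reproduced here.

```python
from typing import Dict, List


def minmax_normalize(scores: List[float]) -> List[float]:
    """Rescale one anomaly scoring to [0, 1]; a constant scoring maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def aggregate_scorings(scorings: Dict[str, List[float]]) -> List[float]:
    """Normalize each algorithm's scoring and average them point by point.

    Hypothetical helper: the plain mean is an assumption for illustration,
    not AutoTSAD's configured aggregation method.
    """
    if not scorings:
        raise ValueError("need at least one scoring")
    normalized = [minmax_normalize(s) for s in scorings.values()]
    length = len(normalized[0])
    if any(len(s) != length for s in normalized):
        raise ValueError("all scorings must cover the same time series points")
    return [sum(point) / len(normalized) for point in zip(*normalized)]


# Toy scorings of two algorithm instances for a 4-point series (made-up values).
combined = aggregate_scorings({
    "instance-a": [0.1, 0.2, 5.0, 0.3],
    "instance-b": [1.0, 1.0, 3.0, 2.0],
})
print(combined)  # the third point receives the highest aggregated score
```

Normalizing before aggregating matters because base algorithms emit scores on very different scales (distances, densities, forecasting errors); without it, a single algorithm with large raw values would dominate the average.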

Base Algorithms

Algorithm   Area               Family         Dimensionality  Language
STOMP       Data Mining        distance       univariate      Python
k-Means     Classic ML         distance       multivariate    Python
Sub-KNN     Classic ML         distance       univariate      Python
Sub-LOF     Outlier Detection  distance       univariate      Python
Sub-IF      Outlier Detection  trees          univariate      Python
GrammarViz  Data Mining        encoding       univariate      Java
Torsk       Deep Learning      forecasting    multivariate    Python
DWT-MLEAD   Signal Analysis    distribution   univariate      Python
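Most base algorithms listed above score subsequences instead of single points. To illustrate the distance family, here is a minimal pure-Python sketch of the subsequence k-nearest-neighbor idea behind Sub-KNN. It is not the code in autotsad.tsad_algorithms; the function name, the exclusion zone against trivial matches, and scoring by the mean distance to the k nearest neighbors are assumptions chosen for clarity. The scores are per subsequence; mapping them back to individual time points is left out.

```python
import math
from typing import List


def sub_knn_scores(series: List[float], window: int, k: int = 1) -> List[float]:
    """Subsequence k-nearest-neighbor anomaly scores (illustrative sketch).

    Every subsequence of length `window` is scored with the mean Euclidean
    distance to its k nearest neighbors. Overlapping subsequences are skipped
    as trivial matches, so a high score means the pattern occurs nowhere else.
    """
    subs = [series[i:i + window] for i in range(len(series) - window + 1)]
    scores = []
    for i, sub in enumerate(subs):
        dists = sorted(
            math.dist(sub, other)
            for j, other in enumerate(subs)
            if abs(i - j) >= window  # exclusion zone against trivial matches
        )
        if len(dists) < k:
            raise ValueError("series too short for this window size and k")
        scores.append(sum(dists[:k]) / k)
    return scores


# A periodic toy series with one injected spike at index 20.
series = [0.0, 1.0, 0.0, -1.0] * 10
series[20] = 5.0
scores = sub_knn_scores(series, window=4)
print(scores.index(max(scores)))  # → 17, the first window that contains the spike
```

Only the four windows covering the spike (starting at indices 17 to 20) get a non-zero score; every other window has an exact match one period away.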

Repository Structure

Folder/File               Description
autotsad                  AutoTSAD source code.
autotsad.tsad_algorithms  Base algorithm implementations for the ensemble.
autotsad.baselines        Implementations of the SELECT and tsadams baselines.
data/autotsad-data        Evaluation datasets.
data/baseline-results     Baseline results (currently only for the Oracle baseline).
...                       tbd
scripts                   Scripts to prepare the data, load it into the database, and post-process some experimental results.
requirements.txt          Pip dependencies required to run AutoTSAD.
autotsad.yaml             Configuration file template. Please find the configuration key documentation here.
autotsad-exp-config.yaml  AutoTSAD configuration used for the experiments.

Results

We compare the anomaly detection quality of AutoTSAD with five baselines on all 106 univariate time series in the data folder. The baseline algorithms are the following:

  • Oracle: Perfect selection algorithm that magically selects the best-performing algorithm for every time series from the 71 TimeEval algorithms based on the Range-PR-AUC metric.
  • k-Means: Individual time series anomaly detection algorithm that achieved the best overall results and is the best of our base algorithms (see Base Algorithms).
  • SELECT (Horizontal and Vertical): Outlier ensembling technique that uses two different method selection strategies. We re-implemented this method in Python and use it with the same base algorithms as AutoTSAD.
  • tsadams: Method selection technique for time series anomaly detection. We use the implementation from the original authors. Because the method requires semi-supervised forecasting algorithms, we cannot use it with our base algorithms and instead use the forecasting algorithms provided with it.
  • cae-ensemble: Heterogeneous ensembling technique using deep learning base components (convolutional autoencoders). We use the implementation from the original authors and adapt it to our unsupervised use case:
    • We use the test time series during training (without labels).
    • We automatically execute the unsupervised hyperparameter selection process described in the paper with 10 randomly sampled hyperparameter settings (to stay within our 12 h time limit for most datasets).

For all baseline algorithms and AutoTSAD, we use the manually tuned hyperparameter heuristics from TimeEval.
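Conceptually, the Oracle baseline is an argmax over a table of per-series evaluation results. The sketch below assumes such a table is available as a nested dictionary mapping each time series to the quality scores (for example, Range-PR-AUC) of every algorithm; this layout, the function names, and the toy numbers are hypothetical and do not reflect the format of data/baseline-results.

```python
from typing import Dict, Tuple

# results[series][algorithm] -> quality score, e.g. Range-PR-AUC (hypothetical layout).
Results = Dict[str, Dict[str, float]]


def oracle_selection(results: Results) -> Dict[str, Tuple[str, float]]:
    """Pick the best-scoring algorithm for every time series.

    Computing the quality scores requires the ground-truth labels, so this
    oracle is an upper bound for selection methods, not a usable detector.
    """
    return {
        series: max(per_algorithm.items(), key=lambda item: item[1])
        for series, per_algorithm in results.items()
    }


def mean_quality(selection: Dict[str, Tuple[str, float]]) -> float:
    """Average quality of the selected algorithms over all time series."""
    return sum(score for _, score in selection.values()) / len(selection)


# Placeholder numbers for illustration only, not evaluation results.
toy_results = {
    "series-1": {"alg-a": 0.3, "alg-b": 0.8},
    "series-2": {"alg-a": 0.6, "alg-b": 0.2},
}
selection = oracle_selection(toy_results)
print(selection)  # {'series-1': ('alg-b', 0.8), 'series-2': ('alg-a', 0.6)}
print(mean_quality(selection))
```

Because the oracle may choose a different algorithm for every series, its average quality is at least as high as that of any single algorithm, which is what makes it a useful reference point.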

Range-PR-AUC Metric

We use the Range-PR-AUC metric as our main evaluation measure:

Figure: Detection quality comparison using the Range-PR-AUC metric

Other metrics

Figures: Detection quality comparison using the Range-PR-AUC, Range-PR-VUS, and PR-AUC metrics

Figures: Detection quality comparison using the Range-ROC-AUC, Range-ROC-VUS, and ROC-AUC metrics
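Range-PR-AUC extends precision and recall to anomaly ranges and is considerably more involved than its point-wise counterpart. For orientation only, the sketch below computes the point-wise PR-AUC listed among the other metrics as average precision, one common way to summarize the area under a precision-recall curve. It is not the implementation used for the evaluation; other libraries interpolate the curve differently, so their values can differ slightly.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Point-wise PR-AUC approximated by average precision.

    Points are ranked by descending anomaly score; the result is the mean of
    the precision values at the ranks of all truly anomalous points. Tied
    scores are ordered by position, which is a simplification.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_anomalous = sum(labels)
    if n_anomalous == 0:
        raise ValueError("at least one anomalous point is required")
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(ranking, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this recall level
    return precision_sum / n_anomalous


# Both anomalous points are ranked first, so the score is perfect.
print(average_precision([0, 0, 1, 1, 0], [0.1, 0.2, 0.9, 0.8, 0.3]))  # 1.0
# One normal point is ranked before the second anomaly: (1/1 + 2/3) / 2.
print(average_precision([1, 0, 0, 1], [0.9, 0.8, 0.1, 0.5]))
```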

Installation

Requirements

  • Python >= 3.8, < 3.10
  • Java >= 1.8 (for GrammarViz)

Installation from Source

We recommend using conda or another virtual environment management tool to create a new Python environment for AutoTSAD. Please make sure that python, pip, and java are accessible in your new environment.

  1. Clone repository

    git clone git@github.com:HPI-Information-Systems/AutoTSAD.git
    cd AutoTSAD
  2. (Create your environment and) install Python dependencies

    pip install -r requirements.txt
  3. Install AutoTSAD

    pip install .

If you want to use the tsadams or cae-ensemble baselines, please run git submodule update --init to fetch the required dependencies.

Usage

tl;dr

$ autotsad --help
usage: autotsad [-h] [--version] {completion,run,db,estimate-period} ...

Unsupervised anomaly detection system for univariate time series.

positional arguments:
  {completion,run,db,estimate-period}
    completion          Output shell completion script
    run                 Run AutoTSAD on a given dataset.
    db                  Manage AutoTSAD result database.
    estimate-period     Estimate the period size of a given time series dataset.

optional arguments:
  -h, --help            show this help message and exit
  --version             Show version number of AutoTSAD.
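The estimate-period subcommand estimates the period size of a time series. How AutoTSAD does this internally is not described here; a common approach, shown below purely as an assumption, is to take the lag with the highest autocorrelation peak. The function name and the default search range (up to half the series length) are hypothetical.

```python
import math
from typing import List, Optional


def estimate_period(series: List[float], max_period: Optional[int] = None) -> int:
    """Estimate the dominant period as the lag of the highest autocorrelation peak.

    Illustrative sketch only; AutoTSAD's own estimate-period logic may differ.
    """
    n = len(series)
    if n < 4:
        raise ValueError("series is too short")
    if max_period is None:
        max_period = n // 2
    max_period = min(max_period, n - 2)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance_sum = sum(x * x for x in centered)
    if variance_sum == 0:
        raise ValueError("a constant series has no period")
    # Autocorrelation for lags 0..max_period (biased estimator).
    acf = [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance_sum
        for lag in range(max_period + 1)
    ]
    # Local maxima of the autocorrelation, ignoring lag 0.
    peaks = [
        lag for lag in range(1, max_period)
        if acf[lag - 1] < acf[lag] >= acf[lag + 1]
    ]
    if not peaks:
        raise ValueError("no periodic pattern found")
    return max(peaks, key=lambda lag: acf[lag])


# A sine wave with a period of 20 points.
wave = [math.sin(2 * math.pi * t / 20) for t in range(200)]
print(estimate_period(wave))  # 20
```

Multiples of the true period (40, 60, ...) also produce peaks, but the biased estimator shrinks them with growing lag, so the shortest period wins.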

Example call:

$ autotsad run --config-path autotsad.yaml data/timeeval/GutenTAG/ecg-diff-count-1.csv

AutoTSAD v0.2.2
------------------------
CACHING directory=tmp/cache/6da004d6bd0cb6151622649862fcc418
RESULT directory=tmp/2023-10-17_15-50-20-6da004d6bd0cb6151622649862fcc418
Configuration=
AutoTSADConfig(
    general=GeneralSection(
        tmp_path=PosixPath('tmp'),
        result_path=PosixPath('tmp'),
        TIMESTAMP='2023-10-17_15-50-20',
        cache_key='6da004d6bd0cb6151622649862fcc418',
        logging_level=0,
        use_timer=True,
        timer_logging_level=20,
        progress=True,
        n_jobs=2,
        seed=2,
        max_algorithm_instances=6,
        algorithm_selection_method='aggregated-minimum-influence',
        score_normalization_method='minmax',
        score_aggregation_method='custom',
[...]

Configuration

All configuration options of AutoTSAD are managed in a configuration file that you specify before starting the system. AutoTSAD works fine with default options; please only change them if you know what you are doing.

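You can also specify every configuration setting (see the configuration file) via environment variables. Their names consist of the prefix AUTOTSAD followed by the section and option names in upper case, each separated by double underscores (__). Examples: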

  • Change the folder for temporary files and the cache (config.general.tmp_path): AUTOTSAD__GENERAL__TMP_PATH=/tmp/custom-folder
  • Increase parallelism (config.general.n_jobs): AUTOTSAD__GENERAL__N_JOBS=10
  • Disable the hyperparameter optimization steps (config.optimization.disabled): AUTOTSAD__OPTIMIZATION__DISABLED=true
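When AutoTSAD is launched from a Python script, the mapping between configuration keys and environment variables can be automated. The sketch below derives variable names from the naming pattern visible in the examples above (the pattern is inferred, not taken from AutoTSAD's code). The helper names are hypothetical, and the commented subprocess call simply reuses the example invocation from the Usage section.

```python
import os
from typing import Dict, Mapping, Union

Value = Union[str, int, float, bool]


def config_key_to_env_var(key: str, prefix: str = "AUTOTSAD") -> str:
    """Map a dotted key below config, e.g. "general.n_jobs", to its variable name.

    Derived from the documented examples: prefix, section, and option names in
    upper case, joined by double underscores.
    """
    return "__".join([prefix] + [part.upper() for part in key.split(".")])


def build_env(overrides: Mapping[str, Value]) -> Dict[str, str]:
    """Copy the current environment and add AutoTSAD overrides as strings."""
    env = dict(os.environ)
    for key, value in overrides.items():
        # Booleans are written in lower case ("true"), as in the documented example.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        env[config_key_to_env_var(key)] = text
    return env


env = build_env({"general.n_jobs": 10, "optimization.disabled": True})
print(env["AUTOTSAD__GENERAL__N_JOBS"], env["AUTOTSAD__OPTIMIZATION__DISABLED"])  # 10 true
# Pass it to the CLI, e.g.:
# subprocess.run(["autotsad", "run", "--config-path", "autotsad.yaml",
#                 "data/timeeval/GutenTAG/ecg-diff-count-1.csv"], env=env, check=True)
```

Copying os.environ keeps PATH and the active Python environment intact for the child process, so only the AutoTSAD settings change.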

Shell Completion

AutoTSAD comes with shell auto-completion scripts for bash and zsh. To enable them, run the following commands:

  • Bash:

    autotsad completion bash > /etc/bash_completion.d/autotsad
  • Zsh:

    autotsad completion zsh > /usr/local/share/zsh/site-functions/_autotsad
  • Zsh (with Oh-My-Zsh):

    mkdir -p ~/.oh-my-zsh/completions
    autotsad completion zsh > ~/.oh-my-zsh/completions/_autotsad

⚠️ Note that the auto-completions can noticeably slow down your shell.

Reference

tbd