AutoTSAD

Unsupervised Anomaly Detection System for Univariate Time Series.

Detecting anomalous subsequences in time series data is one of the most important tasks in time series analytics, having applications in environmental monitoring, preventive healthcare, predictive maintenance, and many further areas. Data scientists have developed various anomaly detection algorithms with individual strengths, such as the ability to detect repeating anomalies, anomalies in non-periodic time series, or anomalies with varying lengths. For a given dataset and task, the best algorithm with a suitable parameterization and, in some cases, sufficient training data, usually solves the anomaly detection problem well. However, given the high number of existing algorithms, their numerous parameters, and a pervasive lack of training data and domain knowledge, effective anomaly detection is still a complex task that heavily relies on manual experimentation and often, as experiments show, luck.

We propose the unsupervised AutoTSAD system, which parameterizes, executes, and ensembles various highly effective anomaly detection algorithms. The ensembling system automatically presents an aggregated anomaly scoring for an arbitrary time series without a need for training data or parameter expertise. Our experiments show that AutoTSAD offers an anomaly detection accuracy comparable to the best manually optimized anomaly detection algorithms, and can significantly outperform existing method selection and ensembling approaches for time series anomaly detection.

Architecture

AutoTSAD consists of the three modules Data Generation, Algorithm Optimization, and Scoring Ensembling. It takes a single univariate time series as input and produces a score ranking and an aggregated anomaly scoring. The score ranking can interactively be explored and altered.

Base Algorithms

Algorithm	Area	Family	Dim.	Lang.
STOMP	Data Mining	distance	uni	Python
k-Means	Classic ML	distance	multi	Python
Sub-KNN	Classic ML	distance	uni	Python
Sub-LOF	Outlier Det.	distance	uni	Python
Sub-IF	Outlier Det.	trees	uni	Python
GrammarViz	Data Mining	encoding	uni	Java
Torsk	Deep L.	forecasting	multi	Python
DWT-MLEAD	Signal A.	distribution	uni	Python

Repository Structure

Folder	Description
`autotsad`	AutoTSAD source code.
`autotsad.tsad_algorithms`	Base algorithm implementations for the ensemble.
`autotsad.baselines`	Implementation for the SELECT and tsadams baselines.
`data/autotsad-data`	Evaluation datasets.
`data/baseline-results`	Folder for the baseline results (just for the Oracle baseline for now).
...	tbd
`scripts`	Scripts to prepare the data, load them into the DB, and post-process some experimental results.
`requirements.txt`	Pip-dependencies required to run AutoTSAD.
`autotsad.yaml`	Configuration file template. Please find the configuration key documentation here.
`autotsad-exp-config.yaml`	AutoTSAD onfiguration used for the experiments.

Results

We compare the anomaly detection quality of AutoTSAD with five baselines on all 106 univariate time series in the data-folder. The baseline algorithms are the following:

Oracle: Perfect selection algorithm that magically selects the best performing algorithm for every time series from the 71 TimeEval-algorithms based on the Range-PR-AUC metric.
k-Means: Individual time series anomaly detection algorithm, which achieved overall best results and is the best of our base algorithms (see Base Algorithms).
SELECT (Horizontal and Vertical): Outlier ensembling technique that uses two different method selection strategies. We re-implemented this method in Python and use it on the same base algorithms as AutoTSAD.
tsadams: Method selection technique for time series anomaly detection. We use the implementation from the original authors. Because the method requires semi-supervised forecasting algorithms, we cannot use it on our base algorithms and use the provided ones.
cae-ensemble: Heterogeneous ensembling technique using deep learning base components (convolutional autoencoders). We use the implementation from the original authors and adapt it to our unsupervised use case:
- We use the test time series during training (without labels).
- We automatically execute the unsupervised hyperparameter selection process described in the paper with 10 randomly sampled hyperparameter settings (to stay within our 12 h time limit for most datasets).

For all baseline algorithms and AutoTSAD, we use the manually-tuned hyperparameter heuristics from TimeEval.

Range-PR-AUC Metric

We use the Range-PR-AUC metric as our main evaluation measure:

Other metrics

Installation

Requirements

python >=3.8, <3.10
Java >= 1.8 (for GrammarViz)

Installation from Source

We recommend to use conda or any other virtual environment management tool to create a new Python environment for AutoTSAD. Please make sure that python, pip, and java are accessible in your new environment.

Clone repository

git clone git@github.com:HPI-Information-Systems/AutoTSAD.git
cd AutoTSAD

(Create your environment and) install Python dependencies
```
pip install -r requirements.txt
```
Install AutoTSAD
```
pip install .
```

If you want to use the baseline tsadams or cae-ensemble, please a use git submodule update to fetch the required dependencies.

Usage

tl;dr

$ autotsad --help
usage: autotsad [-h] [--version] {completion,run,db,estimate-period} ...

Unsupervised anomaly detection system for univariate time series.

positional arguments:
  {completion,run,db,estimate-period}
    completion          Output shell completion script
    run                 Run AutoTSAD on a given dataset.
    db                  Manage AutoTSAD result database.
    estimate-period     Estimate the period size of a given time series dataset.

optional arguments:
  -h, --help            show this help message and exit
  --version             Show version number of AutoTSAD.

Example call:

$ autotsad run --config-path autotsad.yaml data/timeeval/GutenTAG/ecg-diff-count-1.csv

AutoTSAD v0.2.2
------------------------
CACHING directory=tmp/cache/6da004d6bd0cb6151622649862fcc418
RESULT directory=tmp/2023-10-17_15-50-20-6da004d6bd0cb6151622649862fcc418
Configuration=
AutoTSADConfig(
    general=GeneralSection(
        tmp_path=PosixPath('tmp'),
        result_path=PosixPath('tmp'),
        TIMESTAMP='2023-10-17_15-50-20',
        cache_key='6da004d6bd0cb6151622649862fcc418',
        logging_level=0,
        use_timer=True,
        timer_logging_level=20,
        progress=True,
        n_jobs=2,
        seed=2,
        max_algorithm_instances=6,
        algorithm_selection_method='aggregated-minimum-influence',
        score_normalization_method='minmax',
        score_aggregation_method='custom',
[...]

Configuration

All configuration options of AutoTSAD are managed in a configuration file that you specify before starting the system. AutoTSAD works fine with default options; please only change them if you know what you are doing.

You can specify all configuration settings (see configuration file) also via environment variables with the prefix AUTOTSAD. Examples:

Change the folder for temporary files and the cache config.general.tmp_path: AUTOTSAD__GENERAL__TMP_PATH=/tmp/custom-folder
Increase parallelism config.general.n_jobs: AUTOTSAD__GENERAL__N_JOBS=10
Disable hyperparameter optimization steps config.optimization.disabled: AUTOTSAD__OPTIMIZATION__DISABLED=true

Shell Completion

AutoTSAD comes with shell auto-completion scripts for bash and zsh. To enable them, run the following commands:

Bash:

autotsad completion bash > /etc/bash_completion.d/autotsad

Zsh:

autotsad completion zsh > /usr/local/share/zsh/site-functions/_autotsad

Zsh (with Oh-My-Zsh):

mkdir ~/.oh-my-zsh/completions
autotsad completion zsh > ~/.oh-my-zsh/completions/_autotsad

⚠️ Note that the auto-completions can noticably slow down your shell.

Reference

tbd

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.streamlit		.streamlit
autotsad		autotsad
cae-ensemble @ 45470c2		cae-ensemble @ 45470c2
data		data
db		db
docs/figures		docs/figures
paper-plots		paper-plots
scripts		scripts
tsadams @ 7c8ff59		tsadams @ 7c8ff59
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
autotsad-exp-config.yaml		autotsad-exp-config.yaml
autotsad-logo.png		autotsad-logo.png
autotsad.yaml		autotsad.yaml
environment.yaml		environment.yaml
requirements.txt		requirements.txt
run-cae-ensemble.sh		run-cae-ensemble.sh
run-select.sh		run-select.sh
run-tsadams.sh		run-tsadams.sh
run.sh		run.sh
setup.py		setup.py
slurm-batch-all-cae-ensemble.sh		slurm-batch-all-cae-ensemble.sh
slurm-batch-all-select.sh		slurm-batch-all-select.sh
slurm-batch-all-tsadams.sh		slurm-batch-all-tsadams.sh
slurm-batch-all.sh		slurm-batch-all.sh
slurm-run-cae-ensemble.sh		slurm-run-cae-ensemble.sh
slurm-run-select.sh		slurm-run-select.sh
slurm-run-tsadams.sh		slurm-run-tsadams.sh
slurm-run.sh		slurm-run.sh
testing.yaml		testing.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AutoTSAD

Architecture

Base Algorithms

Repository Structure

Results

Range-PR-AUC Metric

Other metrics

Installation

Requirements

Installation from Source

Usage

Configuration

Shell Completion

Reference

About

Releases

Languages

License

HPI-Information-Systems/AutoTSAD

Folders and files

Latest commit

History

Repository files navigation

AutoTSAD

Architecture

Base Algorithms

Repository Structure

Results

Range-PR-AUC Metric

Other metrics

Installation

Requirements

Installation from Source

Usage

Configuration

Shell Completion

Reference

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Languages