Skip to content

Latest commit

 

History

History
282 lines (232 loc) · 10.8 KB

README.md

File metadata and controls

282 lines (232 loc) · 10.8 KB

Discovering Leitmotifs in Multidimensional Time Series

This page was built in support of our paper "Discovering Leitmotifs in Multidimensional Time Series" by Patrick Schäfer and Ulf Leser.

A leitmotif is a recurring theme in literature, movies or music that carries symbolic significance for the piece it is contained in. When this piece can be represented as a multi-dimensional time series (MDTS), such as acoustic or visual observations, finding a leitmotif is equivalent to the pattern discovery problem, which is an unsupervised and complex problem in time series analytics. Compared to the univariate case, it carries additional complexity because patterns typically do not occur in all dimensions but only in a few - which are, however, unknown and must be detected by the method itself. In this paper, we present the novel, efficient and highly effective leitmotif discovery algorithm LAMA for MDTS. LAMA rests on two core principals: (a) a leitmotif manifests solely given a yet unknown number of sub-dimensions - neither too few, nor too many, and (b) the set of sub-dimensions are not independent from the best pattern found therein, necessitating both problems to be approached in a joint manner. In contrast to most previous methods, LAMA tackles both problems jointly - instead of first selecting dimensions (or leitmotifs) and then finding the best leitmotifs (or dimensions).

Supporting Material

  • tests: Please see the python tests for use cases
  • notebooks: Please see the Jupyter Notebooks for use cases
  • csvs: The results of the scalability experiments
  • leitmotifs: Code implementing multidimensonal leitmotif discovery using LAMA
  • datasets: Use cases in the paper

Mining Leitmotifs - Use Case

A leitmotif (leading motif) is a recurring theme or motif that carries symbolic significance in various forms of art, particularly literature, movies, and music. The distinct feature of any leitmotif is that humans associate them to meaning, which enhances narrative cohesion and establishes emotional connections with the audience. The use of (leit)motifs thus eases perception, interpretation, and identification with the underlying narrative. A genre that often uses leitmotifs are soundtracks, for instance in the compositions of Hans Zimmer or Howard Shore. The above figure shows a suite from The Shire with 14 channels arranged by Howard Shore for Lord of the Rings. The suite opens and ends with the Hobbits' leitmotif, which is played by a solo tin whistle, and manifests in a distinct pattern in several, but not all channels of the piece.

You may listen to the song on youtube - starting from second ~6, the same melody (leitmotif) is played by the tin whistle twice, and the this theme is repeated after 2 minutes into the song:

The Shire

The result of leitmotif discovery is shown next:

Our LAMA (in brown) is the only method to correctly identify 4 occurrences within the leitmotif using a distinctive subset of channels. Other than EMD*, LAMA's occurrences show high pairwise similarity, too.

Installation

The easiest is to use pip to install leitmotif.

a) Install using pip

pip install leitmotif

You can also install the project from source.

b) Build from Source

First, download the repository.

git clone https://github.com/patrickzib/leitmotifs.git

Change into the directory and build the package from source.

pip install .

Usage of LAMA

The three hyper-parameters of LAMA are:

  • n_dims : Number of subdimensions to use
  • k_max : The largest expected number of repeats. LAMA will search from to for motif sets
  • motif_length_range: The range of lengths to test

LAMA has a simple OO-API.

    ml = LAMA(
        ds_name,     # Name of the dataset
        series,      # Multidimensional time series
        distance,    # Distance measure used, default: z-normed ED
        n_dims,      # Number of sub-dimensions to use
        n_jobs,      # number of parallel jobs
    )

The result will look like this, indicating the found motif set to the right, its dimensions and the k locations to the bottom.

LAMA has a unique feature to automatically find suitable values for the motif length and set size so, that meaningful Leitmotifs of an input TS can be found without domain knowledge. The methods for determining values for $k$ and $l$ are based on an analysis of the extent function for different input value ranges.

Learning the Leitmotif length

To learn the motif length, we may simply call:

    ml.fit_motif_length(
        k_max,               # expected number of repeats
        motif_length_range,  # motif length range
        plot,                # Plot the results
        plot_elbows,         # Create an elbow plot 
        plot_motifsets,      # Plot the found motif sets
        plot_best_only       # Plot only the motif sets of the optimal length. Otherwise plot all local optima in lengths
    )

To do variable length motif discovery simply set plot_best_only=False

The generated plots looks like this, with good window lengths at local minima:

Learning the number of repeats

To do an elbow plot, and learn the number of repeats of the motif, we may simply call:

    ml.fit_k_elbow(
        k_max,                # expected number of repeats
        motif_length,         # motif length to use
        plot_elbows,          # Plot the elbow plot
        plot_motifsets        # Plot the found motif sets
    )

The generated plots looks like this, with good number of repeats at local minima:

Use Cases

Data Sets: We collected and annotated 14 challenging real-life data sets to assess the quality and scalability of the LAMA algorithm.

Ground leitmotifs were manually inferred. GT refers to the number of leitmotif occurrences.
Use Case Category Length Dim. GT
Charleston Motion Capture 506 93 3
Boxing Motion Capture 4840 93 10
Swordplay Motion Capture 2251 93 6
Basketball Motion Capture 721 93 5
LOTR - The Shire Soundtrack 6487 20 4
SW - The Imperial March Soundtrack 8015 20 5
RS - Paint it black Pop Music 9744 20 10
Linkin Park - Numb Pop Music 8018 20 5
Linkin P. - What I've Done Pop Music 8932 20 6
Queen - Under Pressure Pop Music 9305 20 16
Vanilla Ice - Ice Ice Baby Pop Music 11693 20 20
Starling Wildlife Rec. 2839 20 4
Physiodata (Physical Exercises) Wearable Sensors 5526 5 20
Bitcoin Halving Crypto/Stocks 3591 5 3

Aggregated Results

Method Precision, mean Precision, median Recall, mean Recall, median
EMD* 59.3 65 75.9 80
K-Motifs (TOP-f) 61.1 70 70.8 100
K-Motifs (all dims) 76.8 83.3 82.6 100
SMM 31.8 26.5 65.4 95
mSTAMP 53.8 100 36.7 20
mSTAMP+MDL 46.2 0 29.0 0
LAMA 88.7 100 95.1 100

See all results in Results Notebook.

Notebooks

  • Jupyter-Notebooks for finding subdimensional Leitmotifs in a multidimensional time series Multivariate Use Case: highlights a use case used in the paper, and shows the unique ability to learn its parameters from the data and find interesting motif sets.

  • Jupter-Notebook showcasing SMM-Results (SMM ran using Matlab): SMM-Results.

  • Jupter-Notebook showcasing using Leitmotifs for Summarization: Summarization.

  • Jupter-Notebook showcasing BitCoint-Halving Events: Bitcoin-Halving.

  • All other use cases presented in the paper can be found in the test folder

Citation

If you use this work, please cite as:

TODO