SeqLab

Welcome to SeqLab! This project provides a comprehensive framework for training and evaluating various machine learning models, focusing on multi-feature sequential categorical data.

Introduction

SeqLab is engineered to facilitate systematic experimentation and benchmarking of machine learning models. Utilizing a configuration-driven approach, researchers and practitioners can specify their experimental setups through a JSON file, ensuring reproducibility and flexibility. The project integrates seamlessly with MLflow, providing robust tools for experiment tracking and model management.

SeqLab is optimized for training models that perform sequence modeling and next-step prediction. For example, consider a sequence of musical chords:

A:min E:min F:maj G:maj A:min C:maj G:maj

SeqLab enables the development of models that learn from such sequences and predict the subsequent chord in the progression. This capability is essential for applications in areas such as music generation and sequence prediction in natural language processing.
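To make the next-step prediction task concrete, here is a minimal sketch of a first-order Markov predictor applied to the chord progression above. It is an illustration only and not SeqLab's implementation; the function names `fit_first_order` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def fit_first_order(sequences):
    """Count, for each symbol, how often each successor follows it."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, context):
    """Return the most frequent successor of the last symbol, or None if unseen."""
    counts = transitions.get(context[-1])
    if not counts:
        return None
    return counts.most_common(1)[0][0]


progression = "A:min E:min F:maj G:maj A:min C:maj G:maj".split()
model = fit_first_order([progression])
prediction = predict_next(model, progression)
print(prediction)  # prints "A:min", the only chord observed after G:maj
```

A first-order model only looks at the last chord. The variable-order, recurrent, and attention-based models listed under Key Features can condition on longer contexts.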

Key Features

- Multiple Model Support
  - Markov
  - Variable-Order Markov
  - LSTM
  - LSTM with Attention
  - Transformer
  - GPT
- Multi-feature Sequential Categorical Data Handling
- Automated Hyperparameter Optimization with Optuna
- Experiment Tracking with MLflow

Getting Started

Installation

To get started, set up a virtual environment with Python 3.11 and install the necessary dependencies.

Set up the virtual environment:
```
python3.11 -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

Install the required dependencies:
```
pip install -r requirements.txt
```

The repository provides an example configuration file, example-config.json, which includes all available models and hyperparameters. To customize an experiment, copy the configurations that interest you from this example into your config.json file.

Data Representation

SeqLab accepts data in two formats:

TXT Format:
- Ideal for single feature/dimension data.
- Each sequence should be represented in a row with space-separated values.
- Example:
```
A B C D E
F G H I J
```
CSV Format:
- Supports multiple features/dimensions.
- Features (columns) are separated by tabs, and sequences are delimited by separator rows containing the >* symbol.
- The first line contains the feature names. Each subsequent row represents one event in time, and the rows between separator rows (>*) form one sequence.
- Example:
```
feature1    feature2    feature3
A           1           x
B           2           y
C           3           z
>*          >*          >*
D           4           u
E           5           v
F           6           w
>*          >*          >*
```
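The two layouts can be read with a few lines of plain Python. This is a sketch under stated assumptions, not SeqLab's own loader: the function names `parse_txt` and `parse_multifeature` are hypothetical, columns are assumed to be separated by tab characters as described above, and a row counts as a separator only when every column holds `>*`.

```python
SEPARATOR = ">*"


def parse_txt(text):
    """One sequence per non-empty line; tokens are separated by whitespace."""
    return [line.split() for line in text.splitlines() if line.strip()]


def parse_multifeature(text):
    """Return (feature_names, sequences); each event is a dict keyed by feature."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = [name.strip() for name in lines[0].split("\t")]
    sequences, current = [], []
    for line in lines[1:]:
        values = [value.strip() for value in line.split("\t")]
        if all(value == SEPARATOR for value in values):
            # A separator row closes the sequence collected so far.
            if current:
                sequences.append(current)
            current = []
        else:
            current.append(dict(zip(names, values)))
    if current:  # tolerate a file without a trailing separator row
        sequences.append(current)
    return names, sequences


sample = (
    "feature1\tfeature2\tfeature3\n"
    "A\t1\tx\nB\t2\ty\nC\t3\tz\n"
    ">*\t>*\t>*\n"
    "D\t4\tu\nE\t5\tv\nF\t6\tw\n"
    ">*\t>*\t>*\n"
)
names, sequences = parse_multifeature(sample)
print(len(sequences), sequences[0][0])
```

All values are kept as strings, which matches the categorical nature of the data: `1`, `2`, and `3` in `feature2` are labels, not numbers.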

Configuration

After preparing your data, place it in a designated folder (e.g., the data folder) and add its path to the list of datasets in the experiment configuration file. Next, configure the following settings:

- Number of splits for k-fold cross-validation (default: 7).
- Number of trials for hyperparameter tuning of each model (default: 20).
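To show what the number of splits controls, the sketch below partitions sequence indices into k folds, each used once as the held-out set. It is a generic illustration of k-fold cross-validation, not SeqLab's splitting code; the function name `k_fold_indices` is hypothetical, and no shuffling is applied.

```python
def k_fold_indices(n_items, n_splits=7):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    if not 2 <= n_splits <= n_items:
        raise ValueError("n_splits must be between 2 and n_items")
    base, extra = divmod(n_items, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional item each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_items))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_items=16, n_splits=7))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

With the default values, and assuming every fold runs its own set of tuning trials, each model would be fitted 7 × 20 = 140 times per dataset, so lowering either setting is the simplest way to shorten an experiment.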

Running the Experiment

To start the experiment, execute the following command:

```
python run.py
```

Monitoring the Experiment

To monitor the experiment process, start the MLflow UI in another terminal:

```
mlflow ui --port=4000
```

Then, navigate to 127.0.0.1:4000 in your web browser to access the MLflow tracking UI.

Figure: Visualizing experiment tracking with MLflow in SeqLab. Each experiment set is named after its dimensionality and contains multiple models. Each model is evaluated using different folds of data, with multiple trials per fold to optimize hyperparameters. The MLflow UI stores metrics, evaluation results, and important experiment tags for each run, allowing detailed analysis and comparison of model performance.

Full Documentation

For detailed information on using SeqLab, please refer to the documentation in the docs folder of the repository.

Citation

If SeqLab contributes to your research, we kindly request that you cite the following publication:

```
@article{jafari2024striking,
  title={Striking a New Chord: Neural Networks in Music Information Dynamics},
  author={Jafari, Farshad and Arthur, Claire},
  journal={arXiv preprint arXiv:2410.17989},
  year={2024}
}
```
