FiLM-Benchmark is a comprehensive benchmarking pipeline for evaluating language models on financial data. It covers a range of financial tasks, including credit scoring and sentiment analysis. The system is modular and easily extensible, making it straightforward to add new tasks and models. The goal of this project is to provide a standardized, automated benchmarking process that serves both researchers and practitioners in the financial domain.
- Comprehensive Benchmarking: Supports over 10 tasks including sentiment analysis, credit scoring, and unit classification.
- Modular Design: Easily add new tasks by creating task-specific data preparation and testing scripts (see the registry sketch after this list).
- Automated Process: The pipeline prepares data, fine-tunes models, runs benchmarks, and evaluates performance automatically.
- Metrics and Reporting: Outputs performance metrics like F1 score, accuracy, and more.
- Parallel Execution: Supports parallel processing to speed up benchmarking.
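
To give a concrete sense of the modular design, a task can be thought of as a pair of pluggable steps registered under its command-line name. The sketch below is hypothetical; the `Task` class, callable signatures, and `register_task` helper are assumptions for illustration, not the project's actual API:

```python
# Hypothetical task registry illustrating the modular design;
# names and signatures are assumptions, not FiLM-Benchmark's API.
from typing import Callable, Dict, List

class Task:
    """A task bundles a data-preparation step and an evaluation step."""
    def __init__(self,
                 prepare: Callable[[], None],
                 evaluate: Callable[[List[int], List[int]], Dict[str, float]]):
        self.prepare = prepare
        self.evaluate = evaluate

# Maps --task names to task modules.
TASKS: Dict[str, Task] = {}

def register_task(name: str, task: Task) -> None:
    # Adding a new task means implementing the two callables
    # and registering them under the task's name.
    TASKS[name] = task
```

A new credit_scoring task, for instance, would supply its own preparation and evaluation callables and register them under that name.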
To install the required dependencies, run:

```bash
pip install -r requirements.txt
```
Data preparation fetches and preprocesses the raw data; the pipeline dispatches to a task-specific data preparation script for each task.
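
As an illustration of what such a script might do, the sketch below converts a raw CSV into model-ready train/test JSONL files. The file names, column names, and label scheme are all assumptions for the example, not the project's actual format:

```python
# prepare_sentiment.py -- hypothetical per-task preparation script.
import csv
import json
import random

LABELS = {"negative": 0, "neutral": 1, "positive": 2}  # assumed label scheme

def prepare(raw_csv: str, out_prefix: str, test_fraction: float = 0.2) -> None:
    # Read raw records and map string sentiments to integer labels.
    with open(raw_csv, newline="", encoding="utf-8") as f:
        rows = [{"text": r["text"], "label": LABELS[r["sentiment"]]}
                for r in csv.DictReader(f)]
    random.Random(42).shuffle(rows)  # fixed seed for a reproducible split
    split = int(len(rows) * (1 - test_fraction))
    for name, part in (("train", rows[:split]), ("test", rows[split:])):
        with open(f"{out_prefix}_{name}.jsonl", "w", encoding="utf-8") as out:
            out.writelines(json.dumps(row) + "\n" for row in part)

if __name__ == "__main__":
    prepare("raw_sentiment.csv", "sentiment")
```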
Run the benchmarking process for a specific task and model using the following command:

```bash
python run_benchmark.py --task sentiment_analysis --model_name distilbert-base-uncased
```
This will prepare the data, fine-tune the model, and run the benchmark.
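
Under the hood, the fine-tuning step resembles a standard Hugging Face sequence-classification loop. The sketch below is a minimal stand-in assuming the transformers/datasets stack and the JSONL files from the preparation example above; the dataset paths, label count, and hyperparameters are illustrative, not FiLM-Benchmark internals:

```python
# Minimal fine-tune-and-evaluate sketch (assumes transformers + datasets).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels=3)

# Assumes the preparation step wrote JSONL files with "text" and "label".
data = load_dataset("json", data_files={"train": "sentiment_train.jsonl",
                                        "test": "sentiment_test.jsonl"})
data = data.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                        padding="max_length", max_length=128),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="results", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=data["train"],
    eval_dataset=data["test"],
)
trainer.train()
print(trainer.evaluate())  # reports eval loss; the pipeline adds task metrics
```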
The evaluation results are printed to the console and saved in the `results` directory. Metrics include accuracy, precision, recall, and F1 score.
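
Because these are the standard scikit-learn metric definitions, the numbers can be recomputed from saved predictions. The results file name and JSON layout below are assumptions for illustration:

```python
# Recompute the reported metrics from saved predictions (layout assumed).
import json
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

with open("results/sentiment_analysis_predictions.json", encoding="utf-8") as f:
    rows = json.load(f)
y_true = [r["label"] for r in rows]
y_pred = [r["prediction"] for r in rows]

print({
    "accuracy": accuracy_score(y_true, y_pred),
    # Macro averaging weights each class equally, a common choice
    # for multi-class financial tasks.
    "precision": precision_score(y_true, y_pred, average="macro"),
    "recall": recall_score(y_true, y_pred, average="macro"),
    "f1": f1_score(y_true, y_pred, average="macro"),
})
```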
The pipeline can be configured through command-line arguments:

- `--task`: Specifies the task to run (e.g., sentiment_analysis, credit_scoring).
- `--model_name`: Specifies the name of the model to use (e.g., distilbert-base-uncased).
- `--max_length`: Maximum sequence length for the tokenizer.
- `--batch_size`: Batch size for training and evaluation.
- `--epochs`: Number of training epochs.
Example:

```bash
python run_benchmark.py --task sentiment_analysis --model_name distilbert-base-uncased --max_length 128 --batch_size 32 --epochs 3
```