Multi-Agent Reinforcement Learning (MARL) for Capture The Flag (CTF)

This project implements Multi-Agent Reinforcement Learning (MARL) algorithms for the Capture the Flag (CTF) game environment. It evaluates approaches such as Independent Q-Learning (IQL), Deep Q-Network (DQN), and Multi-Agent Proximal Policy Optimization (MAPPO) for cooperative and competitive agent interactions in a grid-world environment.

Table of Contents

  • Introduction
  • Project Structure
  • Algorithms
  • Installation
  • Usage
  • Metrics Recording and Evaluation
  • Results

Introduction

This repository aims to solve the Capture The Flag problem using Multi-Agent Reinforcement Learning (MARL). The agents are trained to either capture the opponent's flag or defend their own flag in a dynamic environment. The project compares different RL algorithms to identify the most effective approach for this type of cooperative and competitive task.

Project Structure

  • env.py: The modified CTF environment in which agents take actions and the game progresses (see the interface sketch after this list).
  • env2.py: The original environment, with additional obstacles and variations in agent movement and flag interactions.
  • train_iql.py: Code for training the Independent Q-Learning (IQL) agents, where each agent independently learns and updates its Q-table.
  • train_dqn.py: Code for training the Deep Q-Network (DQN), a deep learning model for value-based reinforcement learning.
  • train_mappo.py: Code for training Multi-Agent Proximal Policy Optimization (MAPPO) agents, which is a state-of-the-art policy gradient method for MARL.
  • test_common.py: A unified testing script for evaluating IQL and MAPPO models, recording their performance, and saving test logs for further analysis.
  • requirements.txt: Lists the Python dependencies required to run the project.
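
Both environments are grid worlds in which two teams of agents move, interact with flags, and score. The sketch below shows the kind of loop a training script runs against such an environment; it assumes a Gym-style multi-agent interface with per-agent dictionaries, and the class name CTFEnv is illustrative rather than the actual env.py API:

    import random

    from env import CTFEnv  # hypothetical name; check env.py for the actual class

    env = CTFEnv()
    obs = env.reset()                        # e.g. {agent_id: observation}
    done = False
    while not done:
        # Random actions stand in for IQL/DQN/MAPPO action selection,
        # assuming 5 discrete moves (up/down/left/right/stay).
        actions = {agent: random.randrange(5) for agent in obs}
        obs, rewards, done, info = env.step(actions)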

Algorithms

Independent Q-Learning (IQL)

Independent Q-Learning (IQL) is a simple reinforcement learning technique where each agent learns independently and updates its Q-table based on its experiences without considering other agents' actions.

  • Q-value Update: Each agent maintains a Q-table with states and actions, and updates its Q-values based on rewards received and the next state.

    $$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

  • Training: Agents explore the environment, learn from the rewards they receive, and update their Q-tables accordingly; a minimal sketch of this update follows the considerations below.

Key Considerations:

  • IQL is easy to implement and serves as a good baseline for comparing other RL algorithms.
  • It can be inefficient in a multi-agent environment since each agent is learning independently, and no cooperation is directly learned between agents.
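
A minimal sketch of the tabular update above, assuming discrete, hashable states; the class name QLearningAgent and its hyperparameters are illustrative rather than the actual train_iql.py code:

    import random
    from collections import defaultdict

    class QLearningAgent:
        def __init__(self, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1):
            # Q-table mapping each state to a list of action values.
            self.q = defaultdict(lambda: [0.0] * n_actions)
            self.n_actions = n_actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def act(self, state):
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if random.random() < self.epsilon:
                return random.randrange(self.n_actions)
            values = self.q[state]
            return values.index(max(values))

        def update(self, state, action, reward, next_state):
            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            target = reward + self.gamma * max(self.q[next_state])
            self.q[state][action] += self.alpha * (target - self.q[state][action])

In IQL, each agent simply owns one such learner and updates it from its own transitions, without modeling its teammates or opponents.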

Deep Q-Network (DQN)

Deep Q-Network (DQN) is an extension of Q-learning, where the Q-values are approximated using a neural network instead of a Q-table. The network takes the current state as input and outputs Q-values for each action.

  • Network Architecture: DQN uses a neural network to approximate the Q-function.

    $$Q(s, a; \theta) \approx Q(s, a)$$

  • Experience Replay: DQN uses a replay buffer to store and sample past experiences to break correlations between consecutive updates.

  • Target Network: DQN stabilizes training with a separate target network whose weights are periodically synchronized with those of the main network; a condensed sketch of these components follows.
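
A condensed sketch of these three ingredients (Q-network, replay buffer, target network), assuming PyTorch; train_dqn.py may organize this differently, and the sizes below are illustrative:

    import random
    from collections import deque

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        # Small MLP approximating Q(s, .; theta).
        def __init__(self, obs_dim, n_actions):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 128), nn.ReLU(),
                nn.Linear(128, n_actions),
            )

        def forward(self, x):
            return self.net(x)

    obs_dim, n_actions, gamma = 8, 5, 0.99            # illustrative sizes
    online, target = QNetwork(obs_dim, n_actions), QNetwork(obs_dim, n_actions)
    target.load_state_dict(online.state_dict())       # target starts as a copy
    optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)
    replay = deque(maxlen=10_000)                     # stores (s, a, r, s', done)

    def train_step(batch_size=32):
        if len(replay) < batch_size:
            return
        # Sampling uniformly from the buffer breaks temporal correlations.
        s, a, r, s2, done = zip(*random.sample(replay, batch_size))
        s = torch.tensor(s, dtype=torch.float32)
        a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
        r = torch.tensor(r, dtype=torch.float32)
        s2 = torch.tensor(s2, dtype=torch.float32)
        done = torch.tensor(done, dtype=torch.float32)
        q = online(s).gather(1, a).squeeze(1)         # Q(s, a) for taken actions
        with torch.no_grad():
            # Bootstrapped target from the frozen target network.
            target_q = r + gamma * (1 - done) * target(s2).max(1).values
        loss = nn.functional.mse_loss(q, target_q)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Every N steps, resynchronize: target.load_state_dict(online.state_dict())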


Multi-Agent Proximal Policy Optimization (MAPPO)

MAPPO is an extension of Proximal Policy Optimization (PPO) for multi-agent settings. It uses a centralized training framework but decentralized execution, allowing agents to learn joint policies while acting independently during inference.

  • Centralized Training, Decentralized Execution: In MAPPO, agents are trained together in a shared environment, and each agent maintains its own policy, but they act independently during execution.

  • Policy Update: MAPPO uses a clipped objective function to prevent large updates and maintain stable learning.

    $$L(\theta) = \mathbb{E} \left[ \min \left( r_t(\theta)\, A,\ \operatorname{clip}(r_t(\theta),\, 1 - \epsilon,\, 1 + \epsilon)\, A \right) \right]$$

    where $r_t(\theta)$ is the probability ratio and $A$ is the advantage function, as sketched below.
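
A sketch of this clipped objective, assuming PyTorch and that log-probabilities and advantages (e.g., from a centralized critic) have already been computed; the function name is illustrative rather than the actual train_mappo.py code:

    import torch

    def ppo_clip_loss(new_log_probs, old_log_probs, advantages, epsilon=0.2):
        # r_t(theta): ratio of new to old action probabilities.
        ratio = torch.exp(new_log_probs - old_log_probs)
        unclipped = ratio * advantages
        # Clipping keeps the policy from moving too far in a single update.
        clipped = torch.clamp(ratio, 1 - epsilon, 1 + epsilon) * advantages
        # Negated because optimizers minimize while the objective is maximized.
        return -torch.min(unclipped, clipped).mean()

In the centralized-training setup, each agent's advantages can come from a shared critic that sees the joint state, while each policy consumes only its own observations at execution time.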


Installation

  1. Clone the repository:

    git clone https://github.com/Shirish2004/MARL-Project.git
  2. Navigate to the project directory:

    cd MARL-Project
  3. Install the dependencies:

    pip install -r requirements.txt

Usage

Training Models

  1. Training IQL: To train the IQL model, run the following command:

    python train_iql.py

    This will start training the IQL model on the environment.

  2. Training DQN: To train the DQN model, run:

    python train_dqn.py
  3. Training MAPPO: To train the MAPPO model, run:

    python train_mappo.py

Testing Models

To test and compare the performance of IQL and MAPPO, run:

python test_common.py

This will:

  • Evaluate both IQL and MAPPO on the environment.
  • Record the performance metrics (win rates, average scores, etc.).
  • Save the test logs for further analysis.

Metrics Recording and Evaluation

The test_common.py script records the following performance metrics:

  • Win Rate: Percentage of episodes won by each team.
  • Average Score: Average score for each team.
  • Score Difference: Difference in scores between the two teams.

The metrics are saved in CSV format and visualized using matplotlib; a sketch of this step follows the examples below.

Example of Saved Metrics:

  • Win rates for both teams
  • Average scores for each team over multiple test episodes
  • Visualizations of agent movements and scores during testing
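
A minimal sketch of this recording step, assuming per-episode team scores are collected in lists; the column names and file paths are illustrative rather than what test_common.py actually writes:

    import csv

    import matplotlib.pyplot as plt

    def save_and_plot(scores_a, scores_b, csv_path="test_metrics.csv"):
        # Persist one row per test episode.
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["episode", "team_a_score", "team_b_score"])
            for i, (a, b) in enumerate(zip(scores_a, scores_b)):
                writer.writerow([i, a, b])
        # Quick visual comparison of the two teams across episodes.
        plt.plot(scores_a, label="Team A")
        plt.plot(scores_b, label="Team B")
        plt.xlabel("Episode")
        plt.ylabel("Score")
        plt.legend()
        plt.savefig("test_scores.png")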

Results

Original Environment

Changed Environment

Train Results For IQL

Train Results For MAPPO

Test Results For IQL

Test Results For MAPPO
