A distributed job scheduling system that handles task submissions from clients, assigns them to available worker nodes, and monitors worker health. The system supports round-robin scheduling, task prioritization, and fault-tolerant task reassignment when a worker becomes unavailable. Heartbeat messages are used to detect failures, and the scheduler reassigns tasks from failed nodes to active ones. The system also tracks task statuses and enables performance visualization.
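The scheduling behaviour described above can be illustrated with a minimal in-memory sketch. It is not this project's implementation: the class `SchedulerSketch`, its method names, the 5-second timeout, and the "lower number means higher priority" convention are all assumptions made for illustration, and the real system exchanges these messages through RabbitMQ rather than direct method calls.

```python
import heapq
import itertools


class SchedulerSketch:
    """Toy scheduler: priority queue, round-robin dispatch, heartbeat timeouts."""

    def __init__(self, workers, heartbeat_timeout=5.0):
        self.heartbeat_timeout = heartbeat_timeout  # hypothetical value, in seconds
        self.last_seen = {w: 0.0 for w in workers}  # worker -> time of last heartbeat
        self.assigned = {w: [] for w in workers}    # worker -> [(priority, task), ...]
        self.pending = []                           # heap of (priority, seq, task)
        self._seq = itertools.count()               # tie-breaker: FIFO within a priority
        self._rr = 0                                # round-robin cursor

    def submit(self, task, priority=0):
        # Assumption of this sketch: a lower number means a higher priority.
        heapq.heappush(self.pending, (priority, next(self._seq), task))

    def heartbeat(self, worker, now):
        # Record that the worker was alive at time `now`.
        self.last_seen[worker] = now

    def alive_workers(self, now):
        # A worker counts as alive if its last heartbeat is recent enough.
        return [w for w, seen in self.last_seen.items()
                if now - seen <= self.heartbeat_timeout]

    def reap_failed(self, now):
        # Put tasks held by workers with stale heartbeats back on the queue.
        alive = set(self.alive_workers(now))
        for worker, tasks in self.assigned.items():
            if worker not in alive and tasks:
                for priority, task in tasks:
                    self.submit(task, priority)
                tasks.clear()

    def dispatch(self, now):
        # Hand out pending tasks, best priority first, cycling over alive workers.
        alive = self.alive_workers(now)
        if not alive:
            return
        while self.pending:
            priority, _, task = heapq.heappop(self.pending)
            worker = alive[self._rr % len(alive)]
            self._rr += 1
            self.assigned[worker].append((priority, task))
```

A possible usage, with time passed in explicitly so the behaviour is deterministic: submit a few tasks, call `dispatch(now=0.0)`, send a heartbeat only for one worker at a later time, then call `reap_failed` and `dispatch` again. The tasks of the silent worker end up on the worker that kept sending heartbeats.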
First, install the Python dependencies:
pip install -r requirements.txt
To run the system, you need to:
- Run RabbitMQ server
- Run Scheduler server
- Run Logger server
- Run Workers
- Generate tasks
To do this you can use the following commands:
Copy environment variables:
cp .env.example .env
Run RabbitMQ cluster with Grafana:
cd rabbitmq
docker-compose up -d
Alternatively, you can run only RabbitMQ manually.
Run scheduler system:
./scripts/system_run.sh
After running this script, the system will generate and handle jobs automatically. You can monitor the system status by checking the logs of the logger service and checking the RabbitMQ server using the RabbitMQ management UI (http://localhost:15672) or Grafana (http://localhost:3000).
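Before opening the dashboards, it can be handy to confirm that both monitoring endpoints accept connections. The sketch below is not part of the repository: the names `MONITORING_URLS`, `is_reachable`, and `check_all` are hypothetical, and the check only tests that the TCP port is open, not that the service behind it is healthy.

```python
import socket
from urllib.parse import urlparse

# URLs as given in this README; the dictionary name is an assumption.
MONITORING_URLS = {
    "RabbitMQ management UI": "http://localhost:15672",
    "Grafana": "http://localhost:3000",
}


def is_reachable(url, timeout=1.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when the URL has none.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_all(urls=MONITORING_URLS):
    """Map each service name to whether its endpoint accepted a connection."""
    return {name: is_reachable(url) for name, url in urls.items()}
```

Calling `check_all()` after `docker-compose up -d` returns a dictionary keyed by service name; a `False` value suggests the corresponding container is not running or its port is not published on localhost.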