An implementation of Federated Learning research baseline methods based on FedML-core. Rather than a stand-alone simulation, this is a distributed system that can be deployed on multiple real devices (or several Docker containers on the same server), helping researchers explore problems that arise in real FL systems.
Here is the list of publications based on this repository:
- **ICPP 2022**: Our work "Eco-FL: Adaptive Federated Learning with Efficient Edge Collaborative Pipeline Training" has been accepted by the International Conference on Parallel Processing (ICPP), 2022.
Here is a demo of deploying our framework on a cluster with three devices: A (ip: 172.17.0.4) is the server worker, while B (ip: 172.17.0.13) and C (ip: 172.17.0.12) are two client workers in the FL system. All devices must have a `python3` environment with `pytorch` and `grpc` installed. You can find the environment requirements in the `requirement.txt` file.
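For example, assuming `pip` is available on each device, the dependencies can be installed with:

```bash
pip install -r requirement.txt
```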
Startup scripts for all methods are under the `experiment/` directory. We first need to modify the `grpc_ipconfig.csv` file as below:
```
receiver_id,ip
0,172.17.0.4
1,172.17.0.13
2,172.17.0.12
```
Here `receiver_id` is the `worker_id` of each worker in the FL system. By convention, the server worker's `worker_id` is 0 and client workers' ids start from 1.
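For reference, here is a minimal sketch of how such a config file can be parsed into a `worker_id` → IP mapping. The function name is hypothetical, and the framework's own gRPC config loader may differ:

```python
import csv

def load_ip_config(path="grpc_ipconfig.csv"):
    """Map each receiver_id (worker_id) to its IP address.

    Hypothetical helper for illustration; not the framework's actual loader.
    """
    with open(path, newline="") as f:
        return {int(row["receiver_id"]): row["ip"] for row in csv.DictReader(f)}

# e.g. load_ip_config()[0] == "172.17.0.4"  (the server worker's address)
```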
After that, we will start the worker process on each device. All client workers need to be started before the server worker. Remember to execute all commands below at the root directory of this project:
```bash
bash experiment/fedprox/run_fedprox_distributed.sh $worker_id
```
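For the three-device cluster above, the startup sequence would look like this (client workers first, then the server):

```bash
# On device B (172.17.0.13), start client worker 1:
bash experiment/fedprox/run_fedprox_distributed.sh 1

# On device C (172.17.0.12), start client worker 2:
bash experiment/fedprox/run_fedprox_distributed.sh 2

# Finally, on device A (172.17.0.4), start server worker 0:
bash experiment/fedprox/run_fedprox_distributed.sh 0
```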
Then the training process will begin, and checkpoints of the global model will be saved under `checkpoint/fedprox/`. You can change the experiment settings by editing the pre-set arguments in `run_fedprox_distributed.sh`.
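The saved global model can then be inspected with standard PyTorch tooling. A minimal sketch, assuming the checkpoint stores a `state_dict` and using a placeholder filename (the actual filename depends on your run):

```python
import torch

# Placeholder filename; use the file your run actually writes
# under checkpoint/fedprox/.
state_dict = torch.load("checkpoint/fedprox/global_model.pth", map_location="cpu")

# Restore into the same model architecture used during training:
# model.load_state_dict(state_dict)
```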
| Method | Reference |
| --- | --- |
| FedAvg | McMahan et al., 2017 |
| FedProx | Li et al., 2020 |
| FedAsync | Wang et al., 2021 |
| FedAT | Chai et al., 2021 |
| Ongoing | ... |