This project explored the communication efficiency and scalability of existing federated learning (FL) techniques, and possible improvements for further optimisation.
The aim of this project was to investigate, design and evaluate different methods to reduce the overall data communication during federated learning, without sacrificing learning accuracy.
- Research the necessary libraries and development environment to conduct various FL simulations
- Create a suitable unbalanced dataset to simulate a real-world FL system and evaluate the corresponding methods
- Build an FL model with an existing machine learning framework using basic model aggregation such as averaged weight updates (FedAvg)
- Investigate the effect of parameters (number of clients, rounds, epochs, learning rate, optimisation functions) on the global model
- Benchmark the FL algorithm, using metrics such as learning accuracy and loss to evaluate model performance and convergence rate under different communication reduction strategies
- Choose the best method out of all the proposed reduction strategies for optimising communication, and calculate the amount of reduction achieved
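The FedAvg aggregation mentioned above can be sketched as a weighted average of client model weights, where each client contributes in proportion to its number of training examples. The following is a minimal NumPy sketch for illustration only, not the project's actual TFF implementation:

```python
import numpy as np

def fedavg(client_weights, client_num_examples):
    """FedAvg-style aggregation: average each layer across clients,
    weighting every client by its share of the total training examples.

    client_weights: one list of np.ndarray layers per client
    client_num_examples: number of training examples each client holds
    """
    total = sum(client_num_examples)
    num_layers = len(client_weights[0])
    averaged = []
    for layer in range(num_layers):
        layer_avg = sum(
            w[layer] * (n / total)
            for w, n in zip(client_weights, client_num_examples)
        )
        averaged.append(layer_avg)
    return averaged

# Two toy clients with a single-layer "model"; the second client
# holds 3x the data, so the average is pulled toward its weights.
clients = [[np.array([1.0, 2.0])], [np.array([3.0, 4.0])]]
counts = [1, 3]
print(fedavg(clients, counts)[0])  # -> [2.5 3.5]
```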
When setting up GCP, note that the algorithm is not memory optimised, so memory usage was very intensive while running the different reduction functions in tff_vary_num_clients_and_rounds.py
. This is likely because TFF is not currently optimised for selecting a varying number of clients: it appears to interfere with the state during the iterative process and accumulates a large amount of memory. As a result, the VM on GCP used to run tff_vary_num_clients_and_rounds.py required 128GB of RAM; at its peak the script used around 50% of the total memory, which is something to keep in mind.
-
Install the Python development environment on your system
sudo apt update ; sudo apt upgrade
sudo apt install python3-dev python3-pip python3-venv
-
Check python3 and pip3 version
python3 --version
pip3 --version
-
Create a virtual environment (recommended)
python3 -m venv --system-site-packages ./venv
activate it
source ~/venv/bin/activate
-
Go inside the created virtual environment
cd venv
upgrade pip
(venv) $ pip install --upgrade pip
list packages installed within the virtual environment
(venv) $ pip list
-
Install the TensorFlow pip package
(venv) $ pip install testresources
(venv) $ pip install tensorflow==2.4.1
-
Verify the install:
(venv) $ python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
-
Install Tensorflow Federated
(venv) $ pip install tensorflow-federated==0.18.0
-
Test Tensorflow Federated
(venv) $ python -c "import tensorflow_federated as tff; print(tff.federated_computation(lambda: 'Hello World')())"
-
Exit the virtualenv when you're done using tensorflow/tensorflow-federated
(venv) $ deactivate
The Python version used throughout this project was 3.8.8; pyenv was used to manage different Python versions.
All the dependencies, versions and necessary packages are exported and listed in requirements.txt (albeit not all of them are needed to run on local machines).
First
sudo nano /etc/pam.d/common-session
then add the line
session required pam_limits.so
to /etc/pam.d/common-session
Then
sudo nano /etc/security/limits.conf
and add
* hard nofile 500000
* soft nofile 500000
Then set
ulimit -n 500000
To see the change, run
ulimit -a
python3 tff_vary_num_clients_and_rounds.py MODE
to run the script with one of the mode arguments.
The modes you can select are: MODE = [constant, exponential, linear, sigmoid, reciprocal]
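The reduction strategies named above are defined in tff_vary_num_clients_and_rounds.py; as an illustration only, per-round client-count schedules with these shapes might look like the following sketch (the function signatures and scaling are assumptions, not the project's actual code):

```python
import math

# Hypothetical sketches of per-round client-count schedules; the real
# definitions live in tff_vary_num_clients_and_rounds.py and may differ.
def constant(r, total_rounds, max_clients):
    # same number of clients every round
    return max_clients

def linear(r, total_rounds, max_clients):
    # ramp linearly from few clients up to max_clients
    return max(1, round(max_clients * (r + 1) / total_rounds))

def sigmoid(r, total_rounds, max_clients):
    # S-shaped ramp centred on the middle round
    x = 12 * r / (total_rounds - 1) - 6
    return max(1, round(max_clients / (1 + math.exp(-x))))

print(constant(0, 100, 50), linear(0, 100, 50), linear(99, 100, 50))  # -> 50 1 50
```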
python3 tff_UNIFORM_vs_NUM_EXAMPLES.py
and python3 tff_train_test_split.py
respectively to run these two scripts; no arguments/modes are needed.
python3 plot.py mode
to run the script with one of the mode arguments.
The modes you can select are: mode = ['reduction_functions', 'femnist_distribution', 'uniform_vs_num_clients_weighting', 'accuracy_10percent_vs_50percent_clients_comparison', 'accuracy_5_34_338_comparison', 'reduction_functions_comparison', 'updates_comparison']