This research aims to measure and compare the energy usage of fetching content generated by LLMs from a remote server via HTTP requests and generating content with on-device LLMs, in diverse scenarios with different LLMs and varying generated content lengths.
Our work is of interest to researchers exploring the trade-off between deploying on-device LLMs and fetching similar generated content from a remote server, from the perspective of the energy usage of the user's device. The results of our experiment can help software engineers better understand the potential energy impact of integrating LLMs on devices, and so inform the software architecture and design choices of future web and mobile applications.
- This experiment was created and run on a MacBook Pro M2 (Apple Silicon architecture).
- The server used in this experiment includes an NVIDIA RTX 4070 with 12GB of VRAM. Further settings information can be found in the document under the Experiment Execution section.
- Before you begin, make sure you have Python 3 installed on your system. This project requires Python 3 to run.
- Install the project requirements from the root directory using the following:
pip install -r requirement.txt
- Create a new .env file in the root folder, and add your server's IP address to it like this:
export SERVER_IP="<Your Server IP here>"
- Make sure to install Ollama and the corresponding LLMs on both the client device and the server. In this experiment we used the following models: llama3.1:8b, gemma:2b, gemma:7b, phi3:3.8b, qwen2:1.5b, qwen2:7b, mistral:7b.
- Make sure your server allows HTTP connections to port 11434, which is the default port of Ollama.
- To run the experiment, run the following command from the root directory:
python experiment-runner/ experiment/RunnerConfig.py
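Before a full run, it can help to confirm that the server set up in the steps above is reachable. The following is a minimal sketch, not part of the experiment code: it assumes the server exposes Ollama's standard /api/generate endpoint on port 11434, and the function names build_generate_request and fetch_generated_text are hypothetical helpers introduced here for illustration.

```python
import json
import urllib.request

OLLAMA_PORT = 11434  # Ollama's default port, as noted in the setup steps


def build_generate_request(server_ip, model, prompt, port=OLLAMA_PORT):
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    url = f"http://{server_ip}:{port}/api/generate"
    # "stream": False asks Ollama for a single JSON reply instead of a stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_generated_text(server_ip, model, prompt, timeout=120):
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(server_ip, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]
```

For example, after loading the .env file into your shell, calling fetch_generated_text(os.environ["SERVER_IP"], "qwen2:1.5b", "Say hello.") should return a short string if the server, port, and model are set up correctly. A connection error at this point usually suggests that port 11434 is not open on the server.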
The output data is saved in the run_table.csv file, which can be found in the /experiment/experiment_output folder.
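To check quickly that a run produced data, the output file can be inspected with a few lines of Python. This is only a sketch: the column names in run_table.csv are not listed here, so the helper below (summarize_run_table, a hypothetical name) just reports whatever columns it finds and the number of rows.

```python
import csv
from pathlib import Path

# Path taken from the description above, relative to the project root.
RUN_TABLE = Path("experiment/experiment_output/run_table.csv")


def summarize_run_table(path):
    """Return the column names and the number of data rows in a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)  # reading the rows also populates reader.fieldnames
        return list(reader.fieldnames or []), len(rows)
```

Calling summarize_run_table(RUN_TABLE) from the root directory returns a (columns, row_count) pair that can be printed or compared against the number of runs you expected.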
To perform statistical tests on the data generated by the experiment, run the .ipynb file (R runtime) in the Data Analysis folder with the run_table.csv file.