NVIDIA-SMI Reporter

This project reports statistics from the nvidia-smi command to a data consumer, which in turn alerts a bot.

Collected statistics are:

The temperature of the GPU
GPU Usage percent
GPU Memory usage
The processes that are using the GPU, and the memory they use - Not supported on Windows due to an issue with nvidia-smi

How does it work?

This is made of three main components:

The client, where the GPU resides
The server, where the Telegram Bot and data collector reside
The database (influxDB), where the metrics are stored

The Client

The client polls nvidia-smi, parses the output to get the required metrics and sends them to the collector. This runs as a rudimentary CRON job running every 30 minutes, but will be updated to reflect an actual CRON job.

The Server

The server contains the collector, queries the database every 30 minutes and retrieves data. Anything that crosses a certain threshold is considered anomalous and alerts the bot.

The Database

Please use an influxDB instance. I am yet to find another time-series database that'll run on a Raspberry Pi.

How do I get started?

Clone the repository.
Move the server and client code where needed.
Setup your database, and run setup.py in both the server and the client.
Create your Telegram bot and allow inline commands. See here for more information
Setup an environment variable NVIDIA_BOT_TOKEN with the API key of your bot as its value.
Start the server at listener.py to listen for any metrics sent.
Once your bot is up and running, send a command /hello so that it registers your Telegram account.
Open config.json in your client and copy the deviceid variable. You'll need to register it to the bot in order to get updates.
Send /link <<your deviceid>> to the bot so that it links your device ID to the chat.
Run your client!

How will you / can we improve this project?

Time-Series Learning instead of a hardcoded value
Inline documentation
"Smarter" bot that analyzes trends and alerts better
Better README?
Better code structure
Multi-GPU and multi-client support
Better CRON
Authentication for writing data points
And many more...

Where have you tested it?

This project has been tested with the client running Linux Mint 19.2 and an nVidia GT-840M and the server being a Raspberry Pi 3 Model B+ running Raspbian Stretch.
It also works with the server and bot running on Linux Mint 19.2 and client being a Windows 10 Machine with nVidia GTX-1660ti

Oops! Something broke!

Raise an issue! If you can fix it, make a PR! I'll gladly accept improvements!

Ooh ooh I have an idea I want to implement in this project

Build your feature and raise a PR by all means!

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
client		client
server		server
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NVIDIA-SMI Reporter

How does it work?

The Client

The Server

The Database

How do I get started?

How will you / can we improve this project?

Where have you tested it?

Oops! Something broke!

Ooh ooh I have an idea I want to implement in this project

About

Releases

Packages

Languages

License

Skeletrox/smi_reporter

Folders and files

Latest commit

History

Repository files navigation

NVIDIA-SMI Reporter

How does it work?

The Client

The Server

The Database

How do I get started?

How will you / can we improve this project?

Where have you tested it?

Oops! Something broke!

Ooh ooh I have an idea I want to implement in this project

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages