Welcome to the Center for High Performance Computing (CHPC)'s Student Cluster Competition (SCC) Team Selection Round. This round requires each team to build a prototype multi-node compute cluster within the National Integrated Cyber Infrastructure Systems (NICIS) virtual compute cloud (described below).
The goal of this document is to introduce you to the competition platform and familiarise you with some Linux and systems administration concepts. This competition provides you with a fixed set of virtual resources, which you will use to launch a set of virtual machine instances running the Linux flavor of your choice.
The CHPC invites applications from suitably qualified candidates to enter the CHPC Student Cluster Competition. The CHPC Student Cluster Competition gives undergraduate students at South African universities exposure to the High Performance Computing (HPC) industry. The winning team will be entered into the ISC Student Cluster Competition, hosted at the 2026 International Supercomputing Conference in Hamburg, Germany.
You will be accessing all of the course work and material through this GitHub repository, which you and your team must check regularly to receive updates.
You are strongly encouraged to get help and even assist others by Opening and Participating in Discussions.
Tip
Active participation in the student discussions is an easy way to set yourselves apart from the rest of the competition, and makes it easy for the instructors to notice you!
Each day will comprise four lectures in the morning, with tutorials taking place in the afternoon. A PDF version of the timetable is available for you to download.
Teams will be evaluated according to the following breakdown, with your progress in the tutorials and your final presentations carrying the most weight.
Component | Weight |
---|---|
Technical Knowledge Assessment | 0.2 |
Tutorials | 0.4 |
Cluster Design Presentation | 0.4 |
The role of mentors, instructors and volunteers is to provide leadership and guidance for the student competitors participating in this year's Center for High Performance Computing 2025 Student Cluster Competition.
In preparing your teams for the competition, your main goal is to teach and impart knowledge to the student participants in such a way that they are empowered and enabled to tackle the problems and benchmarking tasks themselves.
Under no circumstances whatsoever may mentors touch any competition hardware belonging to either their team, or the competition hardware of another team. Mentors are encouraged to provide guidance and leadership to their (as well as other) teams.
A mentor found to be in direct contravention of this rule may cause their team to incur a penalty. Repeated infringements may result in the disqualification of their team.
We monitor all network traffic!
Below is a table of Linux system commands and utilities that you may find useful when debugging problems with your clusters. Note that some of these utilities do not ship with the base deployment of a number of Linux flavors, and you may need to install the associated packages before using them.
Command | Description |
---|---|
ssh | Used for logging into a remote machine and for executing commands on it. |
scp | SCP copies files between hosts on a network. It uses ssh for data transfer, and uses the same authentication and provides the same security as ssh. |
wget / curl | Utilities for non-interactive download of files from the Web. Both support the HTTP, HTTPS and FTP protocols. |
top / htop / btop | Provides a dynamic real-time view of a running system. It can display system summary information as well as a list of processes or threads. |
screen / tmux | Full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells). |
ip a | Display IP addresses and network interface information. |
dmesg | Prints the kernel's message buffer. The output typically contains messages produced by device drivers. |
watch | Execute a program periodically, showing output fullscreen. |
df -h | Report file system disk space usage. |
ping | Verifies that a device can communicate with another device on a network. |
lynx | Command-line based web browser (more useful than you think) |
ctrl+alt+[F1...F6] | Open another shell session (multiple ‘desktops’) |
ctrl+z | Suspend the running foreground command (resume it in the background with ‘bg’, or in the foreground with ‘fg’) |
du -h | Summarize disk usage of each FILE, recursively for directories. |
lscpu | Command line utility that provides system CPU related information. |
lstopo | View the topology of a Linux system (part of the hwloc project). |
inxi | Lists information related to your system's sensors, partitions, drives, networking, audio, graphics, CPU, system, etc... |
hwinfo | Hardware probing utility that provides detailed info about various components. |
lshw | Hardware probing utility that provides detailed info about various components. |
/proc | Virtual filesystem that serves as the information and control center of the kernel, providing a communications channel between kernel space and user space. Many of the preceding commands query information provided by /proc, e.g. cat /proc/cpuinfo. |
uname | Prints system information such as the kernel name, kernel release and machine architecture (uname -a prints everything). For the name and version of your distribution, use cat /etc/os-release. |
lsblk | Provides information about block devices (disks, hard drives, flash drives, etc) connected to your system and their partitioning schemes. |
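Several of these utilities can be combined into a quick health check when a node misbehaves. The sketch below is illustrative only: the function names `run_if_present` and `node_snapshot` are hypothetical, and the script skips any utility that is not installed rather than assuming its package is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a quick health snapshot of the current node using
# utilities from the table above, skipping any that are not installed.

# run_if_present LABEL COMMAND [ARGS...]
# Prints a header, then runs the command only if it exists on this system.
run_if_present() {
    local label="$1"
    shift
    printf '=== %s ===\n' "$label"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || printf '(%s exited with an error)\n' "$1"
    else
        printf '(%s not installed, install the package that provides it)\n' "$1"
    fi
}

# node_snapshot: gather kernel, CPU, disk, network and memory information.
node_snapshot() {
    run_if_present "Kernel and OS" uname -a
    run_if_present "CPU" lscpu
    run_if_present "Block devices" lsblk
    run_if_present "Disk usage" df -h
    run_if_present "IP addresses" ip a
    # /proc is a virtual filesystem rather than a command, so read it directly.
    printf '=== Memory (/proc/meminfo) ===\n'
    head -n 3 /proc/meminfo 2>/dev/null || printf '(/proc/meminfo not available)\n'
}
```

For example, save it as `snapshot.sh` and run `source snapshot.sh && node_snapshot | less` on the head node or a compute node.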
You will need to submit the following for scoring and evaluation by the judges:
- Cluster Design Assignment [40 %]
  - One PDF Presentation Slide with Team Profiles
    This slide must clearly indicate your Team Name and Institution. Below each team member's photograph, indicate their:
    - Name and surname,
    - Degree and Year of study,
  - Presentation Slides
  - Short Technical Brief with Cluster Design Specifications
- Technical Knowledge Assessment [20 %]
- Tutorials [40 %]
You are tasked with designing a small cluster, with at least three nodes, up to a value of R 500 000.00 (ZAR), and presenting your design to the judging panel. In your design you must specify the hardware and software for an operational cluster and describe how it functions. The design must be based on servers and interconnects from HPE, with accessories from NVIDIA, AMD or Intel. You MUST use the prices you find in the Parts List Spreadsheet.
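To keep track of the R 500 000.00 limit while you iterate on a design, it can help to export the components you select from the Parts List Spreadsheet to CSV and total them. The column layout assumed below (`component,quantity,unit_price_zar`, with a header row and plain numeric prices) is hypothetical, as are the function names; adapt them to however you export the spreadsheet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total a parts list exported to CSV and compare it with
# the R 500 000.00 budget. Assumed columns (with a header row):
#   component,quantity,unit_price_zar
# Prices must be plain numbers (no "R", spaces or thousands separators), and
# component names must not contain commas (quoted CSV is not handled).

BUDGET_ZAR=500000

# total_cost FILE : print the sum of quantity * unit price, to two decimals.
total_cost() {
    awk -F',' 'NR > 1 && NF >= 3 { sum += $2 * $3 } END { printf "%.2f\n", sum }' "$1"
}

# check_budget FILE : report the total and return non-zero if over budget.
check_budget() {
    local total
    total="$(total_cost "$1")"
    if awk -v t="$total" -v b="$BUDGET_ZAR" 'BEGIN { exit !(t <= b) }'; then
        printf 'Total R %s is within the R %s budget\n' "$total" "$BUDGET_ZAR"
    else
        printf 'Total R %s EXCEEDS the R %s budget\n' "$total" "$BUDGET_ZAR"
        return 1
    fi
}
```

Running `check_budget parts.csv` prints the total and returns a non-zero exit status when the design is over budget, which makes it easy to re-check after every change.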
The primary purpose of your HPC cluster is to run the following applications and benchmarks as efficiently as possible:
In addition, your choice of design must take into consideration:
- Base Platform (Server),
- Target Processing Unit (CPU / GPU),
- Memory, Networking and Storage Requirements,
- System and Application Dependency Software Requirements,
- Ease of Use (Build, Assembly, Deployment),
- Efficiency, Performance, Power Consumption and Reliability and
- Team Management, Coordination and Planning.
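When weighing performance against cost and power, a common first estimate is the theoretical peak performance (Rpeak) covered in the LinPACK Theoretical Peak Performance section of Tutorial 3: nodes × cores per node × clock speed (GHz) × double-precision FLOPs per cycle per core. The FLOPs-per-cycle figure depends on each CPU's vector width and number of FMA units and must be looked up for the specific part, and whether to use the base or the sustained all-core clock is itself a judgment call. The helper `rpeak_gflops` is hypothetical, and the values in the usage example are placeholders, not vendor figures.

```shell
#!/usr/bin/env bash
# Estimate theoretical peak performance (Rpeak) in GFLOP/s:
#   Rpeak = nodes * cores_per_node * clock_GHz * flops_per_cycle
# flops_per_cycle is the number of double-precision FLOPs one core can retire
# per cycle; look it up for each candidate CPU (it is not derived here).

# rpeak_gflops NODES CORES_PER_NODE CLOCK_GHZ FLOPS_PER_CYCLE
rpeak_gflops() {
    awk -v n="$1" -v c="$2" -v f="$3" -v fpc="$4" \
        'BEGIN { printf "%.2f\n", n * c * f * fpc }'
}
```

For instance, three hypothetical nodes with 64 cores each at 2.0 GHz and 16 FLOPs per cycle give `rpeak_gflops 3 64 2.0 16`, i.e. 3 × 64 × 2.0 × 16 = 6144.00 GFLOP/s. Measured HPL results will fall below this figure.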
Important
You may submit an additional design that extends your small R 500 000.00 cluster, up to a value of R 5 000 000.00. You may use any of the above links for this exercise, using a Dollar to Rand conversion rate of 1:20 (US$1 = R 20). You may use GPUs from either AMD or NVIDIA, and CPUs from either AMD or Intel. You must use HPE as the base platform for your servers.
In this revised design, consider additional nodes, additional or higher-performance CPUs, additional RAM, GPUs, InfiniBand interconnects and any other aspects that you think would improve the performance of your initial cluster design.
This additional design should take up no more than one slide, containing a price breakdown and a motivation for the additional component(s).
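Because the extended design mixes US Dollar prices with a Rand budget, a small conversion helper at the stated 1:20 rate (US$1 = R 20) avoids arithmetic slips. The function names below are hypothetical.

```shell
#!/usr/bin/env bash
# Convert US Dollar prices to Rand at the fixed competition rate of 1:20 and
# check a total against the R 5 000 000.00 extended-design budget.

USD_TO_ZAR_RATE=20
EXTENDED_BUDGET_ZAR=5000000

# usd_to_zar PRICE_USD : print the Rand equivalent to two decimals.
usd_to_zar() {
    awk -v usd="$1" -v rate="$USD_TO_ZAR_RATE" 'BEGIN { printf "%.2f\n", usd * rate }'
}

# within_extended_budget TOTAL_ZAR : succeed (exit 0) if the total fits the budget.
within_extended_budget() {
    awk -v t="$1" -v b="$EXTENDED_BUDGET_ZAR" 'BEGIN { exit !(t <= b) }'
}
```

For example, `usd_to_zar 25000` prints `500000.00`, and `within_extended_budget 4850000 && echo OK` confirms that a total fits under R 5 000 000.00.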
You will present your findings in a short technical brief and specification detailing the specific components you have incorporated into your cluster; a spreadsheet with a clear breakdown of the price, quantity and name / code of each component would be useful. You must also present a 10 minute slideshow of your findings and cluster design.
The 10 minute slide presentation by the whole team must include your design decisions and the features of your cluster, including: cost, hardware, software, configuration and operation. Each member of the team is required to present even though you will be assessed as a team.
After the presentation the judging panel will have an opportunity to ask questions to each member of your team. All members of your team can be questioned about any part of the cluster, so make sure you are fully familiar with the design.
Caution
The deadline for submission of the Cluster Design Assignment is 23:00 on Friday the 11th July. Late submissions will be penalized.
Each team must work together to answer and complete the Technical Knowledge Assessment to the best of its ability. Team Captains must email the team's answers to the organizers no later than 23:00 on the 12th July. You are required to demonstrate your understanding of the concepts in YOUR OWN WORDS. Keep your answers succinct and to the point: your answer to each question should not exceed 2-3 lines.
Caution
The deadline for submission of the Technical Knowledge Assessment is 23:00 on Saturday the 12th July. Late submissions will be penalized.
You will be evaluated on your overall progress in the tutorials. Below you will find an overview, glossary and high-level breakdown of the tutorials. You must progress through four tutorials, which will be released daily. Your overall progress through the tutorials forms a large component of your score. By the end of the week you will have covered a considerable amount of content; use the links provided should you need to refer back to a specific section and have trouble remembering where it is.
Warning
Please note that the tutorial content is subject to change at any time, and you must regularly check the main branch of this GitHub repository for updates.
Tutorial 1 deals with introducing concepts to users and getting them started with using the virtual lab, standing up the first virtual machine instance and connecting to it remotely. The content is as follows:
- Checklist
- Network Primer
- Launching your First Open Stack Virtual Machine Instance
- Accessing the NICIS Cloud
- Verify your Teams' Project Workspace and Available Resources
- Generating SSH Keys
- Create a New Private Virtual Network
- Create a New Router
- Create a New Security Group
- Launch a New Instance
- Linux Flavors and Distributions
- OpenStack Instance Flavors
- Networks, Ports, Services and Security Groups
- Key Pair
- Verify that your Instance was Successfully Deployed and Launched
- Associating an Externally Accessible IP Address
- Success State, Resource Management and Troubleshooting
- Introduction to Basic Linux Administration
- Linux Binaries, Libraries and Package Management
- Install, Compile and Run High Performance LinPACK (HPL) Benchmark
Tutorial 2 will demonstrate how to configure and stand-up a compute node, and access it using a transparently created, port forwarding SSH tunnel between your workstation and your head node. You will then install a number of critical services across your cluster.
- Checklist
- Spinning Up a Compute Node on Sebowa (OpenStack)
- Accessing Your Compute Node Using ProxyJump Directive
- Understanding the Roles of the Head Node and Compute Node
- Manipulating Files and Directories
- Verifying Networking Setup
- Configuring a Simple Stateful Firewall Using nftables
- Network Time Protocol
- Network File System
- Generating an SSH Key for your NFS /home
- User Account Management
- Ansible User Declaration
- WireGuard VPN Cluster Access
- ZeroTier
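Tutorial 2 reaches the compute node through the head node. One common way to make this transparent is OpenSSH's `ProxyJump` directive in `~/.ssh/config`. The helper below is a sketch: the function name, the host aliases `headnode` and `computenode`, and the example addresses, user name and key path are all placeholders, so substitute the values for your own instances.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an OpenSSH client config stanza that reaches a
# compute node through the head node using the ProxyJump directive.
# All host aliases, addresses, users and key paths are placeholders.

# write_proxyjump_config OUTPUT_FILE HEAD_IP COMPUTE_IP USER KEY_PATH
write_proxyjump_config() {
    local out="$1" head_ip="$2" compute_ip="$3" user="$4" key="$5"
    # Append (>>) so that any existing entries in the file are preserved.
    cat >> "$out" <<EOF
Host headnode
    HostName ${head_ip}
    User ${user}
    IdentityFile ${key}

Host computenode
    HostName ${compute_ip}
    User ${user}
    IdentityFile ${key}
    ProxyJump headnode
EOF
}
```

For example, `write_proxyjump_config ~/.ssh/config 203.0.113.10 10.100.50.20 rocky ~/.ssh/id_ed25519` followed by `ssh computenode`. The same hop can be made once-off with `ssh -J rocky@203.0.113.10 rocky@10.100.50.20`. Because the helper appends, run it only once per config file.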
Tutorial 3 will demonstrate how to configure, build, compile and install a variety of system software and applications, using a range of different tools. Finally, you will learn how to run applications across your cluster.
- Checklist
- Managing Your Environment
- Install Lmod
- Running the High Performance LINPACK (HPL) Benchmark on Your Compute Node
- Building and Compiling OpenBLAS and OpenMPI Libraries from Source
- Intel oneAPI Toolkits and Compiler Suite
- LinPACK Theoretical Peak Performance
- Spinning Up a Second Compute Node Using a Snapshot
- Application Benchmark Profiling
- HPC Challenge
- High Performance Conjugate Gradients
- Application Benchmarks and System Evaluation
Tutorial 4 demonstrates how to configure Docker containers to deploy a monitoring stack, comprising a metrics database service, an exporting / scraping service and a metric visualization service. You will then learn the very basics of how to visualize and interpret data, followed by how to automate the deployment of your Sebowa OpenStack infrastructure. Lastly, you'll deploy a scheduler and submit a job to it.
- Checklist
- Cluster Monitoring
- Configuring and Connecting to your Remote JupyterLab Server
- Automating the Deployment of your OpenStack Instances Using Terraform
- Continuous Integration Using CircleCI
- Slurm Scheduler and Workload Manager
The lecture slides are available for download - follow the link and download the raw files.
- PDF Format
- PPTX Format
- PDF Format
- PPTX Format
Important
While we value your feedback, the following sections are primarily targeted at contributors to the project. As a student participating in the competition, do NOT spend your time working through any of the material below. However, we would love to have your contributions to the project after the competition.
You are strongly encouraged to contribute and improve the project by Opening and Participating in Discussions, Raising, Addressing and Resolving Issues. The following guide describes How to clone, push, and pull with git (beginners GitHub tutorial).
In order to effectively manage the various workflows and stages of development, testing and deployment, the project comprises three primary branches:
- `main`: Stable and production-ready deployment branch of the project.
- `stag`: Staging branch which mirrors production and is used for integration testing of new features.
- `dev`: Development branch for incorporating new features and bug fixes.
Editing the content directly requires Git, which you can use from a terminal application, from PowerShell with Git for Windows, or from MobaXterm.
- Generate an SSH key (or use an existing one).
- Add your SSH key to your GitHub profile.
- `git clone` a local copy of the repository to your personal workspace. You can copy the command from GitHub itself: `git clone git@github.com:chpc-tech-eval/scc.git`
- When starting work on a new feature or bug fix, create a feature branch off the development branch, and regularly pull updates from `dev` to ensure that you remain consistent with any changes to `dev`: `git checkout dev` followed by `git pull origin dev`.
- Create a new branch to work on, e.g. `git branch tutX/bugfix-or-new-feature` followed by `git checkout tutX/bugfix-or-new-feature`, or simply use the single command `git checkout -b tutX/bugfix-or-new-feature`.
  - Give the branch a sensible name.
  - You are encouraged to push the branch to the remote so that collaborators can see what you are working on as you make the changes.
- Make the appropriate changes and commit them locally with `git add <relative_path_to_changed_file(s)>` followed by `git commit -m "some_message_pertaining_to_changes_made"`.
- When you have finished editing your feature, merge any remote changes from `dev` and then `push` your local changes back upstream to the remote repository:
  - (optional) `git pull origin dev`: it is generally good practice to incorporate any changes in `dev` into your code early and often.
  - (optional) `git pull origin tutX/bugfix-or-new-feature`: if you are collaborating on a specific feature with someone, it is important to incorporate their changes early and often.
  - `git push origin tutX/bugfix-or-new-feature`
- Once you are satisfied with the changes you have been making, eliminate all merge conflicts by pulling all remote changes and deviations into your local working copy with `git pull`.
  - If you are confident that your feature has not deviated from the remote `dev` branch, use `git pull origin dev` to automatically `fetch` and `merge` remote changes from `dev` into your feature branch.
  - Alternatively, if your branch is old, or depends on / requires changes from the remote, use `git fetch` to fetch remote changes and preview them before merging.
  - Eliminate your local conflicts and merge all remote changes with `git merge` (e.g. `git merge origin/dev`).
  - Once all the conflicts have been resolved and you have successfully merged all remote changes, push your branch upstream.
- Create a pull request to the remote `dev` branch on GitHub to incorporate your feature.
  - Or to another branch, if your feature branch was adding functionality to an existing feature branch.
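If you are new to this workflow, you can rehearse it safely before touching the real repository. The sketch below uses a local bare repository as a stand-in for GitHub, so nothing is pushed anywhere; the sandbox location, the throwaway commit identity and the branch name `tut1/fix-typo` are all hypothetical.

```shell
#!/usr/bin/env bash
# Rehearse the dev / feature-branch workflow in a throwaway sandbox.
# A local bare repository stands in for GitHub, so nothing leaves this machine.
set -euo pipefail

# Hypothetical locations and branch name, for illustration only.
sandbox="$(mktemp -d)"
remote="$sandbox/remote.git"
work="$sandbox/work"
feature="tut1/fix-typo"

# Throwaway identity so commits succeed even without a global git config.
export GIT_AUTHOR_NAME="Sandbox" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="Sandbox" GIT_COMMITTER_EMAIL="sandbox@example.com"

# Set up the stand-in remote with main and dev branches.
git init --quiet --bare "$remote"
git init --quiet "$work"
cd "$work"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$remote"
echo "# Sandbox" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet origin main
git checkout --quiet -b dev
git push --quiet origin dev

# Update dev, then create the feature branch off it.
git checkout --quiet dev
git pull --quiet --ff-only origin dev
git checkout --quiet -b "$feature"

# Make a change and commit it locally.
echo "Fixed a typo." >> README.md
git add README.md
git commit --quiet -m "Fix typo in README"

# Merge any remote dev changes, then push the feature branch upstream.
git pull --quiet --no-rebase origin dev
git push --quiet origin "$feature"

echo "Sandbox ready in $sandbox"
```

Run it with `bash rehearse-workflow.sh`, inspect the history with `git -C <sandbox>/work log --oneline --all --graph`, and delete the printed sandbox directory when you are done.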
Use the following guide on Github Markdown Syntax Editing.