Skip to content

Commit

Permalink
Tests: Updated speed tests README file.
Browse files Browse the repository at this point in the history
  • Loading branch information
levy committed Nov 29, 2024
1 parent d73e3bc commit 2540142
Showing 1 changed file with 110 additions and 26 deletions.
136 changes: 110 additions & 26 deletions tests/speed/README.txt
Original file line number Diff line number Diff line change
@@ -1,17 +1,15 @@
SPEED TESTS FOR INET
====================

This folder contains speed tests for the INET framework using the provided
examples.
This folder contains speed tests for the INET Framework using the examples.

In general, it must be noted that it's notoriously difficult to do repeatable
speed measurements. This file describes several common pitfalls that must be
speed measurements. This file describes several common pitfalls that should be
avoided. Before optimizing, it's essential to have a speed test that produces
repeatable results, otherwise it's unclear if any particular optimization is
repeatable results. Otherwise it's unclear if any particular optimization is
right or wrong.

The intended audience of this document are the users and the maintainers of
the INET framework and its speed tests.
The intended audience of this document are the maintainers of the INET Framework.

1. Fix the Environment

Expand All @@ -37,11 +35,11 @@ many interesting things done by the operating system. Either way, if the goal is
measuring how long the simulation takes, then the best thing to do is to put in
a few lines of code around that part such as this:

clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
...
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
int64_t time = ((int64_t)end.tv_sec * 1000000000L + end.tv_nsec) - ((int64_t)start.tv_sec * 1000000000L + start.tv_nsec);
out << "Runtime: " << time / 1.0E+9 << "\n";
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
...
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
int64_t time = ((int64_t)end.tv_sec * 1000000000L + end.tv_nsec) - ((int64_t)start.tv_sec * 1000000000L + start.tv_nsec);
out << "Runtime: " << time / 1.0E+9 << "\n";

4. Calculate Mean

Expand Down Expand Up @@ -87,43 +85,129 @@ measurements less repeatable.

Making sure the cpu's frequency doesn't scale during the speedtest:

sudo pstate-frequency -S --turbo off # turn off frequency scaling
sudo pstate-frequency -S --min 100 # set frequency of cpu cores to 100% (base frequency)
sudo pstate-frequency -S --max 100
$ sudo pstate-frequency -S --turbo off # turn off frequency scaling
$ sudo pstate-frequency -S --min 100 # set frequency of cpu cores to 100% (base frequency)
$ sudo pstate-frequency -S --max 100

pstate-frequency -G # to check if turbo is off and the percentages are properly set to around 100% (the cpu frequencies reported here can be misleading)
cpufreq-info # check the actual cpu core frequencies, should be around the base frequency
$ pstate-frequency -G # to check if turbo is off and the percentages are properly set to around 100% (the cpu frequencies reported here can be misleading)
$ cpufreq-info # check the actual cpu core frequencies, should be around the base frequency

10. Configure OMNeT++ Features
10. Disable Thermal Throttling

Ensure thermal throttling does not alter CPU performance. Monitor CPU temperatures
and use sufficient cooling.

11. Minimize Interrupts

Minimize hardware and software interrupts during measurements to reduce unexpected
performance impacts.

12. Change CPU Affinity

Set CPU affinity to bind the process to a specific CPU. This ensures the process runs only on designated CPUs. Use the taskset command:

$ taskset -c 0 ./inet_profile -u Cmdenv

13. Isolate CPU(s)

Isolate specific CPUs using kernel parameters to dedicate them exclusively to
the measurement process. Modify the GRUB configuration:

GRUB_CMDLINE_LINUX="isolcpus=14,15 nohz_full=14,15 rcu_nocbs=14,15"

14. Disable Power Management

Disable all CPU power management features to ensure consistent performance.

15. Use Real-Time Kernel

Consider using a real-time kernel for higher priority and more deterministic task scheduling.

16. Be aware of clock granularity

Understand the resolution of the timing functions used, as insufficient granularity
can lead to inaccurate measurements.

17. User time vs system time

Differentiate between user time (time spent in user-mode processes) and system
time (time spent in kernel-mode processes). This distinction helps interpret
performance results.

18. Wall-clock time vs CPU time

Wall-clock time includes all elapsed time, whereas CPU time only measures the
time actively used by the process.

19. Know What is Measured

Measure CPU time, CPU cycles or CPU instructions to analyze performance at the
hardware level. Example APIs include:

_rdtsc
clock()
std::chrono
clock_gettime(CLOCK_PROCESS_CPUTIME_ID
struct rusage r;
getrusage(RUSAGE_SELF, &r);

20. CPU Cache

Be mindful of cache effects on performance. Warm-up runs can help mitigate
cache-related variability. A context switch, to handle an interrupt or to run
another task, clears the CPU caches, so most likely increases CPU time.

21. Useful Command-Line Tools

$ time
$ cpufreq-info
$ lscpu
$ perf # this can display context switches, CPU cycles, CPU instructions, CPU migrations
$ taskset
$ mpstat

22. Important mapped files

Use the following for relevant configuration and status information:

$ cat /proc/cmdline
$ cat /proc/interrupts
$ cat /proc/self/status | grep Cpus_allowed_list
$ cat /proc/sys/kernel/perf_event_paranoid
$ cat /sys/devices/system/cpu/smt/control
$ cat /sys/devices/system/cpu/intel_pstate/no_turbo
$ cat /sys/devices/system/cpu/cpu14/cpufreq/scaling_max_freq
$ cat /sys/devices/system/cpu/cpu14/cpufreq/scaling_cur_freq
$ cat /sys/devices/system/cpu/cpu15/cpufreq/scaling_governor
$ cat /proc/irq/*/smp_affinity

23. Configure OMNeT++ Features

Disable Qtenv, Tkenv, Osg, OsgEarth, Parsim and other unnecessary optional OMNeT++
features. Some of these features link with several additional libraries that
may affect performance. For example, linking with pthread library might decrease
the performance of std::shared_ptr by using locking to provide atomic operations
for the reference counter.

11. Configure INET Features
24. Configure INET Features

Disable VOIP, visualization, and other unnecessary optional INET features.
Some of these features link with several additional libraries that may affect
performance.

12. Disable Self Checking
25. Disable Self Checking

Disable all self checking code that would unnecessarily affect the measurement
results. Even if these would be included in the final product, they don't change
the output of the program during the measurement. For example, the INET packet
API contains two sets of self checking operations, see Chunk.h for more details.

13. Running Tests
26. Running Tests

In the simplest case you can run all speed tests with the following command:
$./speedtest
In the simplest case, you can run all speed tests with the following command:

For a more complicated example, you could run some tests from examples.csv
using the compiled debug version 10 times with the following command:
$./speedtest examples.csv -m TwoHosts -r 10 -e /home/levy/workspace/omnetpp/bin/opp_run_dbg
$ inet_run_speed_tests

For more details on command line options, run:
$./speedtest -h

$ inet_run_speed_tests -h

0 comments on commit 2540142

Please sign in to comment.