
MacSim

Introduction

MacSim is a trace-based cycle-level GPGPU simulator developed by HPArch at Georgia Institute of Technology.

  • It simulates x86, ARM64, NVIDIA PTX, and Intel GEN GPU instructions and can be configured as either a trace-driven or an execution-driven cycle-level simulator. It models detailed microarchitectural behaviors, including pipeline stages, multi-threading, and memory systems.
  • MacSim can simulate a variety of architectures, such as Intel's Sandy Bridge and Skylake (both CPUs and GPUs) and NVIDIA's Fermi. It supports both homogeneous-ISA and heterogeneous-ISA multicore simulations, asymmetric multicore configurations (small cores + medium cores + big cores), and SMT or MT architectures.
  • An interconnection network model (based on IRIS) and a power model (based on McPAT) are currently integrated.
  • MacSim is also one of the components of SST, so multiple MacSim simulators can run concurrently.
  • The project has been supported by Intel, NSF, and Sandia National Lab.

Note

  • If you're interested in Intel's integrated GPU model in MacSim, please refer to the intel_gpu branch.

  • We've developed a power model for GPU architecture using McPAT. Please refer to the following paper for more detailed information: Power Modeling for GPU Architecture using McPAT, by Jieun Lim, Nagesh B. Lakshminarayana, Hyesoon Kim, William Song, Sudhakar Yalamanchili, and Wonyong Sung, in Transactions on Design Automation of Electronic Systems (TODAES), Vol. 19, No. 3.

  • We've characterised the performance of Intel's integrated GPUs using MacSim. Please refer to the following paper for more detailed information. Performance Characterisation and Simulation of Intel's Integrated GPU Architecture (ISPASS'18)

Intel GEN GPU Architecture

  • Intel GEN9 GPU Architecture:

Documentation

Please see MacSim documentation file for more detailed descriptions.

Installation

Prerequisites

  • zlib (development library)

    # Ubuntu/Debian
    sudo apt install zlib1g-dev
    # RHEL/CentOS/Fedora
    sudo dnf install zlib-devel
  • Python >= 3.11 and SCons (build tool)

    uv venv
    uv pip install scons

    Optionally, activate the virtual environment so you can omit uv run:

    source .venv/bin/activate
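
Before building, it can help to confirm that the interpreter on your PATH meets the Python >= 3.11 requirement. The following is a minimal sketch; the function name python_version_ok is hypothetical and not part of MacSim.

```shell
# Hypothetical helper (not part of MacSim): succeed if a "major.minor[.patch]"
# version string is at least 3.11, the minimum stated above.
python_version_ok() {
  local major minor
  major=${1%%.*}          # text before the first dot
  minor=${1#*.}           # text after the first dot
  minor=${minor%%.*}      # keep only the minor component
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 11 ]; }
}

# Usage: check the python3 found on PATH, if there is one.
if command -v python3 >/dev/null 2>&1; then
  version=$(python3 -c 'import platform; print(platform.python_version())')
  if python_version_ok "$version"; then
    echo "python3 $version is new enough"
  else
    echo "python3 $version is older than 3.11"
  fi
fi
```

Note that this inspects whichever python3 is first on PATH, which may differ from the interpreter inside .venv unless the virtual environment is activated.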

Clone and Build

git clone https://github.com/gthparch/macsim.git --recursive
cd macsim
./build.py --ramulator -j 32

# Or without activating the virtual environment:
uv run ./build.py --ramulator -j 32

For more build options, see ./build.py --help.

Quick Start

This section walks you through downloading a trace, setting up the simulation, and running it.

1. Download a Sample Trace

uv pip install gdown
gdown -O macsim_traces.tar.gz 1rpAgIMGJnrnXwDSiaM3S7hBysFoVhyO1
tar -xzf macsim_traces.tar.gz
rm macsim_traces.tar.gz

This will extract sample traces from the Rodinia benchmark suite into a macsim_traces/ directory.

2. Set Up a Run Directory

You need three files in the same directory to run a simulation:

  • macsim — the binary executable
  • params.in — GPU configuration
  • trace_file_list — list of paths to GPU traces

Copy them from the build output:

mkdir run
cp bin/macsim bin/params.in bin/trace_file_list run/
cd run
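
A quick check that the run directory really contains the three files listed above can save a failed launch. This is a sketch only; check_run_dir is a hypothetical helper name, not something shipped with MacSim.

```shell
# Hypothetical helper (not part of MacSim): report any of the three required
# files that are missing from a run directory (default: current directory).
check_run_dir() {
  local dir="${1:-.}" missing=0 f
  for f in macsim params.in trace_file_list; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # The binary must also be executable to be launched as ./macsim
  if [ -f "$dir/macsim" ] && [ ! -x "$dir/macsim" ]; then
    echo "not executable: $dir/macsim" >&2
    missing=1
  fi
  return "$missing"
}

# Usage (from inside the run directory): check_run_dir . && ./macsim
```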

3. Set Up the Trace Path

Edit trace_file_list. The first line is the number of traces, and the second line is the path to the trace:

1
/absolute/path/to/macsim_traces/hotspot/r512h2i2/kernel_config.txt
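
The file can also be generated rather than edited by hand. The sketch below writes the count on the first line and one absolute path per following line. The helper name write_trace_list is hypothetical, and the one-path-per-line layout for more than one trace is an assumption extrapolated from the single-trace example above.

```shell
# Hypothetical helper (not part of MacSim): write a trace list file.
# Usage: write_trace_list OUTPUT_FILE KERNEL_CONFIG...
# Line 1 is the number of traces; each following line is an absolute path.
# The layout for more than one trace is an assumption, not confirmed here.
write_trace_list() {
  local out="$1" cfg
  [ "$#" -ge 2 ] || { echo "usage: write_trace_list OUT CONFIG..." >&2; return 1; }
  shift
  for cfg in "$@"; do
    [ -f "$cfg" ] || { echo "not a file: $cfg" >&2; return 1; }
  done
  {
    printf '%s\n' "$#"
    for cfg in "$@"; do
      # Resolve to an absolute path so the list works from any directory
      printf '%s\n' "$(cd "$(dirname "$cfg")" && pwd)/$(basename "$cfg")"
    done
  } > "$out"
}

# Usage (path taken from the example above, relative to the run directory):
# write_trace_list trace_file_list ../macsim_traces/hotspot/r512h2i2/kernel_config.txt
```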

4. Run

./macsim

Simulation results will appear in the current directory. For example, check general.stat.out for the total cycle count:

grep CYC_COUNT_TOT general.stat.out
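
When a script needs only the number, an exact match on the counter name avoids picking up other counters that merely contain the same text. The sketch below assumes that each line of the stat file starts with the counter name followed by whitespace and its value; that layout is an assumption, and stat_value is a hypothetical helper name.

```shell
# Hypothetical helper (not part of MacSim): print the value of one counter.
# Assumes each stat line starts with the counter name followed by whitespace
# and its value; adjust the field number if your output differs.
# Usage: stat_value COUNTER_NAME [STAT_FILE]
stat_value() {
  awk -v name="$1" '
    $1 == name { print $2; found = 1; exit }   # exact match on the first field
    END { exit !found }                        # non-zero status if not found
  ' "${2:-general.stat.out}"
}

# Usage: cycles=$(stat_value CYC_COUNT_TOT) && echo "total cycles: $cycles"
```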

Note: The parameter file must be named params.in. The macsim binary looks for this exact filename in the current directory.

5. Run All Benchmarks

To run all downloaded traces and verify the build:

mkdir -p test_run && cp bin/macsim bin/params.in test_run/
cd test_run
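for trace in ../macsim_traces/*/; do
  name=$(basename "$trace")
  # Look for kernel_config.txt one level down, then directly in the trace directory
  config=$(ls -d "$trace"*/kernel_config.txt 2>/dev/null || ls "$trace"kernel_config.txt 2>/dev/null)
  [ -z "$config" ] && continue
  # trace_file_list declares one trace, so keep only the first match
  config=$(printf '%s\n' "$config" | head -n 1)
  printf '1\n%s\n' "$(realpath "$config")" > trace_file_list
  result=$(timeout 120 ./macsim 2>&1 | grep "finalize" | head -n 1)
  echo "$name: $result"
done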

Downloading Traces

Publicly Available Traces

Dataset Download
Rodinia Download
PyTorch Download
YOLOPv2 Download
GPT2 Download
GEMMA Download

Generating Your Own Traces

Warning: The trace generation tool is experimental — use at your own risk.

To generate traces for your own CUDA workloads, use the MacSim Tracer.

Simply prepend CUDA_INJECTION64_PATH to your original command. For example:

CUDA_INJECTION64_PATH=/path/to/main.so python3 your_cuda_program.py

Available environment variables:

| Variable | Description | Default |
| --- | --- | --- |
| TRACE_PATH | Path to save trace files | ./ |
| KERNEL_BEGIN | First kernel to trace | 0 |
| KERNEL_END | Last kernel to trace | UINT32_MAX |
| INSTR_BEGIN | First instruction to trace per kernel | 0 |
| INSTR_END | Last instruction to trace per kernel | UINT32_MAX |
| COMPRESSOR_PATH | Path to the compressor binary | (built with tracer) |
| DEBUG_TRACE | Generate human-readable debug traces | 0 |
| OVERWRITE | Overwrite existing traces | 0 |
| TOOL_VERBOSE | Enable verbose output | 0 |
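
Several of these variables can be combined on one command. The sketch below wraps that in a function that limits tracing to a kernel range. The name trace_kernels is hypothetical, and creating the output directory beforehand is a precaution rather than a documented requirement of the tracer.

```shell
# Hypothetical wrapper (not part of the MacSim Tracer): run a command with the
# tracer injected and tracing limited to a kernel range.
# Usage: trace_kernels TRACER_SO TRACE_DIR FIRST_KERNEL LAST_KERNEL COMMAND...
trace_kernels() {
  [ "$#" -ge 5 ] || { echo "usage: trace_kernels SO DIR FIRST LAST CMD..." >&2; return 1; }
  local tracer="$1" out_dir="$2" first="$3" last="$4"
  shift 4
  # Precaution only: whether the tracer creates this directory itself is not stated
  mkdir -p "$out_dir" || return 1
  # env sets the variables for the launched command only, not for this shell
  env CUDA_INJECTION64_PATH="$tracer" \
      TRACE_PATH="$out_dir" \
      KERNEL_BEGIN="$first" \
      KERNEL_END="$last" \
      "$@"
}

# Example (paths are placeholders):
# trace_kernels /path/to/main.so ./traces 0 9 python3 your_cuda_program.py
```

TRACE_PATH is passed through unchanged, so use the same form (for example, with or without a trailing slash) that the MacSim Tracer README expects.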

See the MacSim Tracer README for full installation and usage instructions.

Known Bugs

  1. src/memory.cc:1043: ASSERT FAILED — Happens with FasterTransformer traces + too many cores (40+). Solution: Reduce the number of cores.

  2. src/factory_class.cc:77: ASSERT FAILED — Happens when params.in file is missing or has a wrong name. Solution: Use params.in as the config file name.

  3. src/process_manager.cc:826: ASSERT FAILED ... error opening trace file — Too many trace files open simultaneously. Solution: Add ulimit -n 16384 to your ~/.bashrc.
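
For the third item, the limit can also be raised for the current shell only, without editing ~/.bashrc. This is a minimal sketch; ensure_fd_limit is a hypothetical helper name, and it changes only the soft limit so that the hard limit stays untouched.

```shell
# Hypothetical helper (not part of MacSim): raise the soft open-file limit of
# the current shell if it is below the requested value.
# Usage: ensure_fd_limit WANTED_LIMIT
ensure_fd_limit() {
  local want="$1" cur
  cur=$(ulimit -n)
  [ "$cur" = "unlimited" ] && return 0
  [ "$cur" -ge "$want" ] && return 0
  # -S changes only the soft limit; this fails if the hard limit is lower
  ulimit -S -n "$want"
}

# Usage: ensure_fd_limit 16384 && ./macsim
```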

Q & A

If you have a question, please open a GitHub issue.

Tutorial

  • We gave a tutorial at HPCA-2012. Please visit here for the slides.
  • We gave a tutorial at ISCA-2012. Please visit here for the slides.

SST+MacSim

  • Here are two example configurations of SST+MacSim.
    • A multi-socket system with cache coherence model:
    • A CPU+GPU heterogeneous system with shared memory:
